Infrastructure & Architecture

VPCs, compute, storage, networking, IaC, and multi-cloud topology — reviewed for single points of failure, hidden dependencies, and the architectural decisions that will hurt you at scale.

Audit category · 01 of 08

Scope

What this audit covers

The Infrastructure & Architecture category covers how the building blocks of your cloud fit together: networking, compute, storage, and the code that describes them. It is the first CloudCheck 360° category because every other category inherits its decisions.

Networking

VPCs and subnets, route tables, NAT gateways, transit gateways, VPN and Direct Connect, DNS, load balancers, service mesh topology.

Compute

EC2 / Azure VMs / GCE, managed Kubernetes (EKS, AKS, GKE), serverless (Lambda, Functions, Cloud Run), batch and HPC workloads.

Storage

Object stores (S3, Blob, GCS), block storage, managed databases, lifecycle and tiering configuration, cross-region replication.

Multi-account / multi-cloud

Organization / tenant / folder structure, landing zones, account-vending patterns, workload placement across providers.

Infrastructure as code

Terraform, CloudFormation, Bicep, Pulumi, CDK. Coverage, drift, module quality, state management.

Resilience

Availability-zone distribution, cross-region strategy, failure-mode analysis, graceful degradation patterns.

Why it matters

Architecture mistakes compound

Architecture mistakes rarely fail fast. A single-AZ production database works fine until the AZ has a bad day. An IaC module reused across thirty repositories propagates a subtle misconfiguration at thirty times the blast radius. Overlapping CIDR ranges do not matter until the day you need to peer two VPCs.

The cost of fixing an architectural decision scales roughly with how long it has been in place and how many downstream systems depend on it. That is why this audit exists — to surface the compounding mistakes while they are still cheap to fix.

Method

How we assess it

Three parallel tracks over two to three weeks.

Track A

Automated inventory

Read-only roles across every account. Steampipe, Prowler, ScoutSuite, and provider-native tooling generate the ground-truth inventory and resource relationships.

Track B

Diagram from reality

We build architecture diagrams from the live environment rather than from documentation. The delta between the two is itself a finding.
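
One way to make that delta concrete is to treat both diagrams as sets of dependency edges and compare them. The following is a minimal sketch under that assumption, not the tooling used in an engagement; the resource names and the `diagram_delta` and `to_mermaid` helpers are hypothetical.

```python
def diagram_delta(live_edges, documented_edges):
    """Compare dependency edges found in the live environment with documented ones."""
    live, documented = set(live_edges), set(documented_edges)
    return {
        "undocumented": sorted(live - documented),  # exists, but not on the diagram
        "not_found": sorted(documented - live),     # on the diagram, but not in reality
    }


def to_mermaid(edges):
    """Render edges as Mermaid flowchart source, one arrow per dependency."""
    lines = ["graph LR"]
    lines += [f"    {src} --> {dst}" for src, dst in sorted(edges)]
    return "\n".join(lines)


# Hypothetical example data.
live = [("alb", "app_asg"), ("app_asg", "rds_primary"), ("app_asg", "legacy_ftp")]
documented = [("alb", "app_asg"), ("app_asg", "rds_primary"), ("app_asg", "redis")]

delta = diagram_delta(live, documented)
print(delta)
print(to_mermaid(live))
```

Each entry in either list of the delta is a candidate finding: an undocumented dependency, or a documented one that no longer exists.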

Track C

Engineering review

Structured sessions with the engineering leads who own each domain. Context that cannot be inferred from metadata — why a decision was made, what changed, what is planned.

No agents installed. No changes to your environment. Read-only IAM roles with time-boxed access, rotated or revoked on engagement close.
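
On AWS, time-boxing can be expressed in the role's trust policy itself. The sketch below is an illustration under assumptions, not the exact policy used in an engagement: the account ID, external ID, dates, and the `time_boxed_trust_policy` helper are all hypothetical, and other providers need their own equivalent.

```python
import json
from datetime import datetime, timedelta, timezone


def time_boxed_trust_policy(auditor_account_id, external_id, expires_at):
    """Build an IAM trust policy that stops allowing AssumeRole after expires_at."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{auditor_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": external_id},
                    "DateLessThan": {
                        "aws:CurrentTime": expires_at.strftime("%Y-%m-%dT%H:%M:%SZ")
                    },
                },
            }
        ],
    }


# Hypothetical values: a three-week engagement starting on a fixed date.
start = datetime(2025, 1, 6, tzinfo=timezone.utc)
policy = time_boxed_trust_policy(
    "111122223333", "example-external-id", start + timedelta(weeks=3)
)
print(json.dumps(policy, indent=2))
```

The date condition is checked when the role is assumed, so it acts as a backstop; deleting or rotating the role at engagement close is still the step that ends access.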

Deliverables

What you get

Living architecture diagram — one high-level and one per-domain (network, identity, data). Delivered as PDF and editable source (draw.io / Lucid / Mermaid).
Single-points-of-failure register — every SPOF, rated by blast radius and likelihood, with a proposed remediation.
IaC debt backlog — resources managed outside IaC, drifted resources, low-quality modules, and a prioritized path to full IaC coverage.
Multi-account / multi-cloud topology review — whether your account structure supports the blast-radius isolation you think it does.
Resource inventory CSV — every resource, every region, every account. Tagging compliance. Ownership gaps.
Executive summary — one page. The three architectural decisions most worth changing, with expected outcome.
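
As an illustration of how a single-points-of-failure register can be ordered, the sketch below multiplies blast radius by likelihood. The 1 to 5 scales, the multiplication, and the example entries are assumptions made for this example, not the scoring model used in the deliverable.

```python
from dataclasses import dataclass


@dataclass
class Spof:
    resource: str
    blast_radius: int  # hypothetical scale: 1 (one service) to 5 (whole platform)
    likelihood: int    # hypothetical scale: 1 (rare) to 5 (expected)
    remediation: str

    @property
    def score(self) -> int:
        # Simple product; a real register may weight these differently.
        return self.blast_radius * self.likelihood


def prioritize(register):
    """Order entries by descending score, breaking ties by resource name."""
    return sorted(register, key=lambda s: (-s.score, s.resource))


# Hypothetical example entries.
register = [
    Spof("nat-gateway-az-a", 2, 2, "Add a NAT gateway in a second AZ"),
    Spof("orders-db", 5, 3, "Enable Multi-AZ"),
    Spof("shared-efs", 4, 2, "Move to a regional file system"),
]

for entry in prioritize(register):
    print(entry.score, entry.resource, "->", entry.remediation)
```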

Patterns

Common findings

Across architecture engagements, these patterns show up most:

Single-AZ production resources masquerading as multi-AZ.

An Auto Scaling group spans three AZs, but the RDS instance behind it is single-AZ, or an EFS file system uses One Zone storage. Fixing this is usually a configuration change, not a rearchitecture.
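
A check for this pattern can be sketched over exported inventory records. The field names `AvailabilityZones`, `AutoScalingGroupName`, `DBInstanceIdentifier`, and `MultiAZ` follow the AWS API responses; the `DependsOnDb` link and all example values are hypothetical, since the dependency between a group and its database has to come from somewhere else (tags, configuration, or the engineering review).

```python
def single_az_findings(asgs, db_instances):
    """Return (asg, db) pairs where a multi-AZ group depends on a single-AZ database."""
    multi_az_by_db = {db["DBInstanceIdentifier"]: db["MultiAZ"] for db in db_instances}
    findings = []
    for asg in asgs:
        spans_zones = len(set(asg["AvailabilityZones"])) > 1
        db_id = asg["DependsOnDb"]  # hypothetical link, not a provider field
        if spans_zones and multi_az_by_db.get(db_id) is False:
            findings.append((asg["AutoScalingGroupName"], db_id))
    return findings


# Hypothetical inventory export.
asgs = [
    {"AutoScalingGroupName": "web", "DependsOnDb": "orders-db",
     "AvailabilityZones": ["eu-west-1a", "eu-west-1b", "eu-west-1c"]},
    {"AutoScalingGroupName": "api", "DependsOnDb": "users-db",
     "AvailabilityZones": ["eu-west-1a", "eu-west-1b"]},
    {"AutoScalingGroupName": "batch", "DependsOnDb": "reports-db",
     "AvailabilityZones": ["eu-west-1a"]},
]
db_instances = [
    {"DBInstanceIdentifier": "orders-db", "MultiAZ": False},
    {"DBInstanceIdentifier": "users-db", "MultiAZ": True},
    {"DBInstanceIdentifier": "reports-db", "MultiAZ": False},
]

print(single_az_findings(asgs, db_instances))
```

The `batch` group is not flagged here because it is single-AZ end to end, which is a different finding from a multi-AZ tier resting on a single-AZ dependency.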

Half the infrastructure is clickops.

Terraform covers the original provisioning. Every emergency fix since has been a console change. Drift detection is off. The Terraform repo is a fossil of the environment circa year one.
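
The size of the gap can be estimated by comparing resource IDs in the live inventory with those tracked in IaC state. This is a minimal sketch assuming both lists have already been exported; the `iac_coverage` helper and the IDs are hypothetical, and it only detects resources that exist on one side, not attribute-level drift.

```python
def iac_coverage(live_ids, state_ids):
    """Return (coverage ratio, unmanaged live resources, stale state entries)."""
    live, managed = set(live_ids), set(state_ids)
    unmanaged = sorted(live - managed)  # created outside IaC
    stale = sorted(managed - live)      # in state, but gone from the environment
    coverage = len(live & managed) / len(live) if live else 1.0
    return coverage, unmanaged, stale


# Hypothetical example data.
live_ids = ["vpc-1", "sg-1", "sg-2", "i-1"]
state_ids = ["vpc-1", "sg-1", "i-0"]

coverage, unmanaged, stale = iac_coverage(live_ids, state_ids)
print(f"coverage={coverage:.0%} unmanaged={unmanaged} stale={stale}")
```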

Overlapping CIDR ranges across accounts.

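VPC peering cannot be established between VPCs with overlapping CIDR blocks, and routing between them through a Transit Gateway becomes ambiguous. The fix is re-addressing one of the networks, which is expensive if the overlap is only discovered at peering time.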
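
Overlaps are cheap to detect ahead of time. The sketch below uses Python's standard `ipaddress` module to compare every pair of CIDR blocks; the VPC names, ranges, and the `find_overlaps` helper are hypothetical example data.

```python
import ipaddress
from itertools import combinations


def find_overlaps(vpc_cidrs):
    """Return pairs of VPC names whose CIDR blocks overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in vpc_cidrs.items()}
    overlaps = []
    for (name_a, net_a), (name_b, net_b) in combinations(sorted(networks.items()), 2):
        if net_a.overlaps(net_b):
            overlaps.append((name_a, name_b))
    return overlaps


# Hypothetical inventory: VPC name -> primary CIDR block.
vpcs = {
    "prod-app": "10.0.0.0/16",
    "prod-data": "10.1.0.0/16",
    "staging": "10.0.128.0/17",
}

print(find_overlaps(vpcs))
```

Here `staging` sits inside the `prod-app` range, so that pair is reported while `prod-data` is clear of both.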

NAT gateways per AZ with low traffic.

Three NAT gateways at roughly $32/month each, plus data-processing fees, for a low-traffic workload that could share one if the resilience trade-off is acceptable. Common in landing-zone templates copy-pasted without rescaling.
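
The arithmetic can be sketched as follows. The $32/month fixed charge comes from the text above; the per-GB processing and cross-AZ transfer rates, the traffic volumes, and the `monthly_nat_cost` helper are placeholder assumptions, so check current provider pricing before relying on the numbers.

```python
def monthly_nat_cost(gateways, gb_processed, cross_az_gb=0.0,
                     fixed_per_gateway=32.0,   # from the text above
                     processing_per_gb=0.045,  # assumed placeholder rate
                     cross_az_per_gb=0.02):    # assumed placeholder rate
    """Estimate monthly NAT spend: fixed charges plus per-GB fees."""
    fixed = gateways * fixed_per_gateway
    processing = gb_processed * processing_per_gb
    cross_az = cross_az_gb * cross_az_per_gb
    return fixed + processing + cross_az


# Hypothetical low-traffic workload: 300 GB/month through NAT.
per_az = monthly_nat_cost(gateways=3, gb_processed=300)
# With one shared gateway, traffic from the other two AZs crosses zones.
shared = monthly_nat_cost(gateways=1, gb_processed=300, cross_az_gb=200)

print(f"per-AZ: ${per_az:.2f}  shared: ${shared:.2f}  saving: ${per_az - shared:.2f}")
```

Processing fees are the same in both layouts, so the saving comes from the fixed charges minus any added cross-AZ transfer. A shared gateway also ties every AZ's outbound traffic to one zone, which belongs in the single-points-of-failure register if adopted.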

No tagging or naming convention.

Cost allocation is guesswork. Automated remediation cannot target specific workloads. Ownership of resources is tribal knowledge.
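
Tagging compliance is straightforward to measure once a convention exists. The sketch below assumes a hypothetical three-tag convention and a simplified inventory record shape; neither is prescribed by the audit.

```python
REQUIRED_TAGS = ("owner", "cost-center", "environment")  # hypothetical convention


def missing_tags(resource):
    """List required tags that are absent or empty, ignoring key case."""
    present = {key.lower() for key, value in resource.get("tags", {}).items() if value}
    return [tag for tag in REQUIRED_TAGS if tag not in present]


def tagging_compliance(resources):
    """Return (share of fully tagged resources, gaps keyed by resource id)."""
    gaps = {}
    for resource in resources:
        missing = missing_tags(resource)
        if missing:
            gaps[resource["id"]] = missing
    rate = (len(resources) - len(gaps)) / len(resources) if resources else 1.0
    return rate, gaps


# Hypothetical inventory rows.
resources = [
    {"id": "i-1", "tags": {"Owner": "payments", "cost-center": "cc-42", "environment": "prod"}},
    {"id": "vol-1", "tags": {"environment": "prod"}},
    {"id": "bucket-logs", "tags": {}},
]

rate, gaps = tagging_compliance(resources)
print(f"{rate:.0%} compliant")
for resource_id, missing in gaps.items():
    print(resource_id, "missing:", ", ".join(missing))
```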

FAQ

Questions we get asked

Does this replace an AWS Well-Architected Review?

It complements it. A Well-Architected Review is AWS-led and free, but it covers AWS only and is framed around the six Well-Architected pillars. Our review is provider-neutral, includes cost and compliance mapping, and produces deliverables you can act on independently.

Do you install agents in our environment?

No. By default the assessment is performed entirely via read-only IAM roles and provider-native APIs. Agent installs are optional and used only in specific scenarios agreed in advance.

Can you produce diagrams in our tool of choice?

Yes. Default delivery is PDF plus draw.io source. Lucidchart, Mermaid, and Miro exports on request.

Do you need production access?

Only read-only access to production metadata. We do not read customer data or make changes. Sandboxed read-only roles are fine.

Will you do the remediation for us?

Separately scoped. We do not bundle remediation with assessment so the findings stay honest. Our engineers can execute the roadmap as a follow-on project if you prefer.

Part of the CloudCheck 360° methodology. See also Cost Optimization & FinOps, Security Posture, and CI/CD & DevSecOps.

Start with a free Cloud Health Check.

A scoped-down CloudCheck 360° of your current environment. Delivered in five business days, no commitment.
