Infrastructure & Architecture
VPCs, compute, storage, networking, IaC, and multi-cloud topology — reviewed for single points of failure, hidden dependencies, and the architectural decisions that will hurt you at scale.
Audit category · 01 of 08
01 Scope
What this audit covers
The Infrastructure & Architecture category covers how the building blocks of your cloud fit together: networking, compute, storage, and the code that describes them. It is the first CloudCheck 360° category because every other category inherits its decisions.
Networking
VPCs and subnets, route tables, NAT gateways, transit gateways, VPN and Direct Connect, DNS, load balancers, service mesh topology.
Compute
EC2 / Azure VMs / GCE, managed Kubernetes (EKS, AKS, GKE), serverless (Lambda, Functions, Cloud Run), batch and HPC workloads.
Storage
Object stores (S3, Blob, GCS), block storage, managed databases, lifecycle and tiering configuration, cross-region replication.
Multi-account / multi-cloud
Organization / tenant / folder structure, landing zones, account-vending patterns, workload placement across providers.
Infrastructure as code
Terraform, CloudFormation, Bicep, Pulumi, CDK. Coverage, drift, module quality, state management.
Resilience
Availability-zone distribution, cross-region strategy, failure-mode analysis, graceful degradation patterns.
02 Why it matters
Architecture mistakes compound
Architecture mistakes rarely fail fast. A single-AZ production database works fine until the AZ has a bad day. An IaC module reused across thirty repositories propagates a subtle misconfiguration at thirty times the blast radius. Overlapping CIDR ranges do not matter until the day you need to peer two VPCs.
The cost of fixing an architectural decision scales roughly with how long it has been in place and how many downstream systems depend on it. That is why this audit exists — to surface the compounding mistakes while they are still cheap to fix.
03 Method
How we assess it
Three parallel tracks over two to three weeks.
Track A
Automated inventory
Read-only roles across every account. Steampipe, Prowler, ScoutSuite, and provider-native tooling generate the ground-truth inventory and resource relationships.
Track B
Diagram from reality
We build architecture diagrams from the live environment rather than from documentation. The delta between the two is itself a finding.
Track C
Engineering review
Structured sessions with the engineering leads who own each domain. Context that cannot be inferred from metadata — why a decision was made, what changed, what is planned.
No agents installed. No changes to your environment. Read-only IAM roles with time-boxed access, rotated or revoked on engagement close.
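The Track B comparison between documented and live architecture can be pictured as a set difference. The following is a minimal sketch only, not the tooling used in the engagement; the function name `inventory_delta` and all resource identifiers are hypothetical and chosen for illustration.

```python
def inventory_delta(documented: set[str], live: set[str]) -> dict[str, list[str]]:
    """Compare documented resource IDs against the live inventory."""
    return {
        # Present in the live environment but absent from the documentation.
        "undocumented": sorted(live - documented),
        # Described in the documentation but no longer present in the environment.
        "stale": sorted(documented - live),
    }


# Hypothetical identifiers, for illustration only.
documented = {"vpc-prod", "rds-orders", "alb-public"}
live = {"vpc-prod", "rds-orders", "nat-legacy"}

delta = inventory_delta(documented, live)
print(delta)
```

Both directions are worth reporting: undocumented resources point to console changes, while stale entries point to documentation nobody maintains.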
04 Deliverables
What you get
- Living architecture diagram — one high-level and one per-domain (network, identity, data). Delivered as PDF and editable source (draw.io / Lucid / Mermaid).
- Single-points-of-failure register — every SPOF, rated by blast radius and likelihood, with a proposed remediation.
- IaC debt backlog — resources managed outside IaC, drifted resources, low-quality modules, and a prioritized path to full IaC coverage.
- Multi-account / multi-cloud topology review — whether your account structure supports the blast-radius isolation you think it does.
- Resource inventory CSV — every resource, every region, every account. Tagging compliance. Ownership gaps.
- Executive summary — one page. The three architectural decisions most worth changing, with expected outcome.
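As an illustration of how the resource inventory CSV can surface tagging and ownership gaps, here is a minimal sketch. The column names, the required tag keys in `REQUIRED_TAGS`, and the sample rows are all assumptions made for this example; the delivered CSV may use a different schema.

```python
import csv
import io

# Assumed tagging convention; substitute the keys your organization requires.
REQUIRED_TAGS = ("owner", "cost-center", "environment")


def tagging_gaps(csv_text: str) -> list[dict[str, object]]:
    """Return one entry per resource that is missing at least one required tag."""
    gaps = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A tag counts as missing when its column is absent, empty, or whitespace.
        missing = [tag for tag in REQUIRED_TAGS if not (row.get(tag) or "").strip()]
        if missing:
            gaps.append({"resource_id": row["resource_id"], "missing": missing})
    return gaps


# Hypothetical inventory rows, for illustration only.
SAMPLE = """resource_id,account,region,owner,cost-center,environment
i-0001,prod,eu-west-1,payments,cc-12,prod
vol-0002,prod,eu-west-1,,cc-12,prod
bucket-0003,dev,us-east-1,,,
"""

gaps = tagging_gaps(SAMPLE)
for gap in gaps:
    print(gap["resource_id"], "is missing", ", ".join(gap["missing"]))
```

Resources with no `owner` value are the ownership gaps; the rest of the list feeds the tagging compliance figure.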
05 Patterns
Common findings
Across architecture engagements, these patterns show up most:
Single-AZ production resources masquerading as multi-AZ.
An Auto Scaling group spans three AZs, but the RDS instance behind it is single-AZ, or the EFS file system it mounts uses One Zone storage. Fixing this is usually a configuration change, not a rearchitecture.
Half the infrastructure is clickops.
Terraform covers the original provisioning. Every emergency fix since has been a console change. Drift detection is off. The Terraform repo is a fossil of the environment circa year-one.
Overlapping CIDR ranges across accounts.
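A VPC peering request between networks with overlapping CIDR ranges is rejected, and a Transit Gateway cannot route unambiguously between them. The fix is to re-IP one of the networks, which is far more expensive when the overlap is only discovered at peering time.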
NAT gateways per AZ with low traffic.
Three NAT gateways at roughly $32/month each, plus data-processing fees, for a workload that could share one. Common in landing-zone templates copy-pasted without rescaling.
No tagging or naming convention.
Cost allocation is guesswork. Automated remediation cannot target specific workloads. Ownership of resources is tribal knowledge.
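The overlapping-CIDR finding above is cheap to check ahead of time. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `overlapping_cidrs`, the VPC names, and the address ranges are hypothetical, and a real check would read the ranges from the resource inventory.

```python
from ipaddress import ip_network
from itertools import combinations


def overlapping_cidrs(vpcs: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of VPC names whose CIDR ranges overlap."""
    nets = {name: ip_network(cidr) for name, cidr in vpcs.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical VPCs, for illustration only.
vpcs = {
    "prod": "10.0.0.0/16",
    "staging": "10.0.128.0/17",  # falls inside prod's range
    "shared": "10.1.0.0/16",
}

conflicts = overlapping_cidrs(vpcs)
print(conflicts)
```

Running a check like this before a new account or VPC is provisioned keeps the re-IP cost at zero, since the conflict is caught before anything depends on the range.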
06 FAQ
Questions we get asked
Does this replace an AWS Well-Architected Review?
It complements it. Well-Architected is AWS-led and free but optimized for the six Well-Architected pillars. Our review is provider-neutral, includes cost and compliance mapping, and produces deliverables you can act on independently.
Do you install agents in our environment?
No. Assessment is performed entirely via read-only IAM roles and provider-native APIs. Agent installs are available but optional, and only for specific scenarios we agree in advance.
Can you produce diagrams in our tool of choice?
Yes. Default delivery is PDF plus draw.io source. Lucidchart, Mermaid, and Miro exports on request.
Do you need production access?
Read-only access to production metadata. We do not read customer data or make changes. Sandboxed read-only roles are fine.
Will you do the remediation for us?
Separately scoped. We do not bundle remediation with assessment so the findings stay honest. Our engineers can execute the roadmap as a follow-on project if you prefer.
Related
Part of the CloudCheck 360° methodology. See also Cost Optimization & FinOps, Security Posture, and CI/CD & DevSecOps.
Start with a free Cloud Health Check.
A scoped-down CloudCheck 360° of your current environment. Delivered in five business days, no commitment.