Detection

Reading CloudTrail like an incident responder, not a compliance officer

The queries that surface lateral movement are nothing like the queries that satisfy a SOC 2 audit. A short guide to what to look for when the logs actually matter.

Taha Zubair

Founder, Cloud Upload · 4 min

Almost every AWS account has CloudTrail enabled. Almost no AWS account has CloudTrail searched the way an incident responder would search it. The gap between “logs retained” and “logs useful” is where most cloud-native breaches quietly live.

Key takeaway

Compliance asks “is the data there?” Detection asks “what is the data saying?” You can answer the first question with a policy document. The second requires queries that nobody has written yet.

What compliance queries look like

Compliance queries are existence checks. “Is CloudTrail enabled in every region?” “Is log-file validation turned on?” “Are logs delivered to a separate account?” These queries have binary answers, they satisfy auditors, and they tell you nothing about whether anyone has ever abused the account.

A responder query is a sequence check. “Did a single principal perform sts:AssumeRole more than twice in a fifteen-minute window from two different source IPs?” That query has no binary answer. It has a list of events that need to be looked at, and most of them will be benign. The one that isn’t is the breach.

Five queries every CloudTrail deployment should have ready

1. Role-assumption chains from a single session

SELECT useridentity.sessioncontext.sessionissuer.arn AS source_role,
       COUNT(DISTINCT json_extract_scalar(requestparameters, '$.roleArn')) AS distinct_targets,
       COUNT(DISTINCT sourceipaddress) AS source_ips
FROM cloudtrail_logs
WHERE eventname = 'AssumeRole'
  AND from_iso8601_timestamp(eventtime) > now() - interval '24' hour
GROUP BY useridentity.sessioncontext.sessionissuer.arn
HAVING COUNT(DISTINCT json_extract_scalar(requestparameters, '$.roleArn')) > 2

Normal workloads assume one role and use it. A chain of three or more role assumptions from the same session is worth reading, every time.

2. IAM principal activity from a new geographic region

Baseline each principal’s typical source region over 30 days. Alert on the first event from outside that set. This is the single highest-signal detection in a cloud environment and most teams don’t have it because they never built the baseline table.
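CloudTrail does not record IP geolocation, so a common proxy for “source region” is the awsregion field — the region the API call landed in. A sketch in Athena SQL, assuming the standard cloudtrail_logs table; if your pipeline enriches events with a GeoIP column, substitute that for awsregion:

```sql
-- Principals active in the last day from a region absent from their 30-day baseline
WITH baseline AS (
  SELECT useridentity.arn AS principal, awsregion
  FROM cloudtrail_logs
  WHERE from_iso8601_timestamp(eventtime)
        BETWEEN now() - interval '31' day AND now() - interval '1' day
  GROUP BY useridentity.arn, awsregion
)
SELECT t.eventtime, t.useridentity.arn AS principal, t.awsregion, t.eventname
FROM cloudtrail_logs t
LEFT JOIN baseline b
  ON t.useridentity.arn = b.principal
 AND t.awsregion = b.awsregion
WHERE from_iso8601_timestamp(t.eventtime) > now() - interval '1' day
  AND b.principal IS NULL
```

Once this works, materialize the baseline CTE as a real table refreshed daily; rebuilding it inside every scheduled run gets expensive fast.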

3. Failed sts:GetSessionToken with InvalidClientTokenId

Attackers probe stolen access keys here first. A handful of InvalidClientTokenId errors a day is operational noise. Dozens in an hour, from a principal that normally has none, is credential stuffing against your IAM surface.
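A sketch of the threshold version, again against an assumed Athena cloudtrail_logs table. The HAVING cutoff is a starting point to tune, and note that failed STS calls often arrive with only a partial userIdentity:

```sql
-- Bursts of InvalidClientTokenId failures in the last hour, by access key and source IP
SELECT useridentity.accesskeyid AS access_key,
       sourceipaddress,
       COUNT(*) AS failures
FROM cloudtrail_logs
WHERE eventsource = 'sts.amazonaws.com'
  AND errorcode = 'InvalidClientTokenId'
  AND from_iso8601_timestamp(eventtime) > now() - interval '1' hour
GROUP BY useridentity.accesskeyid, sourceipaddress
HAVING COUNT(*) > 10
```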

4. Console logins that bypass SSO

If your organization has federated SSO, console logins via userIdentity.type = 'IAMUser' (rather than AssumedRole via your identity provider) are suspicious by default. The legitimate exceptions are few and nameable. Everything else is worth a ping.

Common miss

Root account activity is often alerted on, but IAM-user console logins after SSO adoption usually aren’t. Attackers know this.
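The detection itself is little more than a WHERE clause. A sketch under the same table assumption; the login outcome lives in responseElements as a JSON string:

```sql
-- IAM-user console logins: should be near-empty in a federated org
SELECT eventtime,
       useridentity.username,
       sourceipaddress,
       json_extract_scalar(responseelements, '$.ConsoleLogin') AS outcome
FROM cloudtrail_logs
WHERE eventname = 'ConsoleLogin'
  AND useridentity.type = 'IAMUser'
  AND from_iso8601_timestamp(eventtime) > now() - interval '7' day
ORDER BY eventtime DESC
```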

5. GetCallerIdentity from an unusual principal

sts:GetCallerIdentity is the “where am I?” call that nearly every offensive tool makes first after acquiring credentials. It’s noisy on its own, but weighted against the principal’s baseline, a spike is a reliable tell — especially when combined with any of the four queries above.
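One way to express “weighted against the principal’s baseline” in SQL is a per-principal daily average. A sketch with an illustrative 5x multiplier, same assumed table:

```sql
-- Principals whose GetCallerIdentity volume today is 5x their recent daily average
WITH daily AS (
  SELECT useridentity.arn AS principal,
         date(from_iso8601_timestamp(eventtime)) AS day,
         COUNT(*) AS calls
  FROM cloudtrail_logs
  WHERE eventname = 'GetCallerIdentity'
    AND from_iso8601_timestamp(eventtime) > now() - interval '30' day
  GROUP BY useridentity.arn, date(from_iso8601_timestamp(eventtime))
)
SELECT principal, day, calls, avg_calls
FROM (
  SELECT principal, day, calls,
         avg(calls) OVER (PARTITION BY principal
                          ORDER BY day
                          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS avg_calls
  FROM daily
) t
WHERE day = current_date
  AND calls > 5 * avg_calls
```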

Why this list doesn’t include DeleteTrail or PutBucketPolicy

Most “top CloudTrail alerts” listicles lead with destructive API calls: turning off the trail, opening an S3 bucket, attaching an admin policy. Those are real and should be alerted on. They are also the controls a mature attacker will not trip, because they know those are the alerts that exist.

The responder-grade queries above catch the phase of the attack that happens before the destructive call — the reconnaissance, lateral movement, and credential triage. If you only have the destructive alerts, you are catching the footprint of an attacker who is already ready to leave.

What to change this week

  • Stand up a single SQL-queryable view of CloudTrail. Athena is sufficient. A SIEM is better. A spreadsheet of daily exports is not.
  • Write and schedule the five queries above. Start with high thresholds and tune down.
  • Baseline source region per principal for 30 days. Alert on deviation.
  • Walk through each alert with your on-call engineer once a week for a month. The goal is not to resolve them — it is to learn what normal looks like.
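For the first item, a flattening view keeps the five queries readable and uniform. A minimal sketch, assuming the standard CloudTrail-on-Athena table name cloudtrail_logs:

```sql
-- One flat view so every detection query reads the same way
CREATE OR REPLACE VIEW cloudtrail_flat AS
SELECT eventtime,
       eventname,
       eventsource,
       awsregion,
       sourceipaddress,
       errorcode,
       useridentity.type AS principal_type,
       useridentity.arn  AS principal_arn,
       requestparameters,
       responseelements
FROM cloudtrail_logs
```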

CloudTrail retention is a compliance control. Useful CloudTrail queries are the detection program. Most teams have the first and none of the second, which is why the quiet breach stays quiet.


Last updated February 4, 2026