Athena Query Examples
Ad-hoc and dashboard queries run against the curated zone through the Glue table
analytics.events and the Athena workgroup percus-analytics (which
enforces a 10 GB per-query scan cap — queries over the cap are rejected, not
silently truncated).
The analytics.events table
The compactor writes one snappy Parquet file per organization-hour to the curated bucket, laid out as Hive-style partitions:
s3://percus-analytics-curated/org={id}/year={YYYY}/month={MM}/day={DD}/hour={HH}/part-0000.parquet
Partition columns: org, year, month, day, hour. Partitions are
resolved with Athena partition projection — there is no crawler and no
MSCK REPAIR TABLE to run.
orgorg uses the injected projection type, so Athena requires an equality
filter (WHERE org = '...') on every query. A query without it fails to plan.
This is intentional: it guarantees each query is scoped to a single
organization and never scans the whole catalog.
Always add year/month/day predicates too — they prune partitions so a
typical dashboard query (one project, 7 days) scans well under 100 MB. The
org value is the organization UUID and matches the organization_id column.
The event payload (which varies per event_type) is stored as a JSON string in
payload_json; extract fields with json_extract_scalar.
Completion rate by project and date range
Share of sessions that reached video.completed out of sessions that started
playback, for one project over a 7-day window.
SELECT
project_id,
COUNT(DISTINCT CASE WHEN event_type = 'video.completed' THEN session_id END) * 1.0
/ NULLIF(COUNT(DISTINCT CASE WHEN event_type = 'video.played' THEN session_id END), 0)
AS completion_rate
FROM analytics.events
WHERE org = '11111111-1111-1111-1111-111111111111'
AND project_id = '22222222-2222-2222-2222-222222222222'
AND year = 2026 AND month = 5 AND day BETWEEN 20 AND 26
GROUP BY project_id;
For a range that spans month or year boundaries, compose the partition
predicates (e.g. (year = 2026 AND month = 5 AND day >= 28) OR (year = 2026 AND month = 6 AND day <= 4))
or filter on occurred_at for an exact timestamp window in addition to the
partition predicates that do the pruning.
Drop-off histogram (25 / 50 / 75 / 100 %)
How many distinct sessions reached each quartile. The 25/50/75 marks come from
video.progressed (with payload.percent); 100 % is video.completed.
SELECT quartile, COUNT(DISTINCT session_id) AS sessions
FROM (
SELECT
session_id,
CAST(json_extract_scalar(payload_json, '$.percent') AS INTEGER) AS quartile
FROM analytics.events
WHERE org = '11111111-1111-1111-1111-111111111111'
AND project_id = '22222222-2222-2222-2222-222222222222'
AND year = 2026 AND month = 5 AND day BETWEEN 20 AND 26
AND event_type = 'video.progressed'
UNION ALL
SELECT session_id, 100 AS quartile
FROM analytics.events
WHERE org = '11111111-1111-1111-1111-111111111111'
AND project_id = '22222222-2222-2222-2222-222222222222'
AND year = 2026 AND month = 5 AND day BETWEEN 20 AND 26
AND event_type = 'video.completed'
)
GROUP BY quartile
ORDER BY quartile;
CTA funnel
CTA clicks per call-to-action, alongside the playback base, to read conversion.
SELECT
json_extract_scalar(payload_json, '$.cta_id') AS cta_id,
json_extract_scalar(payload_json, '$.cta_name') AS cta_name,
COUNT(*) AS clicks,
COUNT(DISTINCT session_id) AS sessions_with_click
FROM analytics.events
WHERE org = '11111111-1111-1111-1111-111111111111'
AND project_id = '22222222-2222-2222-2222-222222222222'
AND year = 2026 AND month = 5 AND day BETWEEN 20 AND 26
AND event_type = 'cta.clicked'
GROUP BY 1, 2
ORDER BY clicks DESC;
To express it as a funnel, compare against the playback base for the same window:
SELECT
COUNT(DISTINCT CASE WHEN event_type = 'video.played' THEN session_id END) AS sessions_played,
COUNT(DISTINCT CASE WHEN event_type = 'cta.clicked' THEN session_id END) AS sessions_clicked
FROM analytics.events
WHERE org = '11111111-1111-1111-1111-111111111111'
AND project_id = '22222222-2222-2222-2222-222222222222'
AND year = 2026 AND month = 5 AND day BETWEEN 20 AND 26;
Top devices and locales
Most common device/OS/browser and locale/country combinations by session count.
SELECT
device_type, device_os, device_browser, locale, geo_country,
COUNT(DISTINCT session_id) AS sessions
FROM analytics.events
WHERE org = '11111111-1111-1111-1111-111111111111'
AND year = 2026 AND month = 5 AND day BETWEEN 20 AND 26
GROUP BY 1, 2, 3, 4, 5
ORDER BY sessions DESC
LIMIT 50;
Cost & performance notes
- Run everything in the
percus-analyticsworkgroup so the 10 GB per-query scan cap and the shared results location apply. A scan-budget alarm fires when a query exceeds ~8 GB (80 % of the cap) so heavy queries get reviewed before they hit the hard limit. - Select only the columns you need — Parquet is columnar, so narrow projections scan dramatically fewer bytes.
- Keep the
org+year/month/daypredicates on every query; they are what keep a typical dashboard query under 100 MB and under ~5 seconds.