Skip to main content

Athena Query Examples

Ad-hoc and dashboard queries run against the curated zone through the Glue table analytics.events and the Athena workgroup percus-analytics (which enforces a 10 GB per-query scan cap — queries over the cap are rejected, not silently truncated).

The analytics.events table

The compactor writes one snappy Parquet file per organization-hour to the curated bucket, laid out as Hive-style partitions:

s3://percus-analytics-curated/org={id}/year={YYYY}/month={MM}/day={DD}/hour={HH}/part-0000.parquet

Partition columns: org, year, month, day, hour. Partitions are resolved with Athena partition projection — there is no crawler and no MSCK REPAIR TABLE to run.

Every query must filter on org

org uses the injected projection type, so Athena requires an equality filter (WHERE org = '...') on every query. A query without it fails to plan. This is intentional: it guarantees each query is scoped to a single organization and never scans the whole catalog.

Always add year/month/day predicates too — they prune partitions so a typical dashboard query (one project, 7 days) scans well under 100 MB. The org value is the organization UUID and matches the organization_id column.

The event payload (which varies per event_type) is stored as a JSON string in payload_json; extract fields with json_extract_scalar.

Completion rate by project and date range

Share of sessions that reached video.completed out of sessions that started playback, for one project over a 7-day window.

SELECT
project_id,
COUNT(DISTINCT CASE WHEN event_type = 'video.completed' THEN session_id END) * 1.0
/ NULLIF(COUNT(DISTINCT CASE WHEN event_type = 'video.played' THEN session_id END), 0)
AS completion_rate
FROM analytics.events
WHERE org = '11111111-1111-1111-1111-111111111111'
AND project_id = '22222222-2222-2222-2222-222222222222'
AND year = 2026 AND month = 5 AND day BETWEEN 20 AND 26
GROUP BY project_id;

For a range that spans month or year boundaries, compose the partition predicates (e.g. (year = 2026 AND month = 5 AND day >= 28) OR (year = 2026 AND month = 6 AND day <= 4)) or filter on occurred_at for an exact timestamp window in addition to the partition predicates that do the pruning.

Drop-off histogram (25 / 50 / 75 / 100 %)

How many distinct sessions reached each quartile. The 25/50/75 marks come from video.progressed (with payload.percent); 100 % is video.completed.

SELECT quartile, COUNT(DISTINCT session_id) AS sessions
FROM (
SELECT
session_id,
CAST(json_extract_scalar(payload_json, '$.percent') AS INTEGER) AS quartile
FROM analytics.events
WHERE org = '11111111-1111-1111-1111-111111111111'
AND project_id = '22222222-2222-2222-2222-222222222222'
AND year = 2026 AND month = 5 AND day BETWEEN 20 AND 26
AND event_type = 'video.progressed'

UNION ALL

SELECT session_id, 100 AS quartile
FROM analytics.events
WHERE org = '11111111-1111-1111-1111-111111111111'
AND project_id = '22222222-2222-2222-2222-222222222222'
AND year = 2026 AND month = 5 AND day BETWEEN 20 AND 26
AND event_type = 'video.completed'
)
GROUP BY quartile
ORDER BY quartile;

CTA funnel

CTA clicks per call-to-action, alongside the playback base, to read conversion.

SELECT
json_extract_scalar(payload_json, '$.cta_id') AS cta_id,
json_extract_scalar(payload_json, '$.cta_name') AS cta_name,
COUNT(*) AS clicks,
COUNT(DISTINCT session_id) AS sessions_with_click
FROM analytics.events
WHERE org = '11111111-1111-1111-1111-111111111111'
AND project_id = '22222222-2222-2222-2222-222222222222'
AND year = 2026 AND month = 5 AND day BETWEEN 20 AND 26
AND event_type = 'cta.clicked'
GROUP BY 1, 2
ORDER BY clicks DESC;

To express it as a funnel, compare against the playback base for the same window:

SELECT
COUNT(DISTINCT CASE WHEN event_type = 'video.played' THEN session_id END) AS sessions_played,
COUNT(DISTINCT CASE WHEN event_type = 'cta.clicked' THEN session_id END) AS sessions_clicked
FROM analytics.events
WHERE org = '11111111-1111-1111-1111-111111111111'
AND project_id = '22222222-2222-2222-2222-222222222222'
AND year = 2026 AND month = 5 AND day BETWEEN 20 AND 26;

Top devices and locales

Most common device/OS/browser and locale/country combinations by session count.

SELECT
device_type, device_os, device_browser, locale, geo_country,
COUNT(DISTINCT session_id) AS sessions
FROM analytics.events
WHERE org = '11111111-1111-1111-1111-111111111111'
AND year = 2026 AND month = 5 AND day BETWEEN 20 AND 26
GROUP BY 1, 2, 3, 4, 5
ORDER BY sessions DESC
LIMIT 50;

Cost & performance notes

  • Run everything in the percus-analytics workgroup so the 10 GB per-query scan cap and the shared results location apply. A scan-budget alarm fires when a query exceeds ~8 GB (80 % of the cap) so heavy queries get reviewed before they hit the hard limit.
  • Select only the columns you need — Parquet is columnar, so narrow projections scan dramatically fewer bytes.
  • Keep the org + year/month/day predicates on every query; they are what keep a typical dashboard query under 100 MB and under ~5 seconds.