
Analytics Platform Design

This document is the consolidated design for the Percus analytics platform. It covers what we capture, where it lives, how it is queried and reported, the privacy posture, the cost envelope at startup scale, and the compatibility strategy for clients migrating from Individeo.

1. Executive summary

Percus needs an analytics platform that:

  • Captures viewer engagement on personalized videos (plays, completion, CTAs, errors) at session granularity, without ever storing recipient PII on Percus infrastructure.
  • Serves near-real-time dashboards (1–5 min latency) for the Percus team and per-organization dashboards for clients.
  • Exports reports (CSV, PDF, webhook) so clients can join engagement data with their own business outcomes.
  • Stays cheap: serverless, pay-per-use, no idle cost. Target under USD 200/month at MVP scale.
  • Complies with GDPR, Ley 19.628 (Chile), and LGPD (Brazil).
  • Preserves compatibility for clients migrating from Individeo, so the migration is a single <script src> change.

The recommended architecture is a Lambda-based ingestion pipeline writing Parquet to S3, queried with Athena, with rollups for dashboards in the existing Aurora Serverless v2 cluster. The SDK uses Path B transport (host-page POSTs) with a lazily-loaded tracking module, mirroring the proven Individeo pattern.

Locked design decisions

| # | Decision | Resolution |
|---|----------|------------|
| 1 | Viewer identity | SHA-256(org_id + ":" + recipient_code + ":" + org_salt) computed client-side; per-iframe session_id (UUID v4); optional playback_id for replays |
| 2 | Hot tier for dashboards | Athena-only for v1; add Aurora rollups in Phase 1 if dashboards feel slow |
| 3 | S3 table format | Plain Parquet (no Iceberg) in v1 |
| 4 | SDK transport | Path B primary (host-page POSTs); two-stage lazy SDK; fetch with keepalive: true for unload events |
| 5 | Latency target | Near-real-time (1–5 min) for v1; sub-second deferred |

2. Goals and non-goals

In scope for v1

  • Capture and store player-emitted events from percus-player and percus-embed-sdk.
  • Organization-scoped data access by organization_id (single shared database — no per-organization infrastructure).
  • Internal dashboards (Percus team) and organization-scoped client dashboards in the backoffice.
  • CSV export and weekly PDF digest via SES.
  • Consent gating (GDPR / Ley 19.628 / LGPD).
  • Compatibility shim for clients embedded with Individeo's smartEmbed.js.

Out of scope for v1

  • A/B testing framework (separate concern).
  • Sub-second streaming analytics.
  • Anomaly detection / alerting on engagement metrics.
  • Business-outcome correlation (Phase 3: requires client-side webhook integration).
  • Predictive models on engagement.

3. Domain model

A new bounded context Analytics is added to the platform. Per the project's DDD-first conventions, the domain layer is framework-free.

Aggregates

  • Session (root) — an anonymous viewer session. Holds metadata (organization, project, template, distribution channel, public share, locale, device, geo) and the sequence of events emitted within it. Created on iframe initialization, closed on video.ended or session.timeout (15 min idle).
  • Rollup — pre-aggregated metric for a (organization_id, project_id, template_id, date, dimension) tuple. Independent aggregate; always rebuildable from raw events.
  • ConsentDecision — value object on Session. Gates which events get persisted.
  • ExportJob — for asynchronous CSV/PDF exports and DSAR deletion runs.

Inbound integration events (consumed from other contexts to enrich analytics)

  • From Campaign Service: ProjectCreated, ProjectArchived, TemplateActivated, DistributionChannelPublished, PublicVideoShareCreated. Stored as slim dim_* tables in the warehouse for joins.
  • From Identity Service: OrganizationCreated, OrganizationSettingsChanged (e.g. retention overrides).

Inbound integration events (from SDK / Player)

The canonical wire events (see Section 5 for the full schema):

session.started, video.played, video.paused, video.progressed,
video.completed, video.incomplete, video.errored,
cta.clicked, interaction.engaged, chapter.entered, chapter.exited,
consent.accepted, consent.declined, autoplay.failed

4. Identity and session model

PII is never stored. Identification uses a three-level hierarchy:

| ID | Scope | Purpose |
|----|-------|---------|
| viewer_hash | Stable across all sessions for a given recipient | "Who": same recipient = same hash forever (within a salt epoch) |
| session_id | One iframe load | "One playback attempt": the unit counted as a view |
| playback_id | One play within a session | Distinguishes replays inside the same session (optional, only set if the Player supports in-place replay) |

viewer_hash construction

Computed client-side in the SDK, before any event is emitted:

viewer_hash = SHA-256( org_id || ":" || recipient_code || ":" || org_salt )
  • recipient_code is the recipient identifier the host already passes to the Player for video personalization. It is never sent to Percus servers — only its hash.
  • org_salt is a per-organization secret shipped to the Player with the manifest. Rotated annually (hard rotation: long-term cohort analysis resets at each rotation, simpler and stronger privacy than versioned salts).
  • Same recipient embedded by two different organizations hashes to different values (org is in the input + salt is per-org). No cross-organization correlation.
  • Under GDPR / Ley 19.628 / LGPD, recipient_code is pseudonymous PII because the client's CRM can reverse it. The hash + per-org salt means a leak of the Percus analytics store is useless without the client's mapping table.
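
As an illustration, the construction above can be sketched as follows. Python is used here only for brevity; the SDK would compute the same digest in the browser before emitting any event. The function name, the UTF-8 encoding of the input string, and the sample identifiers are assumptions, not part of the design.

```python
import hashlib


def viewer_hash(org_id: str, recipient_code: str, org_salt: str) -> str:
    """Return SHA-256(org_id ":" recipient_code ":" org_salt) as a hex string."""
    # Assumption: the three fields are joined with ":" and encoded as UTF-8.
    material = f"{org_id}:{recipient_code}:{org_salt}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


# Hypothetical identifiers: the same recipient under two organizations.
h1 = viewer_hash("org-a", "RCP-001", "salt-a")
h2 = viewer_hash("org-b", "RCP-001", "salt-b")
print(len(h1), h1 != h2)
```

Because the fields are joined with a plain ":", two different (org_id, recipient_code) pairs could yield the same input string if either identifier is allowed to contain ":". That only matters if the identifier formats permit that character.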

session_id

  • Generated by the Player once per iframe initialization (UUID v4).
  • All events from that iframe load carry it.
  • A new email open, new landing page visit, or full page refresh produces a new session_id.
  • One session_id equals one "view" in dashboards. Two sessions from the same viewer_hash indicate a repeat viewer (useful for retention metrics).

playback_id

  • Integer counter, incremented per replay within a single session.
  • Only emitted if the Player allows replay without re-initializing.
  • Skipped in v1 if replays require a full reload.

5. Event schema v1

Canonical envelope

Every event from the SDK shares this envelope:

event_id                 UUID v4, client-generated (idempotency key)
schema_version           "1"
event_type               enum (see below)
occurred_at              ISO-8601 with ms, client clock
received_at              server-side stamp
organization_id          client organization (bound to the share token, never trusted from body)
project_id
template_id
template_version
distribution_channel_id
public_share_id          nullable
viewer_hash              SHA-256 hex string, client-computed
session_id               UUID v4, per iframe load
playback_id              integer, optional
locale                   e.g. "es-CL"
device                   { type, os, browser }; parsed from UA server-side, raw UA discarded
geo                      { country }; from CloudFront-Viewer-Country header, no IP stored
referrer_host            host only (no path, no query)
consent                  { tracking: boolean, granted_at }
payload                  event-specific (see below)

Stored as Parquet with the metadata fields as top-level columns for partition pruning and columnar economy.

Wire event types

| Event type | Payload fields | Notes |
|------------|----------------|-------|
| session.started | (none) | First event of every session |
| video.played | position_ms | Play initiated or resumed |
| video.paused | position_ms | |
| video.progressed | position_ms, percent | Sampled at 25/50/75% milestones, not every tick |
| video.completed | duration_ms, percent | Natural end, percent ≥ 95 |
| video.incomplete | duration_ms, percent | Left before natural end |
| video.errored | error_code, error_message | Always allowed even without consent (operational telemetry, no viewer identifiers attached) |
| cta.clicked | cta_id, cta_name, cta_value, position_ms | Name fields truncated to 64 chars |
| interaction.engaged | interaction_id, interaction_name, interaction_value, position_ms | Forms, hotspots, surveys |
| chapter.entered | chapter_id, chapter_name, position_ms | |
| chapter.exited | chapter_id, position_ms | |
| consent.accepted | (none) | Audit event |
| consent.declined | (none) | Audit event; emitted even with tracking off |
| autoplay.failed | reason | Browser blocked autoplay |
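
The milestone sampling for video.progressed (25/50/75%, not every tick) can be sketched as below. This is a minimal illustration in Python, not the Player's implementation: the class name, the on_tick method, and the choice to report the milestone value itself in percent are assumptions.

```python
MILESTONES = (25, 50, 75)


class ProgressSampler:
    """Emits at most one video.progressed event per milestone per playback."""

    def __init__(self, duration_ms: int) -> None:
        if duration_ms <= 0:
            raise ValueError("duration_ms must be positive")
        self.duration_ms = duration_ms
        self._emitted = set()  # milestones already reported

    def on_tick(self, position_ms: int) -> list:
        """Return the events to emit for this playback tick (often none)."""
        percent = 100 * position_ms / self.duration_ms
        events = []
        for milestone in MILESTONES:
            if percent >= milestone and milestone not in self._emitted:
                self._emitted.add(milestone)
                events.append({
                    "event_type": "video.progressed",
                    # Assumption: percent carries the milestone, not the raw value.
                    "payload": {"position_ms": position_ms, "percent": milestone},
                })
        return events


sampler = ProgressSampler(duration_ms=100_000)
first = sampler.on_tick(10_000)    # below 25%: nothing emitted
second = sampler.on_tick(26_000)   # crosses 25%
third = sampler.on_tick(30_000)    # still between milestones: nothing emitted
fourth = sampler.on_tick(80_000)   # a seek past 50% and 75% reports both
```

In this sketch a forward seek that skips a milestone still reports it; whether skipped milestones should count is a product decision the design does not settle.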

Name-field truncation

All free-text name fields (cta_name, interaction_name, chapter_name, error_message) are truncated to 64 characters in the SDK before transmission. This matches Individeo's behavior and bounds Parquet column width.
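
A minimal sketch of that truncation step, written in Python for illustration (the SDK would do the equivalent in the browser). The helper name is hypothetical, and the unit of "character" is an assumption: Python slices by code point, whereas a JavaScript string slice counts UTF-16 code units.

```python
NAME_FIELDS = ("cta_name", "interaction_name", "chapter_name", "error_message")
MAX_NAME_LENGTH = 64


def truncate_names(payload: dict) -> dict:
    """Return a copy of payload with free-text name fields cut to 64 characters."""
    out = dict(payload)
    for field in NAME_FIELDS:
        value = out.get(field)
        if isinstance(value, str):
            out[field] = value[:MAX_NAME_LENGTH]
    return out


# Hypothetical payload with an over-long CTA name.
trimmed = truncate_names({"cta_id": "c1", "cta_name": "x" * 100, "position_ms": 1200})
print(len(trimmed["cta_name"]))  # 64
```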

Schema versioning

  • schema_version is required on every event.
  • v2 only ships when a non-additive change is needed (renaming, removal). Additive fields do not bump the version.
  • The ingest Lambda routes by schema_version to the appropriate parser; old data remains queryable.
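
The routing rule in the last bullet could look roughly like this. It is a sketch under assumptions: the parser registry, the function names, and the particular set of required fields checked here are illustrative, not the actual ingest code.

```python
from typing import Callable, Dict

REQUIRED_V1 = ("event_id", "event_type", "occurred_at", "session_id")


def parse_v1(raw: dict) -> dict:
    """Validate a schema_version "1" envelope; unknown additive fields pass through."""
    missing = [field for field in REQUIRED_V1 if field not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return dict(raw)


# One parser per schema_version; a future v2 parser would be registered here.
PARSERS: Dict[str, Callable[[dict], dict]] = {"1": parse_v1}


def parse_event(raw: dict) -> dict:
    """Route an incoming event to the parser for its schema_version."""
    version = raw.get("schema_version")
    parser = PARSERS.get(version)
    if parser is None:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return parser(raw)


event = parse_event({
    "schema_version": "1",
    "event_id": "e-1",
    "event_type": "video.played",
    "occurred_at": "2025-01-01T00:00:00.000Z",
    "session_id": "s-1",
    "new_additive_field": True,  # additive fields do not bump the version
})
```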

6. AWS architecture

[Player iframe]                         [Host page]
  └─ postMessage events ──────────────► [percus-embed-sdk.js]
                                          │  (lazy-load on consent)
                                          ▼
                                        [percus-tracking.js]
                                          │  fetch(POST, keepalive:true)
                                          ▼
                                        [CloudFront]
                                          │
                                          ▼
                                        [API Gateway HTTP API]
                                          │
                                          ▼
                                        [Lambda: ingest]
                                          │  - validate envelope
                                          │  - reject if consent.tracking=false
                                          │    (except consent.* and video.errored)
                                          │  - parse UA → device, strip IP
                                          │  - geo from CloudFront-Viewer-Country
                                          │  - bind organization_id from share token
                                          ▼
                                        [SQS Standard]
                                          │
                                          ▼
                                        [Lambda: writer]
                                          │  batch up to 1000 messages
                                          ▼
                                        [S3 raw] s3://percus-analytics/raw/
                                          │      org=<id>/year=/month=/day=/hour=
                                          │      NDJSON.gz
                                          │  EventBridge hourly
                                          ▼
                                        [Lambda: compactor]
                                          │  NDJSON → Parquet
                                          ▼
                                        [S3 curated] s3://percus-analytics/curated/
                                          │      org=<id>/year=/month=/day=
                                          │      Parquet, snappy-compressed
                                          │  EventBridge nightly
                                          ▼
                                        [Lambda + Athena CTAS]
                                          │
                                          ▼
                                        [S3 rollups] s3://percus-analytics/rollups/
                                        + [Aurora analytics schema] (Phase 1+)
                                          │
          ┌───────────────────────────────┼──────────────────────────────┐
          ▼                               ▼                              ▼
[Backoffice dashboards]          [Scheduled exports]           [Webhooks → client BI]
(Athena + Aurora rollups)        (SES PDF/CSV)                 (Phase 3)

Component rationale

API Gateway HTTP API (not REST API). $1.00 per million requests versus $3.50 per million for REST. We do not need the REST-only features (request validators, SDK generation). AWS WAF cannot be attached directly to an HTTP API, but it can be attached to the CloudFront distribution that fronts it. Single biggest cost saving.

Lambda → SQS → Lambda (not direct Lambda → S3). Two reasons. First, campaign launches can spike traffic 100× the normal baseline; SQS smooths the burst, gives us free retries, and decouples the writer's throughput from the ingest path's tail latency. Second, batching at the writer lets us turn 1000 events into a single S3 PUT — orders of magnitude cheaper than per-event writes.
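
To make the batching argument concrete, here is a small sketch of how the writer could fold up to 1000 events into one NDJSON.gz object. The function names are hypothetical, and the sketch leaves out the SQS and S3 calls themselves; it only shows the chunking, the serialization, and the resulting PUT count.

```python
import gzip
import json
import math
from typing import Iterator

BATCH_SIZE = 1000


def batches(events: list, size: int = BATCH_SIZE) -> Iterator[list]:
    """Yield consecutive slices of at most `size` events."""
    for start in range(0, len(events), size):
        yield events[start:start + size]


def to_ndjson_gz(batch: list) -> bytes:
    """Serialize one batch as gzip-compressed, newline-delimited JSON."""
    lines = "\n".join(json.dumps(event, separators=(",", ":")) for event in batch)
    return gzip.compress((lines + "\n").encode("utf-8"))


def put_count(n_events: int, size: int = BATCH_SIZE) -> int:
    """Number of S3 PUTs needed when every batch becomes one object."""
    return math.ceil(n_events / size)


events = [{"event_id": str(i)} for i in range(2500)]
chunks = list(batches(events))
body = to_ndjson_gz(chunks[0])
print([len(c) for c in chunks], put_count(100_000))  # [1000, 1000, 500] 100
```

At the stated MVP peak of 100K events per hour, full batches would mean about 100 PUTs per hour instead of 100,000.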

No Kinesis at MVP. Kinesis Data Streams has a ~$11/month per-shard idle floor; Firehose adds delivery cost per GB. At MVP volumes (≤ 100K events/hour peak) both are more expensive than Lambda + SQS. Migration trigger: when the writer Lambda crosses ~10M events/day or batching latency exceeds 5 minutes, swap in Kinesis Firehose with Parquet conversion. The S3 layout stays the same.

Hourly compactor. Each hour, the many small NDJSON.gz files in the raw zone are compacted into partitioned Parquet in the curated zone. Athena's per-query cost is proportional to bytes scanned, and Parquet plus partitioning brings typical dashboard queries down to scanning megabytes instead of gigabytes.
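
As a sketch of the raw-zone layout shown in the architecture diagram (org=<id>/year=/month=/day=/hour=), the writer could derive an object prefix like this. Several details are assumptions: partitioning by the server-side received_at stamp in UTC, zero-padded partition values, and the function name.

```python
from datetime import datetime, timezone


def raw_prefix(organization_id: str, received_at: datetime) -> str:
    """Build the raw-zone key prefix for one organization and hour."""
    if received_at.tzinfo is None:
        raise ValueError("received_at must be timezone-aware")
    ts = received_at.astimezone(timezone.utc)
    return (
        f"raw/org={organization_id}"
        f"/year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/"
    )


prefix = raw_prefix("org-a", datetime(2025, 3, 7, 9, 30, tzinfo=timezone.utc))
print(prefix)  # raw/org=org-a/year=2025/month=03/day=07/hour=09/
```

Keeping the organization as the leading partition key is what lets an Athena query with an organization filter skip every other organization's objects.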

Athena, not Redshift / QuickSight / OpenSearch. Athena has zero idle cost and scales to whatever scan volume we need at $5/TB. Redshift Serverless has a higher minimum spend and is harder to justify at startup scale. QuickSight authors are $24/user/month — not startup-friendly. The Next.js backoffice can render charts directly from Athena/Aurora query results.

Aurora analytics schema (Phase 1+). Reuses the existing Aurora Serverless v2 cluster — no new infrastructure. Only stores pre-aggregated rollups (small) and the most recent 30 days of denormalized session metadata for fast dashboard queries. Raw events stay on S3.

Storage tiers and retention

| Zone | Format | Purpose | Retention |
|------|--------|---------|-----------|
| s3://percus-analytics/raw/ | NDJSON.gz | Recovery, replay | 30 days, then S3 lifecycle to Glacier Instant Retrieval (1 year), then delete |
| s3://percus-analytics/curated/ | Parquet snappy, partitioned | Athena ad-hoc, rebuilds | 2 years standard, then Glacier IR (5 years) |
| s3://percus-analytics/rollups/ | Parquet daily and weekly | Pre-aggregated metrics | 5 years |
| Aurora analytics.* | Tables | Hot tier for dashboards (Phase 1+) | 30 days raw session metadata; rollups indefinite |

Retention is configurable per organization for contractual variation. DSAR deletion: lookup by viewer_hash, rewrite affected Parquet partitions with the matching events removed. This is acceptable at v1 volumes (a partition rewrite is minutes of Lambda work).

Organization-scoped data access

Percus uses a single shared database with organization_id as a column on every row — there is no per-organization infrastructure or schema. The same convention is followed in the Identity and Campaign services. Scoping is enforced at the query layer.

  • organization_id is the first S3 partition key, so every Athena query prunes to a single organization's prefix. This is primarily a query-efficiency optimization (and a cost optimization, since Athena bills per byte scanned).
  • Aurora analytics tables enforce organization_id in every query at the repository layer, same as the existing services.
  • Ingest auth: the SDK loads under a signed share token (per public_share_id or per published distribution channel). The token carries organization_id and project_id; the ingest Lambda trusts only the values bound to the token, never the request body. This prevents an attacker from writing events into another organization's data, even though the database is shared.
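
The ingest rules (consent gate plus token-bound scoping) can be sketched as one admission function. This is illustrative only: it assumes the share token has already been verified and decoded into token_claims, and the exact set of fields treated as viewer identifiers is an assumption.

```python
from typing import Optional

# Event types persisted even when consent.tracking is false.
ALWAYS_ALLOWED = {"consent.accepted", "consent.declined", "video.errored"}
# Assumption: these are the fields that count as viewer identifiers.
VIEWER_IDENTIFIERS = ("viewer_hash", "session_id", "playback_id")


def admit(token_claims: dict, body: dict) -> Optional[dict]:
    """Return the event to enqueue, or None if the consent gate rejects it."""
    event_type = body.get("event_type")
    consent = body.get("consent")
    tracking = isinstance(consent, dict) and consent.get("tracking") is True
    if not tracking and event_type not in ALWAYS_ALLOWED:
        return None
    event = dict(body)
    # Scope always comes from the verified token, never from the request body.
    event["organization_id"] = token_claims["organization_id"]
    event["project_id"] = token_claims["project_id"]
    if event_type == "video.errored" and not tracking:
        for field in VIEWER_IDENTIFIERS:
            event.pop(field, None)
    return event


claims = {"organization_id": "org-a", "project_id": "proj-1"}
spoofed = admit(claims, {
    "event_type": "video.played",
    "organization_id": "org-b",  # attacker-supplied value, overwritten below
    "consent": {"tracking": True},
})
print(spoofed["organization_id"])  # org-a
```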

7. SDK transport design

Path B: host-page POSTs

The Percus Player runs in an iframe on player.percus.cl; the host page lives on the client's domain (e.g. bank-client.com). Player events travel:

Player iframe ──postMessage──► percus-embed-sdk.js on host
└─ percus-tracking.js on host
└─ fetch POST → api.percus.cl/analytics/v1/events

The HTTP request originates from the host page's first-party context (Origin: bank-client.com), not from the iframe. This matches Individeo's pattern and is materially less likely to be blocked by ad-blockers, browser anti-tracking measures, and CSP policies that customers have already provisioned.

Two-stage lazy SDK

Two scripts:

  • percus-embed-sdk.js — the bootstrap. Small (under 30 KB minified, target). Always loaded. Wires postMessage, handles iframe lifecycle, exposes the public API. Does not contain any tracking code.
  • percus-tracking.js — the tracking module. Loaded lazily by the bootstrap only when:
    1. Tracking is enabled for this embed (data-percus-enable-tracking="true" or its Individeo alias), AND
    2. Consent has been granted (data-percus-consent-accepted="true" was set up front, OR the host called Percus.acceptConsent() post-load).

If consent is declined, percus-tracking.js never downloads. There is literally no analytics code on the page. This is a stronger privacy posture than gating event emission post-load and mirrors Individeo's loader logic exactly.

Reliability

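  • Unload events: fetch with keepalive: true (the modern alternative to sendBeacon). The browser completes the request after the page unloads, so the SDK cannot retry it. Keepalive request bodies are capped at 64 KiB, so unload payloads must stay small.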
  • Normal events: queued in-memory, single-flight (_trackingInProgress flag), retried with 200ms × (n + 1) backoff (matches Individeo's well-tested defaults). Permanent failure counter is shared across the queue; after 5 consecutive failures the queue stops to avoid burning the user's network.
  • A tab close mid-retry drops queued events. Acceptable trade-off — the alternative (IndexedDB queue) adds complexity for diminishing returns at our event volumes.
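
The retry policy for normal events can be sketched as follows, in Python for illustration rather than as SDK code. It assumes that n in 200ms × (n + 1) is the zero-based count of consecutive failures and that a success resets the counter; the send and sleep callables are injected so the sketch runs without a network or real delays.

```python
BASE_DELAY_MS = 200
MAX_CONSECUTIVE_FAILURES = 5


def retry_delay_ms(n: int) -> int:
    """Linear backoff: 200 ms x (n + 1) for the n-th consecutive failure."""
    return BASE_DELAY_MS * (n + 1)


def drain(queue: list, send, sleep=lambda ms: None) -> int:
    """Send queued events one at a time, in order; return how many were delivered.

    The failure counter is shared across the whole queue. After five
    consecutive failures the queue stops and the remaining events stay queued.
    """
    delivered = 0
    failures = 0
    while queue and failures < MAX_CONSECUTIVE_FAILURES:
        if send(queue[0]):
            queue.pop(0)
            delivered += 1
            failures = 0
        else:
            sleep(retry_delay_ms(failures))
            failures += 1
    return delivered


# Simulated outage: every send fails, so the queue gives up after five tries.
waits = []
pending = ["e1", "e2"]
sent = drain(pending, send=lambda event: False, sleep=waits.append)
print(sent, waits)  # 0 [200, 400, 600, 800, 1000]
```

Under these assumptions the worst case before the queue stops is 3 seconds of cumulative waiting (200 + 400 + 600 + 800 + 1000 ms).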

First-party proxy support

Privacy-strict clients can route events through their own subdomain. The SDK respects, in priority order:

  1. Per-call: payload.trackingURL (rare).
  2. Global: window.percusTrackingURL or window.PERCUS_TRACK_BASE_URL.
  3. Attribute: data-percus-tracking-url on the embed element.
  4. Default: https://api.percus.cl/analytics/v1.

A bank that prefers fully first-party can CNAME analytics.bank-client.com → api.percus.cl and set window.percusTrackingURL = "https://analytics.bank-client.com/v1". Requests then appear fully first-party even to strict extensions and CSPs.
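
The priority order above can be expressed as a simple first-match lookup. This Python sketch models the payload, the window globals, and the embed element's attributes as plain dictionaries; the function name is hypothetical, and checking window.percusTrackingURL before window.PERCUS_TRACK_BASE_URL is an assumption, since the list does not rank the two globals.

```python
DEFAULT_TRACKING_URL = "https://api.percus.cl/analytics/v1"


def resolve_tracking_url(payload: dict, window_globals: dict, embed_attrs: dict) -> str:
    """Return the first configured tracking URL, falling back to the default."""
    candidates = (
        payload.get("trackingURL"),                    # 1. per-call
        window_globals.get("percusTrackingURL"),       # 2. global
        window_globals.get("PERCUS_TRACK_BASE_URL"),   # 2. global (alias)
        embed_attrs.get("data-percus-tracking-url"),   # 3. attribute
    )
    for candidate in candidates:
        if candidate:
            return candidate
    return DEFAULT_TRACKING_URL                        # 4. default


# The first-party proxy example from the text.
url = resolve_tracking_url(
    payload={},
    window_globals={"percusTrackingURL": "https://analytics.bank-client.com/v1"},
    embed_attrs={"data-percus-tracking-url": "https://ignored.example/v1"},
)
print(url)  # https://analytics.bank-client.com/v1
```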

8. Individeo compatibility strategy

Existing Percus customers (Porvenir, Plan Vital, AFP Habitat, Contemporánea Seguros, Produbanco) have integrations against Individeo's smartEmbed.js. To make migration low-effort we ship a thin adapter, percus-individeo-compat.js, that recognizes Individeo's attribute namespace and global API and routes everything through percus-embed-sdk.js. Customer change is a single <script src> swap.

The compatibility shim is informed by direct analysis of Individeo's smartEmbed.js and smartTracking.js (Section 8.4).

8.1 Attribute mapping

| Individeo attribute | Percus attribute | Behavior |
|---------------------|------------------|----------|
| data-iv-smart-embed-proxy | data-percus-embed | Marker that triggers the embed |
| data-iv-tracking-url | data-percus-tracking-url | Analytics endpoint override |
| data-iv-tracker-group-key | data-percus-organization-id | Organization key |
| data-iv-enable-tracking | data-percus-enable-tracking | Wire flag utrk |
| data-iv-enable-smart-tracking | data-percus-enable-tracking | Same target |
| data-iv-enable-ga-tracking | data-percus-enable-ga | GA forwarding |
| data-iv-embed-smart-tracking | data-percus-enable-tracking | Same target |
| data-iv-ga-event-name | data-percus-ga-event-name | GA event name |
| data-iv-ga-category-name | data-percus-ga-category-name | GA category |
| data-iv-consent-required | data-percus-consent-required | Consent gate |
| data-iv-consent-accepted | data-percus-consent-accepted | Consent decision |
| data-iv-project-code | data-percus-project-id | Project ID |
| data-iv-media-id | data-percus-template-id | Template ID |
| data-iv-media-code | data-percus-template-version | Template version |
| data-iv-env | data-percus-environment | DEV/STAGING/PROD |
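
The attribute mapping lends itself to a lookup table. The sketch below, in Python for illustration, translates an element's attributes into the Percus namespace; the function name and the rule that a native data-percus-* attribute overrides a translated one are assumptions, not documented shim behavior.

```python
IV_TO_PERCUS = {
    "data-iv-smart-embed-proxy": "data-percus-embed",
    "data-iv-tracking-url": "data-percus-tracking-url",
    "data-iv-tracker-group-key": "data-percus-organization-id",
    "data-iv-enable-tracking": "data-percus-enable-tracking",
    "data-iv-enable-smart-tracking": "data-percus-enable-tracking",
    "data-iv-enable-ga-tracking": "data-percus-enable-ga",
    "data-iv-embed-smart-tracking": "data-percus-enable-tracking",
    "data-iv-ga-event-name": "data-percus-ga-event-name",
    "data-iv-ga-category-name": "data-percus-ga-category-name",
    "data-iv-consent-required": "data-percus-consent-required",
    "data-iv-consent-accepted": "data-percus-consent-accepted",
    "data-iv-project-code": "data-percus-project-id",
    "data-iv-media-id": "data-percus-template-id",
    "data-iv-media-code": "data-percus-template-version",
    "data-iv-env": "data-percus-environment",
}


def translate_attributes(attrs: dict) -> dict:
    """Map Individeo data-iv-* attributes onto their Percus equivalents."""
    out = {}
    # First pass: translate legacy names; unmapped attributes are dropped.
    for name, value in attrs.items():
        target = IV_TO_PERCUS.get(name)
        if target is not None:
            out[target] = value
    # Second pass (assumption): native attributes win over translated ones.
    for name, value in attrs.items():
        if name.startswith("data-percus-"):
            out[name] = value
    return out


legacy = {
    "data-iv-smart-embed-proxy": "",
    "data-iv-enable-smart-tracking": "true",
    "data-iv-project-code": "P-42",
}
translated = translate_attributes(legacy)
```

Three Individeo attributes share the data-percus-enable-tracking target, so an element that sets them to conflicting values needs a tie-break rule; this sketch simply lets the last one in iteration order win.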

8.2 Global aliases

The shim defines getter/setter aliases on window so existing host-page code that reads or writes Individeo globals keeps working:

window.individeoData ──► window.percusData
window.IndivideoVersion ──► window.percusVersion
window.IVDomains ──► window.percusDomains
window.SmartTracking ──► window.percusTracking
window.ivTrackerKey ──► window.percusSessionKey
window.ivtgk ──► window.percusTrackerGroupKey
window.ivTrackingURL ──► window.percusTrackingURL
window.TRACK_BASE_URL ──► window.PERCUS_TRACK_BASE_URL
window.smartEmbedModal ──► window.percusModal
window.BluePlayer ──► window.percusPlayer

8.3 Callback fan-out

Customer code may have registered handlers against any of Individeo's ~60 on* callback names. The shim re-emits Percus-native events under each legacy name, so subscribers keep firing. The minimum set to support (highest customer usage):

onCreate, onDestroy, onReady, onError
onAutoplay, onAutoplayFailure
onFirstPlay, onPlayIncomplete, onReplay
onFirstQuartileComplete, onSecondQuartileComplete,
onThirdQuartileComplete, onLastQuartileComplete
onCTA, onServiceCTA, onInteraction
onChapterEnter, onChapterExit
onConsentAccepted, onConsentDeclined
onFullscreenEnter, onFullscreenExit
onIFrameReady, onDataReady

The full set is enumerated in the Individeo bundle; the shim provides a no-op stub for any not yet implemented so customer code does not throw.

8.4 Wire-event compatibility (optional Phase 2)

Some customers run dashboards or BI pipelines that consume Individeo's tracking endpoint payload format directly. For zero-disruption migration, expose a compatibility endpoint:

POST /analytics/v1/indiTrack?evt={eventType}
Content-Type: application/json
Body: { event, trackerKey, trackerGroupKey, ctaName, ctaValue, ... }

This accepts Individeo's exact payload shape and translates internally to the Percus canonical event schema. Customers who keep using the Individeo-style payload do not need to change anything downstream of the analytics endpoint either.

Findings that informed this section (from direct analysis of cdn.individeo.com/individeo/prod/edge/js/smartEmbed.js and smartTracking.js):

  • Transport: Path B (host-page POSTs); iframe never POSTs directly.
  • Endpoints: https://track.individeo.com/api/indiTrack (prod), https://track-ci.individeo.com/api/indiTrack (CI). Per-customer override supported via tracking-url attribute, window.TRACK_BASE_URL, window.ivTrackingURL, or per-call trackingURL in the payload.
  • Wire format: POST {base}/api/indiTrack?evt={type}, body is JSON.stringify(payload), Content-Type: application/json. Unload-time variant uses fetch with keepalive: true.
  • Wire-event whitelist (only these are sent to the server, the ~60 other on* names are local callbacks): pageLoad, pageReload, onCTA, onServiceCTA, onInteraction, onChapterEnter, onEvent, onLog, onConsentAccepted, onConsentDeclined, onGATracking, onTrackerKeyDefined.
  • Payload shape: event, trackerKey, trackerGroupKey, plus event-specific name/value pairs (ctaName/ctaValue, interactionName/interactionValue, etc.), all truncated to 64 chars.
  • Reliability: in-memory queue, single-flight, retry 200ms × (n+1).
  • Consent posture: smartTracking.js is not even injected into the page unless consent is accepted. Strong privacy posture, copied wholesale.
  • GA integration: parallel window.dataLayer push when enable-ga-tracking is set.

9. Reporting and client access

Internal dashboards (Percus team)

Live in the existing backoffice as a new section. Pages:

  • Live campaigns — sessions started in the last 60 minutes, completion rate, CTA CTR, error spikes. Refreshed every 60 seconds.
  • Per-project drill-down — drop-off histogram (25 / 50 / 75 / 100% milestones), CTA funnel, top errors, top devices and locales.
  • Cross-organization overview — Percus internal only; identifies which clients are healthy versus which are seeing low engagement.

Reads from Aurora rollups for default views (sub-second); Athena for ad-hoc deep slices (1–5 seconds, surfaced as "deep query" button).

Client dashboards (per-organization)

Same backoffice, organization-filtered. Org users see only their organization_id's data. Default views per project: views, completion rate, CTA CTR, drop-off curve, top devices and locales. Includes:

  • CSV export — generates a signed S3 URL pointing to a Parquet → CSV conversion of the current view. The job runs in Lambda; the signed URL expires after 24 hours.
  • Weekly PDF digest — Lambda + SES, scheduled per organization, opt-in.
  • Webhook export (Phase 3) — POST per-batch event payloads to a client-configured endpoint for ingestion into their BI.
  • Google Analytics 4 forwarding (Phase 2) — SDK mirrors a minimal set of events to GA4 with the client's measurement ID. Same for Adobe Analytics.

Why not QuickSight

QuickSight authors are $24 per user per month. At a 10-client target with 2–3 dashboard authors per client, that is $480–720 per month before any actual readers. The Next.js backoffice already exists, the team builds Next.js daily, and Recharts or Tremor cover the chart vocabulary needed. We control the UX, organization scoping, and styling. Athena and Aurora directly back the queries.
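
The author-seat figure is straightforward arithmetic, reproduced here as a quick check. The helper name is hypothetical; the inputs are the ones quoted in the paragraph above.

```python
AUTHOR_USD_PER_MONTH = 24


def quicksight_author_cost(clients: int, authors_per_client: int) -> int:
    """Monthly author-seat cost in USD, before any reader pricing."""
    return clients * authors_per_client * AUTHOR_USD_PER_MONTH


low = quicksight_author_cost(clients=10, authors_per_client=2)
high = quicksight_author_cost(clients=10, authors_per_client=3)
print(low, high)  # 480 720
```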

10. Privacy and compliance

The platform-level commitments (PII never stored, anonymized identifiers, no raw IP, consent-gated) are detailed in Data Handling. This section enumerates the analytics-specific implementation rules.

  • No raw IP leaves the edge. The ingest Lambda reads CloudFront-Viewer-Country and discards the source IP before any persistence. CloudFront is configured to strip the client IP from forwarded headers.
  • No raw User-Agent stored. Parsed in-memory in the ingest Lambda; only {type, os, browser} survives.
  • Consent gate enforced at script-load. percus-tracking.js is not downloaded until consent is accepted. The one exception to the consent gate is video.errored operational telemetry, which omits all viewer identifiers.
  • Anonymous identifiers only. viewer_hash is computed client-side with an org-specific salt. Same person across two organizations cannot be correlated.
  • DSAR support. Right-to-be-forgotten: per-org delete job indexed by viewer_hash. Documented SLA: 30 days.
  • Audit trail. Every export request and every retention/deletion job is logged to CloudTrail and the platform audit log table.
  • Encryption. S3 SSE-KMS; Aurora at-rest encryption; TLS 1.3 in transit; secrets in AWS Secrets Manager. Already the platform standard.
  • Salt rotation. Annual rotation, documented in the privacy policy. Cohort analysis across rotation boundaries resets — accepted trade-off for simplicity and stronger privacy versus a versioned-salt scheme.

11. Cost estimate

Assumes Lambda + SQS + S3 + Athena + Aurora hot tier (existing cluster, no incremental cost):

| Scale | Daily events | Monthly cost (analytics layer only) | Driver notes |
|-------|--------------|-------------------------------------|--------------|
| Early | 100 K | USD 15 – 25 | Mostly API Gateway + S3 + CloudWatch |
| Phase 2 target | 2.7 M | USD 120 – 180 | API Gateway dominates; Athena scans cheap due to partitioning |
| Phase 3 multi-client | 25 M | USD 600 – 900 | At this point migrate ingest to Firehose + Parquet conversion for ~30% savings |

Major cost drivers in priority order: API Gateway requests (linear), CloudWatch logs (capped via log level discipline), S3 PUT operations (bounded by writer batch size of 1000), Athena scans (bounded by Parquet partitioning).
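
Since API Gateway requests scale linearly, their share of the bill can be approximated with a back-of-the-envelope calculation. The sketch assumes one HTTP request per event, a 30-day month, and the flat per-million list prices quoted in Section 6; it ignores volume tiers and any free tier, so it is an approximation rather than a quote.

```python
HTTP_API_USD_PER_MILLION = 1.00
REST_API_USD_PER_MILLION = 3.50


def monthly_request_cost(daily_events: int, usd_per_million: float,
                         days: int = 30, events_per_request: int = 1) -> float:
    """Approximate monthly API Gateway request cost in USD."""
    requests = daily_events * days / events_per_request
    return requests / 1_000_000 * usd_per_million


early = monthly_request_cost(100_000, HTTP_API_USD_PER_MILLION)
phase2_http = monthly_request_cost(2_700_000, HTTP_API_USD_PER_MILLION)
phase2_rest = monthly_request_cost(2_700_000, REST_API_USD_PER_MILLION)
print(early, phase2_http, phase2_rest)  # 3.0 81.0 283.5
```

Under these assumptions, request charges alone come to about USD 81 of the USD 120–180 Phase 2 estimate, which is consistent with the note that API Gateway dominates. Batching several events per request on the client would reduce this line proportionally.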

Aurora Serverless v2 minimum capacity is already paid for by the platform; analytics adds only a small schema and rollup tables — incremental load is negligible.

12. Phasing

Phase 0 — Foundation (2–3 weeks). Event schema v1, SDK event bus and buffering, consent gate, two-stage lazy load, ingest Lambda + SQS + S3 raw zone, hourly compactor to Parquet, Athena workgroup and Glue table. Outcome: events flowing, queryable via Athena. TDD throughout.

Phase 1 — Internal dashboards (2–3 weeks). Nightly rollups, Aurora analytics schema, backoffice "Project analytics" page with completion rate, CTR, drop-off histogram. Smoke tests against production-scale fixture data.

Phase 2 — Client-facing (3–4 weeks). Organization-scoped dashboards in the backoffice, CSV export, weekly PDF digest via SES, Individeo compatibility shim v1, DSAR deletion job, GA4 forwarding.

Phase 3 — Integrations (ongoing). Webhooks for client BI, Adobe Analytics forwarding, business-outcome correlation API (client pushes "viewer converted" events; we join to sessions for ROI), Individeo wire-event compatibility endpoint.

13. Open questions for the AWS architect

  1. WAF for the ingest endpoint. AWS WAF cannot be attached directly to an API Gateway HTTP API; it would have to sit on the CloudFront distribution in front of it. Worth the cost at MVP? The analytics endpoint is public by design (browser-fetched), but rate limiting per viewer_hash or per IP would mitigate abuse. Alternative: API Gateway throttling alone.
  2. CloudFront in front of API Gateway. Adds another layer ($0.0085 per 10K requests + transfer) but lets us cache the OPTIONS preflight and centralize edge processing. Worth it at our request volumes?
  3. VPC for the ingest Lambda. The ingest Lambda touches SQS only — no DB access. Keeping it out of a VPC avoids cold-start penalties. Confirm there is no compliance reason to require VPC isolation.
  4. Glue Data Catalog or self-managed table definitions? Glue is convenient but has a small per-table cost. At one Parquet table for raw events, the difference is negligible — recommend Glue. Sanity check.
  5. Athena workgroup quotas and cost alarms. What is the right default per-query data-scanned limit to set, to prevent a runaway dashboard query from scanning a year of data?
  6. S3 read enforcement. Since the database is shared and scoping is enforced at the query layer (repository checks organization_id), is the S3 partition prefix sufficient for Athena query scoping, or do we want bucket policies + IAM conditions as a second layer?
  7. Cross-region DR. Active-active is overkill at MVP. Is S3 Cross-Region Replication on the curated and rollups zones sufficient for RTO/RPO targets, or do we need standby Lambdas?
  8. Reserved concurrency. Should the ingest Lambda have reserved concurrency to protect downstream services during a traffic spike, or is unreserved fine given SQS smoothing?
  9. Kinesis Firehose migration triggers. At what concrete event-per-day threshold does Firehose become cheaper than Lambda + SQS? Help validate the ~10M/day rule of thumb.
  10. Athena Iceberg in v2. Worth planning for, even though v1 is plain Parquet? Iceberg would simplify DSAR partition rewrites and schema evolution.