Data Pipeline (All Users)

Understanding how data moves through Nucleus helps you interpret freshness, gaps, and timing.

How Data Flows

Ad Server → S3 Storage → Ingest Worker → PostgreSQL → Aggregation → Edge Cache → Dashboard

Ad Server — Your ad server generates Log Level Data (LLD) for every ad event (request, impression, revenue).
S3 Storage — LLD files are deposited into an S3 bucket, typically in hourly batches.
Ingest Worker — A background worker picks up new files, validates them, and inserts the records into PostgreSQL.
PostgreSQL — Raw events are stored in monthly partitioned tables. This is the source of truth.
Aggregation — Daily and hourly aggregates are pre-computed from the raw events so that dashboard queries run fast.
Edge Cache — Aggregated metrics are cached at the edge for instant dashboard loads.
Dashboard — You see the final result in Nucleus.
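
To make the ingest and aggregation stages above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Nucleus's actual implementation: the field names (event_type, timestamp), the partition naming scheme (events_YYYY_MM), and all function names are hypothetical. The event types come from the examples in the Ad Server step.

```python
from collections import Counter
from datetime import datetime

# Event types taken from the examples in the text; the real list may differ.
VALID_EVENT_TYPES = {"request", "impression", "revenue"}


def is_valid(record: dict) -> bool:
    """Accept a record only if it has a known event type and an ISO timestamp."""
    if record.get("event_type") not in VALID_EVENT_TYPES:
        return False
    try:
        datetime.fromisoformat(record["timestamp"])
    except (KeyError, TypeError, ValueError):
        return False
    return True


def partition_name(record: dict) -> str:
    """Hypothetical monthly partition name for a record, e.g. 'events_2024_03'."""
    ts = datetime.fromisoformat(record["timestamp"])
    return f"events_{ts:%Y_%m}"


def hourly_counts(records) -> Counter:
    """Count valid events per (hour, event_type), mimicking an hourly aggregate."""
    counts = Counter()
    for record in records:
        if not is_valid(record):
            continue  # invalid records are skipped in this sketch
        ts = datetime.fromisoformat(record["timestamp"])
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(hour.isoformat(), record["event_type"])] += 1
    return counts


sample = [
    {"event_type": "impression", "timestamp": "2024-03-05T14:07:00"},
    {"event_type": "impression", "timestamp": "2024-03-05T14:59:59"},
    {"event_type": "click", "timestamp": "2024-03-05T14:10:00"},    # unknown type
    {"event_type": "request", "timestamp": "not a timestamp"},      # bad timestamp
]
aggregates = hourly_counts(sample)
```

In this sketch the two impressions fall into the same hourly bucket, and the two malformed records never reach the aggregate. That mirrors the general idea that the dashboard reads from validated, pre-aggregated data instead of raw files.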

Streaming Platform Pipeline

Streaming campaign data follows a separate path:

CSV Files → S3 → CSV Direct Loader → PostgreSQL → Aggregation → Dashboard

Streaming platform files are processed through a dedicated loader optimized for their CSV format. The data lands in the same PostgreSQL database and is available through the same dashboards.

Data Freshness

Data Type	Typical Delay
Ad server LLD	~1 hour
Streaming campaign data	~1 hour
Dashboard aggregates	Updated after ingest completes
Edge cache	Refreshed on each dashboard load

If data appears to be missing, the most common cause is a normal pipeline delay. Data from the current hour typically appears within the next hour. See Missing Data for troubleshooting steps.
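
As a rough way to reason about whether missing data is still within the normal delay, the sketch below estimates when an event should become visible. It assumes one possible reading of the figures above: files are batched at the end of the event's hour, and ingest then takes about one more hour. The function names and that interpretation are assumptions for illustration only.

```python
from datetime import datetime, timedelta

# Approximate delay from the table above; actual timing can vary.
TYPICAL_DELAY = timedelta(hours=1)


def expected_by(event_time: datetime) -> datetime:
    """Estimate when an event should be visible: end of its hour plus the typical delay."""
    hour_start = event_time.replace(minute=0, second=0, microsecond=0)
    hour_end = hour_start + timedelta(hours=1)
    return hour_end + TYPICAL_DELAY


def within_normal_delay(event_time: datetime, now: datetime) -> bool:
    """True if the event may simply not have arrived yet."""
    return now < expected_by(event_time)


event = datetime(2024, 3, 5, 14, 20)
still_waiting = within_normal_delay(event, now=datetime(2024, 3, 5, 15, 30))
overdue = not within_normal_delay(event, now=datetime(2024, 3, 5, 16, 30))
```

Under these assumptions, an event at 14:20 is expected by 16:00. Checking at 15:30 is still within the normal delay, while data still missing at 16:30 would be worth troubleshooting.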

What This Means for You

Real-time? No. Nucleus is near-real-time, with a delay of approximately 1 hour.
Gaps? If the ad server has a delivery issue, the gap appears in Nucleus after the normal delay.
Backfills? When historical data is reprocessed, it flows through the same pipeline and updates existing records.
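
The backfill behaviour described above ("updates existing records") is commonly implemented as an upsert: insert the row if it is new, otherwise overwrite the existing one. The sketch below shows that pattern using Python's built-in sqlite3 module as a stand-in for PostgreSQL (it needs SQLite 3.24 or later for ON CONFLICT). The table, column names, and event_id key are hypothetical; the source does not describe how Nucleus matches reprocessed records to existing ones.

```python
import sqlite3


def upsert_events(conn: sqlite3.Connection, rows) -> None:
    """Insert rows, replacing the stored values when event_id already exists."""
    conn.executemany(
        """
        INSERT INTO events (event_id, event_type, revenue)
        VALUES (?, ?, ?)
        ON CONFLICT(event_id) DO UPDATE SET
            event_type = excluded.event_type,
            revenue = excluded.revenue
        """,
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_id TEXT PRIMARY KEY, event_type TEXT, revenue REAL)"
)

# Initial load.
upsert_events(conn, [("e1", "revenue", 1.0), ("e2", "impression", 0.0)])

# Backfill: e1 is reprocessed with a corrected value; no duplicate row is created.
upsert_events(conn, [("e1", "revenue", 1.25)])

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
e1_revenue = conn.execute(
    "SELECT revenue FROM events WHERE event_id = 'e1'"
).fetchone()[0]
```

Because the second load targets an existing key, the table still holds two rows and e1 carries the corrected value. This is why a backfill can change numbers you have already seen without double-counting them.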
