Data Pipeline
Understanding how data moves through Nucleus helps you interpret freshness, gaps, and timing.
How Data Flows
Ad Server → S3 Storage → Ingest Worker → PostgreSQL → Aggregation → Edge Cache → Dashboard
- Ad Server: Your ad server generates Log Level Data (LLD) for every ad event (request, impression, revenue).
- S3 Storage: LLD files are deposited into an S3 bucket, typically in hourly batches.
- Ingest Worker: A background worker picks up new files, validates them, and inserts the records into PostgreSQL.
- PostgreSQL: Raw events are stored in tables partitioned by month. This is the source of truth.
- Aggregation: Daily and hourly aggregates are precomputed so dashboard queries return quickly.
- Edge Cache: Aggregated metrics are cached at the edge for fast dashboard loads.
- Dashboard: You see the final result in Nucleus.
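The ingest and aggregation stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual Nucleus worker: the `Event` fields, the `lld_events_YYYY_MM` partition naming, and the validation rule are hypothetical, and the real worker writes to PostgreSQL rather than grouping records in memory.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime

# Event types listed on this page; the real LLD schema is an assumption here.
VALID_EVENT_TYPES = {"request", "impression", "revenue"}


@dataclass(frozen=True)
class Event:
    event_id: str  # hypothetical field names
    event_type: str
    timestamp: datetime


def is_valid(event: Event) -> bool:
    """Illustrative validation: require an ID and a known event type."""
    return bool(event.event_id) and event.event_type in VALID_EVENT_TYPES


def partition_name(ts: datetime) -> str:
    """Map a timestamp to a monthly partition name (hypothetical naming scheme)."""
    return f"lld_events_{ts.year:04d}_{ts.month:02d}"


def ingest(events):
    """Validate events and group accepted ones by monthly partition.

    Returns (partitions, rejected), where partitions maps a partition name
    to the list of events that would be inserted into it.
    """
    partitions = defaultdict(list)
    rejected = []
    for event in events:
        if is_valid(event):
            partitions[partition_name(event.timestamp)].append(event)
        else:
            rejected.append(event)
    return dict(partitions), rejected


def hourly_counts(events):
    """Roll raw events up to counts per (hour, event_type), like a precomputed hourly aggregate."""
    return Counter(
        (e.timestamp.replace(minute=0, second=0, microsecond=0), e.event_type)
        for e in events
    )


if __name__ == "__main__":
    raw = [
        Event("e1", "impression", datetime(2024, 5, 31, 23, 10)),
        Event("e2", "impression", datetime(2024, 6, 1, 0, 5)),
        Event("e3", "request", datetime(2024, 6, 1, 0, 7)),
        Event("e4", "click", datetime(2024, 6, 1, 0, 9)),  # unknown type, so rejected
    ]
    partitions, rejected = ingest(raw)
    print(sorted(partitions))              # ['lld_events_2024_05', 'lld_events_2024_06']
    print([e.event_id for e in rejected])  # ['e4']
    print(hourly_counts(partitions["lld_events_2024_06"]))
```

Events on either side of a month boundary land in different partitions, which is one common reason to partition by month: a query over a date range only needs to touch the months it covers.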
Streaming Platform Pipeline
Streaming campaign data follows a separate path:
CSV Files → S3 → CSV Direct Loader → PostgreSQL → Aggregation → Dashboard
Streaming platform files are processed by a dedicated loader optimized for their CSV format. The data lands in the same PostgreSQL database and is available through the same dashboards.
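As a rough illustration of what a CSV loader step involves, the sketch below parses streaming-platform CSV text with Python's standard `csv` module, converts each field to a typed value, and sets aside rows that fail to parse. The column names (`date`, `campaign_id`, `impressions`) and the rejection rules are hypothetical; the actual streaming file format and the internals of the CSV Direct Loader are not described on this page.

```python
import csv
import io
from datetime import date

# Hypothetical column names; real streaming-platform files may differ.
REQUIRED_COLUMNS = {"date", "campaign_id", "impressions"}


def load_streaming_csv(text: str):
    """Parse CSV text into typed rows.

    Returns (rows, errors): rows are dicts ready to insert, errors are
    (line_number, message) pairs for rows that failed to parse.
    Raises ValueError if a required column is missing from the header.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    rows, errors = [], []
    for raw in reader:
        try:
            rows.append({
                "date": date.fromisoformat(raw["date"]),
                "campaign_id": raw["campaign_id"].strip(),
                "impressions": int(raw["impressions"]),
            })
        except (ValueError, TypeError, AttributeError) as exc:
            # reader.line_num is the physical line just read (the header is line 1)
            errors.append((reader.line_num, str(exc)))
    return rows, errors


if __name__ == "__main__":
    sample = (
        "date,campaign_id,impressions\n"
        "2024-06-01,cmp-1,1200\n"
        "2024-06-01,cmp-2,oops\n"
    )
    rows, errors = load_streaming_csv(sample)
    print(len(rows), errors)
```

Here the first data row loads and the second is reported as an error on line 3. Collecting bad rows instead of failing the whole file is one reasonable design choice; a real loader might instead reject the entire file.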
Data Freshness
| Data Type | Typical Delay / Refresh |
|---|---|
| Ad server LLD | ~1 hour |
| Streaming campaign data | ~1 hour |
| Dashboard aggregates | Updated after ingest completes |
| Edge cache | Refreshed on each dashboard load |
If data appears to be missing, the most common cause is a normal pipeline delay. Data from the current hour typically appears within the next hour. See Missing Data for troubleshooting steps.
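The freshness rules above can be turned into a quick triage check: is a missing hour probably still in flight, or worth investigating? This sketch assumes hourly LLD batches that close at the end of each hour, followed by the ~1-hour typical delay from the table. The one-hour `GRACE` allowance and all function names are hypothetical, real delays vary, and timestamps are assumed to be naive datetimes in a single time zone.

```python
from datetime import datetime, timedelta

TYPICAL_DELAY = timedelta(hours=1)  # "~1 hour" from the freshness table
GRACE = timedelta(hours=1)          # hypothetical allowance before treating an hour as late


def hour_start(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def expected_visible_at(event_time: datetime) -> datetime:
    """Estimate when an event should appear: end of its hourly batch plus the typical delay."""
    batch_end = hour_start(event_time) + timedelta(hours=1)
    return batch_end + TYPICAL_DELAY


def classify_missing_hour(hour: datetime, now: datetime) -> str:
    """Return 'pending' if a missing hour is still within the normal delay plus grace, else 'investigate'."""
    return "pending" if now < expected_visible_at(hour) + GRACE else "investigate"


def missing_hours(present_hours, start: datetime, end: datetime):
    """List hour starts in [start, end) that are absent from the set `present_hours`."""
    gaps = []
    current = hour_start(start)
    while current < end:
        if current not in present_hours:
            gaps.append(current)
        current += timedelta(hours=1)
    return gaps


if __name__ == "__main__":
    now = datetime(2024, 6, 1, 18, 30)
    present = {datetime(2024, 6, 1, h) for h in (12, 13, 15)}
    for hour in missing_hours(present, datetime(2024, 6, 1, 12), datetime(2024, 6, 1, 18)):
        print(hour.strftime("%H:00"), classify_missing_hour(hour, now))
    # 14:00 investigate
    # 16:00 pending
    # 17:00 pending
```

Under these assumptions, an event at 14:20 is expected to appear around 16:00, in line with data from one hour showing up during the next. An hour flagged `investigate` is a cue to work through the Missing Data steps, not proof of a delivery problem.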
What This Means for You
- Real-time? No. Nucleus is near-real-time, with a delay of approximately 1 hour.
- Gaps? If the ad server has a delivery issue, the gap appears in Nucleus after the normal delay.
- Backfills? When historical data is reprocessed, it flows through the same pipeline and updates existing records.
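Because reprocessed data updates existing records rather than adding duplicates, a backfill behaves like an upsert keyed on whatever identifies a record. The sketch below shows that behavior with Python's built-in `sqlite3` (SQLite 3.24 or later) standing in for PostgreSQL; both accept `INSERT ... ON CONFLICT ... DO UPDATE`. The `hourly_counts` table, its columns, and the choice to upsert aggregates rather than raw events are illustrative assumptions, not a documented Nucleus schema.

```python
import sqlite3

# Hypothetical aggregate table; SQLite stands in for PostgreSQL here.
SCHEMA = """
CREATE TABLE hourly_counts (
    hour TEXT NOT NULL,        -- hour bucket, e.g. '2024-06-01T14:00'
    event_type TEXT NOT NULL,
    events INTEGER NOT NULL,
    PRIMARY KEY (hour, event_type)
)
"""

# Upsert: insert new keys, overwrite the count for keys that already exist.
UPSERT = """
INSERT INTO hourly_counts (hour, event_type, events) VALUES (?, ?, ?)
ON CONFLICT (hour, event_type) DO UPDATE SET events = excluded.events
"""


def write_counts(conn: sqlite3.Connection, rows) -> None:
    """Write (hour, event_type, events) rows in one transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.executemany(UPSERT, rows)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Initial load: hour 14 was only partially delivered.
    write_counts(conn, [
        ("2024-06-01T13:00", "impression", 1200),
        ("2024-06-01T14:00", "impression", 300),
    ])
    # Backfill: hour 14 is reprocessed with complete data; hour 13 is untouched.
    write_counts(conn, [("2024-06-01T14:00", "impression", 1150)])
    rows = conn.execute("SELECT hour, events FROM hourly_counts ORDER BY hour").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())  # [('2024-06-01T13:00', 1200), ('2024-06-01T14:00', 1150)]
```

Running the same backfill twice leaves the table unchanged, which is what makes this style of reprocessing safe to repeat.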