Data Pipeline (All Users)

Understanding how data moves through Nucleus helps you interpret freshness, gaps, and timing.

How Data Flows

Ad Server → S3 Storage → Ingest Worker → PostgreSQL → Aggregation → Edge Cache → Dashboard
  1. Ad Server — Your ad server generates Log Level Data (LLD) for every ad event (request, impression, revenue).
  2. S3 Storage — LLD files are deposited into an S3 bucket, typically in hourly batches.
  3. Ingest Worker — A background worker picks up new files, validates them, and inserts the records into PostgreSQL.
  4. PostgreSQL — Raw events are stored in monthly partitioned tables. This is the source of truth.
  5. Aggregation — Pre-computed daily and hourly aggregates are calculated for fast dashboard queries.
  6. Edge Cache — Aggregated metrics are cached at the edge for instant dashboard loads.
  7. Dashboard — You see the final result in Nucleus.
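
As a rough illustration of step 5, the sketch below rolls raw events up into hourly and daily counts. The event shape (a timestamp plus an event type) and the function name aggregate are assumptions made for this example; they are not the Nucleus schema or its actual implementation.

```python
from collections import Counter
from datetime import datetime


def aggregate(events):
    """Roll raw (timestamp, event_type) pairs up into hourly and daily counts."""
    hourly = Counter()
    daily = Counter()
    for ts, event_type in events:
        # Truncate the timestamp to the start of its hour to get the hourly bucket.
        hour_bucket = ts.replace(minute=0, second=0, microsecond=0)
        hourly[(hour_bucket, event_type)] += 1
        daily[(ts.date(), event_type)] += 1
    return hourly, daily


# Hypothetical raw events, standing in for rows in the raw event tables.
events = [
    (datetime(2024, 5, 1, 10, 5), "impression"),
    (datetime(2024, 5, 1, 10, 40), "impression"),
    (datetime(2024, 5, 1, 11, 15), "impression"),
    (datetime(2024, 5, 1, 11, 20), "request"),
]
hourly, daily = aggregate(events)
```

Because the dashboard reads these small pre-computed buckets instead of scanning every raw event, queries stay fast even as raw volume grows.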

Streaming Platform Pipeline

Streaming campaign data follows a separate path:

CSV Files → S3 → CSV Direct Loader → PostgreSQL → Aggregation → Dashboard

Streaming platform files are processed through a dedicated loader optimized for their CSV format. The data lands in the same PostgreSQL database and is available through the same dashboards.
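
To make the loader step more concrete, here is a minimal sketch of reading a streaming CSV file and separating usable rows from rejected ones. The column names (campaign_id, date, impressions), the validation rule, and the function name load_csv_rows are all hypothetical; the real loader's format and checks are not described here.

```python
import csv
import io

# Hypothetical required columns; the real file layout may differ.
REQUIRED_COLUMNS = {"campaign_id", "date", "impressions"}


def load_csv_rows(text):
    """Parse CSV text and return (valid_rows, rejected_rows)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    valid, rejected = [], []
    for row in reader:
        try:
            # Assumed rule: impressions must parse as an integer.
            row["impressions"] = int(row["impressions"])
        except (TypeError, ValueError):
            rejected.append(row)
            continue
        valid.append(row)
    return valid, rejected


sample = (
    "campaign_id,date,impressions\n"
    "c1,2024-05-01,120\n"
    "c2,2024-05-01,n/a\n"
)
valid, rejected = load_csv_rows(sample)
```

In this sketch a malformed row is set aside rather than failing the whole file, which is one common design choice for batch loaders.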

Data Freshness

Data Type                  Typical Delay
Ad server LLD              ~1 hour
Streaming campaign data    ~1 hour
Dashboard aggregates       Updated after ingest completes
Edge cache                 Refreshed on each dashboard load

If data appears to be missing, the most common cause is a normal pipeline delay. Data from the current hour typically appears within the next hour. See Missing Data for troubleshooting steps.
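
The rule of thumb above can be expressed as a small check. This is a sketch under one assumption: that data for a given hour is expected by the end of the following hour, which is how "within the next hour" is interpreted here. The function name and the TYPICAL_DELAY constant are illustrative, not part of Nucleus.

```python
from datetime import datetime, timedelta

# Approximate delay taken from the freshness table; actual delays may vary.
TYPICAL_DELAY = timedelta(hours=1)


def is_within_normal_delay(event_time, now, delay=TYPICAL_DELAY):
    """Return True if data for event_time may simply not have arrived yet."""
    # The hourly batch containing event_time closes at the end of that hour.
    hour_end = event_time.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    # After the batch closes, allow the typical pipeline delay before worrying.
    return now < hour_end + delay


event_time = datetime(2024, 5, 1, 10, 15)
still_expected = is_within_normal_delay(event_time, now=datetime(2024, 5, 1, 11, 30))
overdue = not is_within_normal_delay(event_time, now=datetime(2024, 5, 1, 12, 30))
```

If the check returns False and the data is still absent, the delay is longer than usual and the troubleshooting steps are worth following.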

What This Means for You

  • Real-time? No. Nucleus is near-real-time, with a delay of approximately 1 hour.
  • Gaps? If the ad server has a delivery issue, the gap appears in Nucleus after the normal delay.
  • Backfills? When historical data is reprocessed, it flows through the same pipeline and updates existing records.
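
The backfill behaviour described above, where reprocessed data updates existing records instead of duplicating them, can be sketched as an upsert keyed on a unique identifier. The event_id key, the record fields, and the in-memory dictionary are assumptions for illustration only. In PostgreSQL this pattern is commonly written with INSERT ... ON CONFLICT DO UPDATE, though whether Nucleus does so is not stated here.

```python
def apply_batch(store, records):
    """Insert new records and overwrite existing ones that share an event_id."""
    inserted = updated = 0
    for record in records:
        key = record["event_id"]  # hypothetical unique key
        if key in store:
            updated += 1
        else:
            inserted += 1
        # Overwriting by key means a reprocessed record replaces the old one.
        store[key] = record
    return inserted, updated


store = {}
first = apply_batch(store, [
    {"event_id": "e1", "revenue": 0.50},
    {"event_id": "e2", "revenue": 0.25},
])
# A backfill re-sends e2 with a corrected value and adds a previously missing e3.
backfill = apply_batch(store, [
    {"event_id": "e2", "revenue": 0.75},
    {"event_id": "e3", "revenue": 0.10},
])
```

Running the same backfill twice leaves the store unchanged the second time, which is why reprocessing does not inflate totals in this sketch.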