Ingestion Control Guide

This guide explains how to manage data ingestion, including pausing/resuming data collection, adjusting rate limiting, and monitoring ingestion status.


Overview

The platform automatically collects data from multiple external sources. Admins can control this process to:

  • Pause ingestion during maintenance
  • Resume ingestion after troubleshooting
  • Adjust rate limiting to prevent API throttling
  • Monitor ingestion health and status
  • Enable/disable individual feed types

Data Sources & Schedules

| Source | Default Schedule | Cron Override | Description |
|---|---|---|---|
| FCC Political Files | Every 6 hours (at :45) | FCC_CRON | Broadcast ad disclosures (HTML scraping) |
| AdImpact | Hourly (at :15) | ADIMPACT_CRON | Industry ad intelligence (dual-feed) |
| FEC 24-Hour Reports | Every 4 hours | FEC_24H_CRON | Independent Expenditure filings |
| Meta Ads Library | Every 8 hours | META_ADS_CRON | Facebook/Instagram political ads |
| Google Political Ads | 3x daily (6 AM, 2 PM, 10 PM) | GOOGLE_POLITICAL_ADS_CRON | Google/YouTube ads (BigQuery) |
| State Press RSS | Every 6 hours | STATE_PRESS_RSS_CRON | State-level news mentions |
| Email Newsletters | Continuous (60s polling) | INGESTION_GMAIL_POLL_INTERVAL | Gmail API polling |
| Email (Rep Sheets) | Continuous (60s polling) | INGESTION_GMAIL_POLL_INTERVAL | Gmail API polling for rep data |
| Feed Retry Queue | Every 10 minutes | FEED_RETRY_PROCESS_CRON | Retries failed feed runs |
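
The cron override variables take schedule expressions. As a rough sketch, the helpers below compose five-field cron strings (the same format as the INGESTION_DEFAULT_CRON value 0 * * * * listed later in this guide) for the kinds of schedules in the table. The helper names are hypothetical, and the expressions are illustrations only: this guide does not state the exact built-in default expressions or the time zone the scheduler uses.

```typescript
// Hypothetical helpers for composing five-field cron strings
// (minute, hour, day-of-month, month, day-of-week).

/** Cron expression that fires every `hours` hours at the given minute. */
function everyNHours(hours: number, minute: number = 0): string {
  if (!Number.isInteger(hours) || hours < 1 || hours > 23) {
    throw new RangeError("hours must be an integer from 1 to 23");
  }
  if (!Number.isInteger(minute) || minute < 0 || minute > 59) {
    throw new RangeError("minute must be an integer from 0 to 59");
  }
  // Note: a step that does not divide 24 evenly restarts at midnight.
  return `${minute} */${hours} * * *`;
}

/** Cron expression that fires on the hour at each listed hour of the day (0-23). */
function dailyAtHours(hoursOfDay: number[]): string {
  if (hoursOfDay.length === 0) {
    throw new RangeError("at least one hour is required");
  }
  for (const h of hoursOfDay) {
    if (!Number.isInteger(h) || h < 0 || h > 23) {
      throw new RangeError("each hour must be an integer from 0 to 23");
    }
  }
  return `0 ${hoursOfDay.join(",")} * * *`;
}

// Illustrative override values shaped like the schedules in the table.
const exampleOverrides: Record<string, string> = {
  FCC_CRON: everyNHours(6, 45), // "45 */6 * * *"
  ADIMPACT_CRON: everyNHours(1, 15), // "15 */1 * * *"
  GOOGLE_POLITICAL_ADS_CRON: dailyAtHours([6, 14, 22]), // "0 6,14,22 * * *"
};
```

Verify any override in development before applying it, since the scheduler's time zone affects when hour-based expressions fire.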

Pause/Resume Ingestion

Using the UI

  1. Navigate to the ingestion page (/ingestion)
  2. Look for the "Pause Ingestion" button in the top right
  3. Click to pause ingestion immediately
  4. The button changes to "Resume Ingestion"; click it again to resume
  5. The button is color-coded:
    • Amber when ingestion is active (shows "Pause Ingestion")
    • Green when ingestion is paused (shows "Resume Ingestion")

Using the API

Get ingestion status:

GET /api/ingestion/control

Response:

{
  "paused": false,
  "delayMs": 60000
}

Pause ingestion:

POST /api/ingestion/control
Body: {"action": "pause"}

Resume ingestion:

POST /api/ingestion/control
Body: {"action": "resume"}
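
For scripting these calls, the sketch below builds the three requests and checks that a response has the status shape shown above. It is a minimal illustration, not the platform's client: the type and function names are hypothetical, the JSON Content-Type header is an assumption, and authentication and the base URL are left out because this guide does not describe them.

```typescript
// Shape of the status response documented above.
interface IngestionControlState {
  paused: boolean;
  delayMs: number;
}

type ControlAction = "pause" | "resume";

// Transport-agnostic description of an HTTP request (hypothetical type).
interface HttpRequest {
  method: "GET" | "POST";
  path: string;
  headers: Record<string, string>;
  body?: string;
}

const CONTROL_PATH = "/api/ingestion/control";

/** Request that reads the current pause state and delay. */
function statusRequest(): HttpRequest {
  return {
    method: "GET",
    path: CONTROL_PATH,
    headers: { Accept: "application/json" },
  };
}

/** Request that pauses or resumes ingestion. */
function actionRequest(action: ControlAction): HttpRequest {
  return {
    method: "POST",
    path: CONTROL_PATH,
    // Assumption: the endpoint expects a JSON body with this content type.
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ action }),
  };
}

/** Runtime check that a parsed response has the documented status shape. */
function isControlState(value: unknown): value is IngestionControlState {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v["paused"] === "boolean" && typeof v["delayMs"] === "number";
}
```

Each request object can be handed to whatever HTTP client is in use, for example by passing its method, headers, and body to fetch with the deployment's base URL prepended to path.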

Individual Feed Control

Enabling/Disabling Feeds

Each feed type can be individually enabled or disabled via environment variables:

| Feed | Enabled Variable | Default |
|---|---|---|
| Email Queue | EMAIL_QUEUE_ENABLED | true |
| FCC Coverage Monitoring | FCC_COVERAGE_MONITORING_ENABLED | true |
| Station Refresh | STATION_REFRESH_ENABLED | true |
| Social Monitor | SOCIAL_MONITOR_ENABLED | true |
| Social Creative Discovery | SOCIAL_CREATIVE_DISCOVERY_ENABLED | true |
| Creative Followup | CREATIVE_FOLLOWUP_ENABLED | true |
| Radio Creative Discovery | RADIO_CREATIVE_DISCOVERY_ENABLED | true |
| EOD Reports | EOD_REPORT_ENABLED | true |
| Discovery Refresh | DISCOVERY_REFRESH_ENABLED | true |
| Dedupe Auto-Resolver | DEDUPE_AUTO_RESOLVER_ENABLED | false |
| Pre-Buy Alerts | PRE_BUY_ALERTS_ENABLED | true |
| AdImpact Creative Download | ADIMPACT_CREATIVE_DOWNLOAD_ENABLED | true |
| AdImpact Transcription | ADIMPACT_CREATIVE_TRANSCRIBE_ENABLED | true |
| AdImpact Gap Detection | ADIMPACT_GAP_DETECTION_ENABLED | true |
| AdImpact Full Reconciliation | ADIMPACT_FULL_RECONCILIATION_ENABLED | true |
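
To show how a flag and its default interact, here is a small sketch that resolves a feed flag from an environment map. The function and constant names are hypothetical, and the parsing rule (only "true" or "false", case-insensitive) is an assumption; this guide does not say which spellings the platform accepts.

```typescript
type Env = Record<string, string | undefined>;

// Defaults copied from the table above (subset shown for brevity).
const FEED_FLAG_DEFAULTS: Record<string, boolean> = {
  EMAIL_QUEUE_ENABLED: true,
  FCC_COVERAGE_MONITORING_ENABLED: true,
  DEDUPE_AUTO_RESOLVER_ENABLED: false,
  PRE_BUY_ALERTS_ENABLED: true,
};

/** Resolve a feed flag: explicit env value if set, otherwise the table default. */
function isFeedEnabled(env: Env, flag: string): boolean {
  const fallback: boolean | undefined = FEED_FLAG_DEFAULTS[flag];
  if (fallback === undefined) {
    throw new Error(`Unknown feed flag: ${flag}`);
  }
  const raw = env[flag];
  if (raw === undefined || raw.trim() === "") {
    return fallback; // unset or empty: use the documented default
  }
  const value = raw.trim().toLowerCase();
  if (value === "true") return true;
  if (value === "false") return false;
  // Assumption: anything else is treated as a configuration error.
  throw new Error(`${flag} must be "true" or "false", got "${raw}"`);
}
```

With an empty environment, isFeedEnabled({}, "DEDUPE_AUTO_RESOLVER_ENABLED") returns false, matching the one feed in the table that is off by default.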

Triggering Manual Runs

From the ingestion feeds page (/ingestion/feeds):

  1. Find the feed you want to run
  2. Click the "Run Now" button
  3. The feed will execute immediately regardless of its cron schedule
  4. View results in the run history

Rate Limiting

Rate limiting sets the delay inserted between consecutive emails during newsletter ingestion.

Default Settings

  • Default delay: 1000ms (1 second between emails)
  • Valid range: 100ms to 300000ms (5 minutes)

Adjusting Rate Limiting

Via API:

POST /api/ingestion/control
Body: {"delayMs": 10000}

Via Environment Variable:

NEWSLETTER_RATE_LIMIT_MS=1000
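
Because the delay has a valid range of 100 ms to 300000 ms, it can help to check a value before sending it. The sketch below validates a delay and produces the request body for POST /api/ingestion/control. The function name is hypothetical, and how the server itself treats out-of-range values (reject or clamp) is not stated in this guide.

```typescript
const MIN_DELAY_MS = 100; // lower bound from "Default Settings"
const MAX_DELAY_MS = 300000; // upper bound (5 minutes)

/** Validate a delay and return the JSON body for the control endpoint. */
function buildDelayBody(delayMs: number): string {
  if (!Number.isInteger(delayMs)) {
    throw new RangeError("delayMs must be a whole number of milliseconds");
  }
  if (delayMs < MIN_DELAY_MS || delayMs > MAX_DELAY_MS) {
    throw new RangeError(
      `delayMs must be between ${MIN_DELAY_MS} and ${MAX_DELAY_MS}, got ${delayMs}`,
    );
  }
  return JSON.stringify({ delayMs });
}
```

For example, buildDelayBody(10000) yields the body shown in the API example above, while buildDelayBody(50) throws before any request is made.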

Feed Concurrency Controls

| Variable | Default | Purpose |
|---|---|---|
| INGESTION_CAMPAIGN_CONCURRENCY | 2 | Max campaigns processed simultaneously |
| INGESTION_FEED_CONCURRENCY | 3 | Max feeds per campaign simultaneously |
| INGESTION_FETCH_CONCURRENCY | 5 | Max concurrent HTTP fetches |
| FCC_STATION_CONCURRENCY | 3 | FCC station processing concurrency |
| FCC_PDF_CONCURRENCY | 3 | FCC PDF download concurrency |
| INGESTION_GMAIL_CONCURRENCY | 3 | Gmail API concurrency |
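
When tuning these values, it is useful to estimate how many feed runs can be active at once. The sketch below assumes the campaign and per-campaign feed limits simply multiply, which this guide does not confirm; it also leaves out INGESTION_FETCH_CONCURRENCY because the guide does not say whether that limit is global or per feed. The names are hypothetical.

```typescript
interface ConcurrencyConfig {
  campaignConcurrency: number; // INGESTION_CAMPAIGN_CONCURRENCY
  feedConcurrency: number; // INGESTION_FEED_CONCURRENCY
}

/** Upper bound on simultaneous feed runs, assuming the two limits multiply. */
function maxParallelFeedRuns(config: ConcurrencyConfig): number {
  return config.campaignConcurrency * config.feedConcurrency;
}

// Defaults from the table above: 2 campaigns x 3 feeds each.
const defaultConcurrency: ConcurrencyConfig = {
  campaignConcurrency: 2,
  feedConcurrency: 3,
};

const defaultUpperBound = maxParallelFeedRuns(defaultConcurrency); // 6
```

Under that assumption, raising either variable by one adds a whole campaign's or a whole feed slot's worth of load, so small increments are the safer way to scale.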

Safety Limits

| Variable | Default | Purpose |
|---|---|---|
| INGESTION_FEED_RUN_TIMEOUT_MS | 1800000 (30 min) | Max time per feed run |
| INGESTION_FEED_LOCK_TTL_SECONDS | 2700 (45 min) | Distributed lock TTL |
| INGESTION_FETCH_MAX_BYTES | 10485760 (10 MB) | Max download size |
| FCC_MAX_PAGINATION_PAGES | 50 | Max FCC pages |
| FCC_MAX_RECORDS_PER_RUN | 50000 | Max FCC records |
| INGESTION_MAX_CSV_RECORDS | 100000 | Max CSV rows |
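
Note that the two time limits use different units (milliseconds and seconds), and that by default the lock TTL is longer than the run timeout. A plausible reading, not stated explicitly in this guide, is that the lock should outlive the longest permitted run so it cannot expire while a run is still in progress. The hypothetical helper below computes that margin so an override can be checked before it is deployed.

```typescript
/**
 * Milliseconds by which the lock TTL exceeds the feed run timeout.
 * A negative result means the lock could expire before a run times out.
 */
function lockMarginMs(feedRunTimeoutMs: number, lockTtlSeconds: number): number {
  return lockTtlSeconds * 1000 - feedRunTimeoutMs;
}

// Defaults: 2700 s lock TTL vs 1800000 ms run timeout.
const defaultMarginMs = lockMarginMs(1800000, 2700); // 900000 ms (15 min)
```

If INGESTION_FEED_RUN_TIMEOUT_MS is raised, checking that the margin stays positive is a quick way to see whether INGESTION_FEED_LOCK_TTL_SECONDS should be raised with it.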

Monitoring Ingestion Status

Ingestion Feeds Page (/ingestion/feeds)

View all feed types with:

  • Status indicators (running, success, error, locked)
  • Last run time and duration
  • Registration status and scheduled cadence
  • Error details for failed runs
  • Lock status (prevents duplicate runs)

Check Recent Runs

  1. Navigate to Campaign → Admin → Ingestion
  2. View "Recent Runs" table:
    • Green checkmark: Successful run
    • Red X: Failed run
    • Clock icon: Currently running
  3. Click on a run to see details:
    • Records found
    • Records added vs duplicates
    • Errors encountered

Health Monitoring Workers

These background jobs automatically monitor ingestion health:

| Monitor | Schedule | What It Checks |
|---|---|---|
| Email Ingestion Watchdog | Every 2 min | Gmail polling is running |
| Queue Health Check | Every 15 min | Job queue health |
| Stale Watermark Monitor | Every 30 min | Feeds not running on schedule |
| DLQ Monitor | Every 15 min | Dead letter queue growth |
| MySQL Connection Monitor | Every minute | DB connection pool |
| Parser Success Rate | Daily at 8:15 AM | Email parser success rate |

When to Use Pause/Resume

Pause ingestion when:

  • Troubleshooting data issues
  • Performing database maintenance
  • Receiving rate-limit errors from external APIs
  • Reducing system load during peak traffic
  • Running manual data corrections

Resume ingestion when:

  • Troubleshooting is complete
  • Maintenance is finished
  • Rate limits have reset
  • Ready to process new data

Troubleshooting

Issue: Ingestion is paused but I didn't pause it

  • Solution: Check if another admin paused it via API or UI

Issue: Feed shows "Locked" but isn't running

  • Solution: Feed locks expire after 45 minutes by default (INGESTION_FEED_LOCK_TTL_SECONDS=2700), so a lock left behind by a feed that crashed mid-run clears on its own. You can also check Redis for stale locks.

Issue: Getting rate limited by external API

  • Solution: Lower the relevant concurrency setting (for example, INGESTION_FETCH_CONCURRENCY) or lengthen the feed's cron interval using its cron override variable

Issue: Gmail polling not starting

  • Solution: Check that REP_LABEL_ID and NEWSLETTER_LABEL_ID are set to valid Gmail Label IDs. Use npx tsx src/workers/mailbox/label-utils.ts to list available labels.

Issue: Pause state resets unexpectedly

  • Note: The pause state is held in memory, so a server restart clears it. This is by design to prevent ingestion from staying paused indefinitely.

Issue: Feed retry queue backing up

  • Solution: Check the dead letter queue (/admin/dlq) for persistent failures. Resolve root causes before retrying.

Environment Variables Summary

Core Ingestion

| Variable | Default | Description |
|---|---|---|
| INGESTION_CAMPAIGN_CONCURRENCY | 2 | Campaigns processed in parallel |
| INGESTION_FEED_CONCURRENCY | 3 | Feeds per campaign in parallel |
| INGESTION_DEFAULT_CRON | 0 * * * * | Default feed schedule |
| INGESTION_FEED_RUN_TIMEOUT_MS | 1800000 | Feed run timeout (30 min) |
| INGESTION_FEED_LOCK_TTL_SECONDS | 2700 | Lock TTL (45 min) |
| SCHEDULER_LOCK_ENABLED | true | Enable distributed locks |

Gmail Polling

| Variable | Default | Description |
|---|---|---|
| INGESTION_GMAIL_POLL_INTERVAL | 60s | Poll frequency |
| INGESTION_GMAIL_MAX_BATCH | 10 | Max emails per poll |
| INGESTION_GMAIL_CONCURRENCY | 3 | Concurrent API calls |
| NEWSLETTER_GMAIL_POLL_INTERVAL | 300000 | Newsletter poll (5 min) |
| NEWSLETTER_GMAIL_MAX_BATCH | 10 | Newsletter batch size |
| NEWSLETTER_RATE_LIMIT_MS | 1000 | Delay between emails |
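
The newsletter settings interact: the per-email delay multiplied across a batch sets a floor on how long one poll's batch takes. The sketch below estimates that floor under two assumptions that this guide does not confirm: emails in a batch are processed one after another, and the delay is applied only between consecutive emails. Processing time itself is ignored, and the function name is hypothetical.

```typescript
/** Minimum time spent waiting between emails in one batch, in milliseconds. */
function minBatchDelayMs(batchSize: number, rateLimitMs: number): number {
  if (batchSize <= 1) return 0; // no gap needed for zero or one email
  return (batchSize - 1) * rateLimitMs;
}

// Defaults: 10 emails per batch, 1000 ms between emails.
const defaultBatchDelayMs = minBatchDelayMs(10, 1000); // 9000 ms

// At the maximum allowed delay of 300000 ms, the same batch waits 2700000 ms
// (45 minutes), far longer than the 300000 ms newsletter poll interval.
const maxDelayBatchMs = minBatchDelayMs(10, 300000); // 2700000 ms
```

Under these assumptions, the defaults leave plenty of headroom within a 5-minute poll, while very large delays would make a full batch outlast the poll interval.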

Feed Retry

| Variable | Default | Description |
|---|---|---|
| FEED_RETRY_ENABLED | true | Enable retry queue |
| FEED_RETRY_PROCESS_CRON | */10 * * * * | Retry check schedule |
| FEED_RETRY_MAX_RETRIES | 3 | Max retry attempts |
| FEED_RETRY_BASE_DELAY_MS | 1000 | Base delay between retries |
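
This guide names a base delay but does not say how the delay grows between attempts. Purely as an illustration, the sketch below assumes exponential backoff that doubles on each retry; the platform's actual policy may differ (fixed, linear, or jittered), and the function name is hypothetical.

```typescript
/**
 * Delay before each retry attempt, ASSUMING the delay doubles every time.
 * The doubling factor is an assumption, not documented behavior.
 */
function retryDelaysMs(baseDelayMs: number, maxRetries: number): number[] {
  return Array.from(
    { length: maxRetries },
    (_, attempt) => baseDelayMs * 2 ** attempt,
  );
}

// Defaults: FEED_RETRY_BASE_DELAY_MS=1000, FEED_RETRY_MAX_RETRIES=3.
const defaultRetryDelays = retryDelaysMs(1000, 3); // [1000, 2000, 4000]
```

Since the retry queue is processed on the FEED_RETRY_PROCESS_CRON schedule, actual retry times may be coarser than these computed delays.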

Best Practices

  1. Monitor regularly: Check ingestion status daily via /ingestion/feeds
  2. Use appropriate concurrency: Start with defaults and increase if the database handles the load
  3. Review exceptions: Address data quality issues promptly to prevent backlog
  4. Check DLQ weekly: Review /admin/dlq for persistent failures
  5. Watch feed health: The Stale Watermark Monitor alerts on feeds that stop running on schedule
  6. Test config changes: Verify schedule and concurrency changes in development first

Last Updated: March 2026
