Ingestion Control Guide
This guide explains how to manage data ingestion, including pausing/resuming data collection, adjusting rate limiting, and monitoring ingestion status.
Overview
The platform automatically collects data from multiple external sources. Admins can control this process to:
- Pause ingestion during maintenance
- Resume ingestion after troubleshooting
- Adjust rate limiting to prevent API throttling
- Monitor ingestion health and status
- Enable/disable individual feed types
Data Sources & Schedules
| Source | Default Schedule | Override Variable | Description |
|---|---|---|---|
| FCC Political Files | Every 6 hours (at :45) | FCC_CRON | Broadcast ad disclosures (HTML scraping) |
| AdImpact | Hourly (at :15) | ADIMPACT_CRON | Industry ad intelligence (dual-feed) |
| FEC 24-Hour Reports | Every 4 hours | FEC_24H_CRON | Independent Expenditure filings |
| Meta Ads Library | Every 8 hours | META_ADS_CRON | Facebook/Instagram political ads |
| Google Political Ads | 3x daily (6 AM, 2 PM, 10 PM) | GOOGLE_POLITICAL_ADS_CRON | Google/YouTube ads (BigQuery) |
| State Press RSS | Every 6 hours | STATE_PRESS_RSS_CRON | State-level news mentions |
| Email Newsletters | Continuous (60s polling) | INGESTION_GMAIL_POLL_INTERVAL | Gmail API polling |
| Email (Rep Sheets) | Continuous (60s polling) | INGESTION_GMAIL_POLL_INTERVAL | Gmail API polling for rep data |
| Feed Retry Queue | Every 10 minutes | FEED_RETRY_PROCESS_CRON | Retries failed feed runs |
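Each source's default schedule can be overridden through the variable listed in the table. Below is a minimal sketch of that fallback logic. It assumes each cron variable holds a standard five-field cron expression. The default cron strings are our own translations of the table's descriptions, with minute 0 assumed where the table gives none. The two Gmail interval variables are left out because they are not cron expressions. The function name resolveCron is hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical defaults: cron translations of the schedule descriptions in the
// table above. Minute 0 is assumed where the table does not give a minute.
const DEFAULT_CRONS: Record<string, string> = {
  FCC_CRON: "45 */6 * * *", // every 6 hours at :45
  ADIMPACT_CRON: "15 * * * *", // hourly at :15
  FEC_24H_CRON: "0 */4 * * *", // every 4 hours
  META_ADS_CRON: "0 */8 * * *", // every 8 hours
  GOOGLE_POLITICAL_ADS_CRON: "0 6,14,22 * * *", // 6 AM, 2 PM, 10 PM
  STATE_PRESS_RSS_CRON: "0 */6 * * *", // every 6 hours
  FEED_RETRY_PROCESS_CRON: "*/10 * * * *", // every 10 minutes
};

/** Returns the override from `env` if set, otherwise the default for `variable`. */
function resolveCron(variable: string, env: Env): string {
  const fallback = DEFAULT_CRONS[variable];
  if (fallback === undefined) {
    throw new Error(`No default schedule known for ${variable}`);
  }
  const override = env[variable]?.trim();
  if (!override) return fallback; // unset or blank: keep the default

  // Shallow check only (five whitespace-separated fields); a real scheduler
  // would also validate the value range of each field.
  if (override.split(/\s+/).length !== 5) {
    throw new Error(`${variable} must be a 5-field cron expression, got "${override}"`);
  }
  return override;
}
```

Under this sketch, setting FCC_CRON to 45 */12 * * * would halve the FCC scraping frequency. Leaving it unset keeps the every-6-hours default.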
Pause/Resume Ingestion
Using the UI
- Navigate to the ingestion page (/ingestion)
- Look for the "Pause Ingestion" button in the top right
- Click it to pause ingestion immediately
- The button changes to "Resume Ingestion"; click it to resume
- The button is color-coded:
  - Amber when ingestion is active (shows "Pause Ingestion")
  - Green when ingestion is paused (shows "Resume Ingestion")
Using the API
Get ingestion status:
GET /api/ingestion/control
Response:
{
"paused": false,
"delayMs": 60000
}
Pause ingestion:
POST /api/ingestion/control
Body: {"action": "pause"}
Resume ingestion:
POST /api/ingestion/control
Body: {"action": "resume"}
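The three calls above all target one endpoint. So does the {"delayMs": ...} body described under Rate Limiting. The client below is a minimal sketch of calling that endpoint from TypeScript, and it makes a few assumptions:
- Request bodies are JSON, and no authentication headers are sent. Add whatever your deployment requires.
- Only the GET response has a documented shape, so POST calls just check the HTTP status.

The class name and base URL are hypothetical. The fetch function is passed in, so you can use the global fetch (Node 18+ or a browser) or a test double.

```typescript
type IngestionStatus = { paused: boolean; delayMs: number };
type ControlBody = { action: "pause" | "resume" } | { delayMs: number };

// The subset of the fetch API this client needs, so the global fetch or a
// test double can be passed in.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

class IngestionControlClient {
  private readonly url: string;
  private readonly fetchFn: FetchLike;

  constructor(baseUrl: string, fetchFn: FetchLike) {
    // Endpoint path from this guide; trailing slashes on the host are stripped.
    this.url = `${baseUrl.replace(/\/+$/, "")}/api/ingestion/control`;
    this.fetchFn = fetchFn;
  }

  /** GET /api/ingestion/control returns { paused, delayMs }. */
  async getStatus(): Promise<IngestionStatus> {
    const res = await this.fetchFn(this.url, { method: "GET" });
    if (!res.ok) throw new Error(`GET ${this.url} failed with HTTP ${res.status}`);
    return (await res.json()) as IngestionStatus;
  }

  pause(): Promise<void> {
    return this.post({ action: "pause" });
  }

  resume(): Promise<void> {
    return this.post({ action: "resume" });
  }

  setDelayMs(delayMs: number): Promise<void> {
    return this.post({ delayMs });
  }

  // POST responses are not documented in this guide, so only the status is checked.
  private async post(body: ControlBody): Promise<void> {
    const res = await this.fetchFn(this.url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`POST ${this.url} failed with HTTP ${res.status}`);
  }
}
```

For example, new IngestionControlClient("https://your-host", fetch).pause() would pause ingestion. A following getStatus() call would then be expected to report paused: true.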
Individual Feed Control
Enabling/Disabling Feeds
Each feed type can be individually enabled or disabled via environment variables:
| Feed | Enabled Variable | Default |
|---|---|---|
| Email Queue | EMAIL_QUEUE_ENABLED | true |
| FCC Coverage Monitoring | FCC_COVERAGE_MONITORING_ENABLED | true |
| Station Refresh | STATION_REFRESH_ENABLED | true |
| Social Monitor | SOCIAL_MONITOR_ENABLED | true |
| Social Creative Discovery | SOCIAL_CREATIVE_DISCOVERY_ENABLED | true |
| Creative Followup | CREATIVE_FOLLOWUP_ENABLED | true |
| Radio Creative Discovery | RADIO_CREATIVE_DISCOVERY_ENABLED | true |
| EOD Reports | EOD_REPORT_ENABLED | true |
| Discovery Refresh | DISCOVERY_REFRESH_ENABLED | true |
| Dedupe Auto-Resolver | DEDUPE_AUTO_RESOLVER_ENABLED | false |
| Pre-Buy Alerts | PRE_BUY_ALERTS_ENABLED | true |
| AdImpact Creative Download | ADIMPACT_CREATIVE_DOWNLOAD_ENABLED | true |
| AdImpact Transcription | ADIMPACT_CREATIVE_TRANSCRIBE_ENABLED | true |
| AdImpact Gap Detection | ADIMPACT_GAP_DETECTION_ENABLED | true |
| AdImpact Full Reconciliation | ADIMPACT_FULL_RECONCILIATION_ENABLED | true |
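Boolean flags like these are usually read once at startup. Below is a minimal sketch of that lookup, with the defaults copied from the table above. This guide does not document how the platform parses these values, so two choices here are assumptions:
- The accepted spellings are true/false, 1/0, yes/no, and on/off, case-insensitive.
- Any other value is rejected.

```typescript
type Env = Record<string, string | undefined>;

// Defaults copied from the feed table above.
const FEED_FLAG_DEFAULTS: Record<string, boolean> = {
  EMAIL_QUEUE_ENABLED: true,
  FCC_COVERAGE_MONITORING_ENABLED: true,
  STATION_REFRESH_ENABLED: true,
  SOCIAL_MONITOR_ENABLED: true,
  SOCIAL_CREATIVE_DISCOVERY_ENABLED: true,
  CREATIVE_FOLLOWUP_ENABLED: true,
  RADIO_CREATIVE_DISCOVERY_ENABLED: true,
  EOD_REPORT_ENABLED: true,
  DISCOVERY_REFRESH_ENABLED: true,
  DEDUPE_AUTO_RESOLVER_ENABLED: false,
  PRE_BUY_ALERTS_ENABLED: true,
  ADIMPACT_CREATIVE_DOWNLOAD_ENABLED: true,
  ADIMPACT_CREATIVE_TRANSCRIBE_ENABLED: true,
  ADIMPACT_GAP_DETECTION_ENABLED: true,
  ADIMPACT_FULL_RECONCILIATION_ENABLED: true,
};

// Accepted spellings (an assumption, not documented behavior).
const TRUE_VALUES = ["true", "1", "yes", "on"];
const FALSE_VALUES = ["false", "0", "no", "off"];

/** Reads a feed flag from `env`, falling back to its documented default when unset. */
function isFeedEnabled(variable: string, env: Env): boolean {
  const fallback = FEED_FLAG_DEFAULTS[variable];
  if (fallback === undefined) throw new Error(`Unknown feed flag ${variable}`);

  const raw = env[variable]?.trim().toLowerCase();
  if (!raw) return fallback;
  if (TRUE_VALUES.includes(raw)) return true;
  if (FALSE_VALUES.includes(raw)) return false;
  // Fail loudly instead of guessing, so a typo does not silently toggle a feed.
  throw new Error(`${variable} must be true or false, got "${env[variable]}"`);
}
```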
Triggering Manual Runs
From the ingestion feeds page (/ingestion/feeds):
- Find the feed you want to run
- Click the "Run Now" button
- The feed will execute immediately regardless of its cron schedule
- View results in the run history
Rate Limiting
Rate limiting controls the delay between consecutive emails processed during newsletter ingestion.
Default Settings
- Default delay: 1000ms (1 second between emails)
- Valid range: 100ms to 300000ms (5 minutes)
Adjusting Rate Limiting
Via API:
POST /api/ingestion/control
Body: {"delayMs": 10000}
Via Environment Variable:
NEWSLETTER_RATE_LIMIT_MS=1000
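The delay applies between emails, not before the first one. The sketch below shows one way to honor that. It assumes the documented 1000 ms default and the 100 ms to 300000 ms range apply to both the API value and NEWSLETTER_RATE_LIMIT_MS. The function names are hypothetical. The sleep function is injected so the loop can be tested without real waits.

```typescript
const MIN_DELAY_MS = 100;
const MAX_DELAY_MS = 300_000; // 5 minutes
const DEFAULT_DELAY_MS = 1_000;

type Sleep = (ms: number) => Promise<void>;

/** Rejects delays outside the documented 100 ms to 300000 ms range. */
function validateDelayMs(value: number): number {
  if (!Number.isFinite(value) || value < MIN_DELAY_MS || value > MAX_DELAY_MS) {
    throw new RangeError(`delayMs must be between ${MIN_DELAY_MS} and ${MAX_DELAY_MS}, got ${value}`);
  }
  return value;
}

/** Reads a NEWSLETTER_RATE_LIMIT_MS-style string, defaulting to 1000 ms when unset. */
function delayFromEnv(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_DELAY_MS;
  return validateDelayMs(Number(raw));
}

/** Handles items one at a time, sleeping `delayMs` between (not before) items. */
async function processWithDelay<T>(
  items: readonly T[],
  delayMs: number,
  handle: (item: T) => Promise<void>,
  sleep: Sleep,
): Promise<void> {
  const delay = validateDelayMs(delayMs);
  let first = true;
  for (const item of items) {
    if (!first) await sleep(delay);
    first = false;
    await handle(item);
  }
}
```

In production, sleep could be (ms) => new Promise((resolve) => setTimeout(resolve, ms)). With the default 1000 ms delay, a batch of 10 emails (the NEWSLETTER_GMAIL_MAX_BATCH default) spends about 9 seconds waiting, on top of processing time.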
Feed Concurrency Controls
| Variable | Default | Purpose |
|---|---|---|
| INGESTION_CAMPAIGN_CONCURRENCY | 2 | Max campaigns processed simultaneously |
| INGESTION_FEED_CONCURRENCY | 3 | Max feeds per campaign simultaneously |
| INGESTION_FETCH_CONCURRENCY | 5 | Max concurrent HTTP fetches |
| FCC_STATION_CONCURRENCY | 3 | FCC station processing concurrency |
| FCC_PDF_CONCURRENCY | 3 | FCC PDF download concurrency |
| INGESTION_GMAIL_CONCURRENCY | 3 | Gmail API concurrency |
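These variables cap how many units of work run at once. The campaign and per-campaign feed limits may nest, as their descriptions suggest. If so, the defaults allow up to 2 × 3 = 6 feed runs in flight at once. Caps like these are typically enforced with a bounded-concurrency map, sketched below. The name mapWithConcurrency is hypothetical, and this is not the platform's own code.

```typescript
/**
 * Runs `worker` over `items` with at most `limit` calls in flight at once,
 * preserving input order in the result array.
 */
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError(`limit must be a positive integer, got ${limit}`);
  }
  const results: R[] = new Array<R>(items.length);
  let nextIndex = 0; // shared cursor; safe because these loops run on one JS thread

  // Each runner repeatedly claims the next unclaimed index until none remain.
  async function runner(): Promise<void> {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

Nesting two calls reproduces the 2 × 3 ceiling:
- an outer call over campaigns, with INGESTION_CAMPAIGN_CONCURRENCY as the limit;
- an inner call over each campaign's feeds, with INGESTION_FEED_CONCURRENCY as the limit.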
Safety Limits
| Variable | Default | Purpose |
|---|---|---|
| INGESTION_FEED_RUN_TIMEOUT_MS | 1800000 (30 min) | Max time per feed run |
| INGESTION_FEED_LOCK_TTL_SECONDS | 2700 (45 min) | Distributed lock TTL |
| INGESTION_FETCH_MAX_BYTES | 10485760 (10 MiB) | Max download size |
| FCC_MAX_PAGINATION_PAGES | 50 | Max FCC pagination pages |
| FCC_MAX_RECORDS_PER_RUN | 50000 | Max FCC records per run |
| INGESTION_MAX_CSV_RECORDS | 100000 | Max CSV rows |
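The first two limits interact. If a feed's lock expired before its run timed out, a second worker could start the same feed while the first was still running. The defaults leave a 15-minute margin: 2700 s (45 min) of lock against 1800000 ms (30 min) of run time. The sketch below loads these limits and checks that invariant. It assumes each variable holds a plain positive integer in the unit its name indicates. The names loadSafetyLimits and SafetyLimits are hypothetical.

```typescript
type Env = Record<string, string | undefined>;

interface SafetyLimits {
  feedRunTimeoutMs: number;
  feedLockTtlMs: number;
  fetchMaxBytes: number;
}

// Reads a positive integer from env, falling back to the documented default.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name]?.trim();
  if (!raw) return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadSafetyLimits(env: Env): SafetyLimits {
  const limits: SafetyLimits = {
    feedRunTimeoutMs: intFromEnv(env, "INGESTION_FEED_RUN_TIMEOUT_MS", 1_800_000),
    // The TTL is configured in seconds; convert so both durations share a unit.
    feedLockTtlMs: intFromEnv(env, "INGESTION_FEED_LOCK_TTL_SECONDS", 2_700) * 1_000,
    fetchMaxBytes: intFromEnv(env, "INGESTION_FETCH_MAX_BYTES", 10 * 1024 * 1024),
  };
  // A lock that expires before the run times out would let a duplicate run start.
  if (limits.feedLockTtlMs <= limits.feedRunTimeoutMs) {
    throw new Error(
      `Lock TTL (${limits.feedLockTtlMs} ms) should exceed run timeout (${limits.feedRunTimeoutMs} ms)`,
    );
  }
  return limits;
}
```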
Monitoring Ingestion Status
Ingestion Feeds Page (/ingestion/feeds)
View all feed types with:
- Status indicators (running, success, error, locked)
- Last run time and duration
- Registration status and scheduled cadence
- Error details for failed runs
- Lock status (prevents duplicate runs)
Check Recent Runs
- Navigate to Campaign → Admin → Ingestion
- View the "Recent Runs" table:
  - Green checkmark: Successful run
  - Red X: Failed run
  - Clock icon: Currently running
- Click on a run to see details:
  - Records found
  - Records added vs. duplicates
  - Errors encountered
Health Monitoring Workers
These background jobs automatically monitor ingestion health:
| Monitor | Schedule | What it Checks |
|---|---|---|
| Email Ingestion Watchdog | Every 2 min | Gmail polling is running |
| Queue Health Check | Every 15 min | Job queue health |
| Stale Watermark Monitor | Every 30 min | Feeds not running on schedule |
| DLQ Monitor | Every 15 min | Dead letter queue growth |
| MySQL Connection Monitor | Every minute | DB connection pool |
| Parser Success Rate | Daily at 8:15 AM | Email parser success rate |
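The guide describes the Stale Watermark Monitor only as checking for "feeds not running on schedule". One common way to define that is to flag a feed when its last successful run is older than its expected interval times a grace factor. That definition is assumed here rather than taken from the platform. The FeedWatermark shape, the feed names, and the factor of 2 are all illustrative.

```typescript
interface FeedWatermark {
  feed: string;
  expectedIntervalMs: number; // how often the feed is scheduled to run
  lastSuccessAt: number | null; // epoch ms of the last successful run, or null if never
}

/**
 * Returns feeds whose last success is older than expectedIntervalMs * graceFactor.
 * The grace factor absorbs normal jitter and slow runs; 2 is an arbitrary choice.
 */
function findStaleFeeds(
  feeds: readonly FeedWatermark[],
  nowMs: number,
  graceFactor = 2,
): string[] {
  return feeds
    .filter(
      (f) =>
        f.lastSuccessAt === null ||
        nowMs - f.lastSuccessAt > f.expectedIntervalMs * graceFactor,
    )
    .map((f) => f.feed);
}
```

Run every 30 minutes, which is the monitor's schedule, this check would flag an 8-hour feed such as Meta Ads Library once more than 16 hours pass without a success.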
When to Use Pause/Resume
Pause ingestion when:
- Troubleshooting data issues
- Performing database maintenance
- Experiencing rate limiting errors from external APIs
- Reducing system load during peak traffic
- Running manual data corrections
Resume ingestion when:
- Troubleshooting is complete
- Maintenance is finished
- Rate limits have reset
- The system is ready to process new data
Troubleshooting
Issue: Ingestion is paused but I didn't pause it
- Solution: Check if another admin paused it via API or UI
Issue: Feed shows "Locked" but isn't running
- Solution: Feed locks expire once their TTL lapses (INGESTION_FEED_LOCK_TTL_SECONDS, default 2700 seconds = 45 minutes). If a feed crashed mid-run, its lock expires on its own after that TTL. You can also check Redis for stale locks.
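A feed lock behaves like a key with an expiry. Acquiring it fails while the key is live, and it lapses on its own after the TTL, so a crashed run eventually stops blocking its feed. The guide says only that the locks live in Redis. The sketch below models the same semantics in memory, comparable to Redis SET key value NX EX seconds. It uses an injectable clock so it can be tested. The class name, the feed:fcc key format, and the owner check on release are illustrative assumptions, not the platform's implementation.

```typescript
type Clock = () => number; // current time in epoch milliseconds

interface LockEntry {
  owner: string;
  expiresAt: number;
}

class TtlLockTable {
  private readonly locks = new Map<string, LockEntry>();
  private readonly now: Clock;

  constructor(now: Clock) {
    this.now = now;
  }

  /** Takes `key` for `ttlSeconds` unless a live lock exists (like SET key value NX EX). */
  tryAcquire(key: string, owner: string, ttlSeconds: number): boolean {
    const current = this.locks.get(key);
    if (current !== undefined && current.expiresAt > this.now()) {
      return false; // still held by a live run
    }
    // Missing or expired (for example, left behind by a crashed run): take it over.
    this.locks.set(key, { owner, expiresAt: this.now() + ttlSeconds * 1000 });
    return true;
  }

  /** Releases `key` only if `owner` still holds a live lock on it. */
  release(key: string, owner: string): boolean {
    const current = this.locks.get(key);
    if (current === undefined || current.owner !== owner || current.expiresAt <= this.now()) {
      return false;
    }
    this.locks.delete(key);
    return true;
  }
}
```

With the default 2700-second TTL, suppose a run acquires the hypothetical key feed:fcc at t = 0 and then crashes. It blocks other runs only until t = 45 minutes.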
Issue: Getting rate limited by external API
- Solution: Lower the relevant concurrency settings or lengthen the interval in the feed's cron override variable
Issue: Gmail polling not starting
- Solution: Check that REP_LABEL_ID and NEWSLETTER_LABEL_ID are set to valid Gmail label IDs. Use npx tsx src/workers/mailbox/label-utils.ts to list available labels.
Issue: Pause state resets unexpectedly
- Note: The pause state is in-memory and resets on server restart. This is by design to prevent ingestion from staying paused indefinitely.
Issue: Feed retry queue backing up
- Solution: Check the dead letter queue (/admin/dlq) for persistent failures. Resolve root causes before retrying.
Environment Variables Summary
Core Ingestion
| Variable | Default | Description |
|---|---|---|
| INGESTION_CAMPAIGN_CONCURRENCY | 2 | Campaigns processed in parallel |
| INGESTION_FEED_CONCURRENCY | 3 | Feeds per campaign in parallel |
| INGESTION_DEFAULT_CRON | 0 * * * * | Default feed schedule |
| INGESTION_FEED_RUN_TIMEOUT_MS | 1800000 | Feed run timeout (30 min) |
| INGESTION_FEED_LOCK_TTL_SECONDS | 2700 | Lock TTL (45 min) |
| SCHEDULER_LOCK_ENABLED | true | Enable distributed locks |
Gmail Polling
| Variable | Default | Description |
|---|---|---|
| INGESTION_GMAIL_POLL_INTERVAL | 60s | Poll frequency |
| INGESTION_GMAIL_MAX_BATCH | 10 | Max emails per poll |
| INGESTION_GMAIL_CONCURRENCY | 3 | Concurrent API calls |
| NEWSLETTER_GMAIL_POLL_INTERVAL | 300000 | Newsletter poll (5 min) |
| NEWSLETTER_GMAIL_MAX_BATCH | 10 | Newsletter batch size |
| NEWSLETTER_RATE_LIMIT_MS | 1000 | Delay between emails |
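The Gmail defaults above are written in two styles. INGESTION_GMAIL_POLL_INTERVAL uses 60s, while NEWSLETTER_GMAIL_POLL_INTERVAL uses a bare 300000 (milliseconds). This guide does not document which formats each variable accepts. The parser below is a hedged sketch that accepts both: a bare integer is read as milliseconds, and an integer with an ms, s, or m suffix is converted. The function name and the suffix set are assumptions.

```typescript
// Multipliers for the suffixes this sketch accepts (an assumption).
const UNIT_MS: Record<string, number> = { ms: 1, s: 1_000, m: 60_000 };

/** Parses "300000", "500ms", "60s", or "5m" into milliseconds. */
function parseIntervalMs(raw: string): number {
  const match = /^(\d+)\s*(ms|s|m)?$/.exec(raw.trim());
  if (match === null) {
    throw new Error(`Unrecognized interval "${raw}"`);
  }
  const amount = Number(match[1]);
  const unit = match[2] ?? "ms"; // a bare number is treated as milliseconds
  // The regex guarantees `unit` is one of the UNIT_MS keys.
  return amount * (UNIT_MS[unit] as number);
}
```

Under this reading, 60s is 60,000 ms, and 300000 ms is 5 minutes, which matches the table's description of the newsletter default.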
Feed Retry
| Variable | Default | Description |
|---|---|---|
| FEED_RETRY_ENABLED | true | Enable retry queue |
| FEED_RETRY_PROCESS_CRON | */10 * * * * | Retry check schedule |
| FEED_RETRY_MAX_RETRIES | 3 | Max retry attempts |
| FEED_RETRY_BASE_DELAY_MS | 1000 | Base delay between retries |
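FEED_RETRY_BASE_DELAY_MS is described only as the base delay between retries. A common pattern is exponential backoff, where retry n waits base × 2^(n-1). Under that pattern, a run that still fails after FEED_RETRY_MAX_RETRIES retries is handed off to the dead letter queue (/admin/dlq). This guide does not confirm either behavior, so both are assumptions here. The function and type names are hypothetical.

```typescript
type RetryDecision =
  | { kind: "retry"; retryNumber: number; delayMs: number }
  | { kind: "dead-letter"; retries: number };

/**
 * Decides what happens after a failed feed run, given how many retries have
 * already been attempted. Defaults mirror FEED_RETRY_MAX_RETRIES and
 * FEED_RETRY_BASE_DELAY_MS; the doubling schedule is an assumption.
 */
function nextRetry(retriesSoFar: number, maxRetries = 3, baseDelayMs = 1_000): RetryDecision {
  if (retriesSoFar >= maxRetries) {
    // Retry budget exhausted: hand off for manual review (assumed to be the DLQ).
    return { kind: "dead-letter", retries: retriesSoFar };
  }
  return {
    kind: "retry",
    retryNumber: retriesSoFar + 1,
    delayMs: baseDelayMs * 2 ** retriesSoFar, // 1000, 2000, 4000, ...
  };
}
```

FEED_RETRY_PROCESS_CRON runs the queue every 10 minutes. That processing cadence, rather than these second-scale delays, may dominate how soon a retry actually runs.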
Best Practices
- Monitor regularly: Check ingestion status daily via /ingestion/feeds
- Use appropriate concurrency: Start with the defaults and increase them only if the database can handle the additional load
- Review exceptions: Address data quality issues promptly to prevent backlog
- Check DLQ weekly: Review /admin/dlq for persistent failures
- Watch feed health: The Stale Watermark Monitor will alert on feeds that stop running
- Test config changes: Verify schedule and concurrency changes in development first
Last Updated: March 2026
If you have feedback or questions, please contact your administrator.