News data analysis: from headlines to structured signals
Analyzing news usually starts with a long scraping-and-cleaning slog before you reach any insight. If the data arrives already scored, classified and de-duplicated, you skip straight to the analysis. Here's what you can actually measure.
Analysis-ready fields
Every article comes with urgency (0–10), political_lean, topic_tags, country_tags, language, an event cluster_id and a UTC timestamp. There's no NLP pipeline to build: load the response into a DataFrame and start analyzing.
curl -H "X-API-Key: YOUR_KEY" \
  "https://api.newsagentdata.com/v1/feed?days=7&country=ua&topic=defense"
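The same request can be assembled from Python. This is a minimal sketch that only builds the request object; the endpoint, the X-API-Key header and the days, country and topic parameters come from the curl example, while the helper name build_feed_request is made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.newsagentdata.com/v1/feed"


def build_feed_request(api_key, days=7, country="ua", topic="defense"):
    """Build (but do not send) a GET request mirroring the curl example."""
    query = urlencode({"days": days, "country": country, "topic": topic})
    return Request(f"{BASE_URL}?{query}", headers={"X-API-Key": api_key})


req = build_feed_request("YOUR_KEY")
print(req.full_url)
```

Passing req to urllib.request.urlopen would send it. The shape of the response body is not described here, so parsing is left out of the sketch.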
What you can measure
- Coverage volume over time by country/topic: agenda-setting and attention shifts.
- Lean distribution per event: group by cluster_id to quantify framing across state/independent/Western sources.
- Urgency spikes: detect breaking events from score plus cluster_size growth, not a keyword alert.
- Cross-source comparison: the same story, told from different stances, side by side.
- Geographic and topic trends across the archive via the days window.
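Two of the measurements above, lean distribution per event and urgency spikes, can be sketched in plain Python. The rows below are made-up sample data: the field names follow the article, but the lean labels other than "neutral" and the threshold of 7 are placeholders chosen for illustration. The spike check uses the urgency score alone; tracking cluster_size growth would need snapshots of the same cluster over time, which this sketch does not attempt.

```python
from collections import Counter, defaultdict

# Made-up sample rows; values are illustrative, not real feed output.
articles = [
    {"cluster_id": "c1", "political_lean": "state", "urgency": 7},
    {"cluster_id": "c1", "political_lean": "independent", "urgency": 8},
    {"cluster_id": "c1", "political_lean": "state", "urgency": 6},
    {"cluster_id": "c2", "political_lean": "western", "urgency": 3},
    {"cluster_id": "c2", "political_lean": "neutral", "urgency": 2},
]


def lean_distribution(rows):
    """Return {cluster_id: {lean: share}}; shares sum to 1 within a cluster."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["cluster_id"]][row["political_lean"]] += 1
    return {
        cid: {lean: n / sum(c.values()) for lean, n in c.items()}
        for cid, c in counts.items()
    }


def urgent_clusters(rows, threshold=7):
    """Return sorted cluster ids whose highest urgency meets the threshold."""
    peak = {}
    for row in rows:
        cid = row["cluster_id"]
        peak[cid] = max(peak.get(cid, 0), row["urgency"])
    return sorted(cid for cid, score in peak.items() if score >= threshold)


dist = lean_distribution(articles)
print(dist["c1"])
print(urgent_clusters(articles))
```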
From API to notebook
Paginate the feed, store as JSONL, load into pandas, then group/pivot on country_tags, topic_tags, political_lean or cluster_id. Because urgency scoring is deterministic, a threshold means the same thing across your whole time series — so trend lines are comparable month to month. Historical and live rows share one schema (see the historical guide) and duplicate coverage is already collapsed (see event clustering).
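The JSONL-to-pandas step might look like the sketch below. It assumes each stored line is one article object with the fields listed earlier, and it uses three made-up rows in an in-memory buffer in place of a real file. Pagination is left out because the paging parameters are not described here, and the urgency threshold of 7 is an arbitrary example value.

```python
import io
import json

import pandas as pd

# Made-up rows standing in for stored feed results.
rows = [
    {"timestamp": "2024-05-01T08:00:00Z", "country_tags": ["ua"],
     "topic_tags": ["defense"], "urgency": 8, "cluster_id": "c1"},
    {"timestamp": "2024-05-01T12:30:00Z", "country_tags": ["ua"],
     "topic_tags": ["defense", "energy"], "urgency": 5, "cluster_id": "c2"},
    {"timestamp": "2024-05-02T09:15:00Z", "country_tags": ["ua"],
     "topic_tags": ["energy"], "urgency": 7, "cluster_id": "c3"},
]

# JSONL is one JSON object per line; a StringIO stands in for a file on disk.
buffer = io.StringIO()
for row in rows:
    buffer.write(json.dumps(row) + "\n")
buffer.seek(0)

df = pd.read_json(buffer, lines=True)
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
df["day"] = df["timestamp"].dt.strftime("%Y-%m-%d")

# topic_tags is a list per article, so explode to one row per (article, topic).
exploded = df.explode("topic_tags")
volume = exploded.groupby(["day", "topic_tags"]).size().unstack(fill_value=0)

# Share of articles per day at or above a fixed urgency threshold.
urgent_share = (df["urgency"] >= 7).groupby(df["day"]).mean()

print(volume)
print(urgent_share)
```

Exploding the tag lists before grouping counts an article once per tag, which is usually what a coverage-volume chart wants; count distinct cluster_id values instead if you want events rather than articles.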
Honest note
A political_lean value of "neutral" means the article is unclassified, and Russian and English are the most deeply enriched languages, so factor both into any aggregate. The free tier (full schema, 100 requests/day) is enough to prototype an analysis end to end before scaling up.
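One way to account for this in an aggregate is to check how much of each language's coverage is classified before comparing lean shares. The sketch below uses made-up rows; the language codes and the non-neutral lean labels are placeholders.

```python
from collections import Counter

# Made-up sample rows; values are illustrative only.
rows = [
    {"language": "ru", "political_lean": "state"},
    {"language": "ru", "political_lean": "neutral"},
    {"language": "en", "political_lean": "independent"},
    {"language": "uk", "political_lean": "neutral"},
    {"language": "uk", "political_lean": "neutral"},
]


def classified_share(rows):
    """Fraction of rows per language whose lean is not "neutral"."""
    total, classified = Counter(), Counter()
    for row in rows:
        total[row["language"]] += 1
        if row["political_lean"] != "neutral":
            classified[row["language"]] += 1
    return {lang: classified[lang] / n for lang, n in total.items()}


share = classified_share(rows)
print(share)
```

A low share for a language suggests its lean distribution rests on few classified articles and should be read with caution.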