Our Methodology
How we detect trends and synthesize AI news into coherent stories
Transparency is core to our mission. This page explains exactly how we detect trends, validate quality, and synthesize content from multiple sources. Our AI is a tool for synthesis, not a replacement for human judgment.
Quality Gates
Not every cluster of articles becomes a trend. We enforce strict quality gates to ensure only coherent, specific, and newsworthy trends are published. Here are the gates every cluster must clear:
Confidence Threshold
Trends must score 0.6+ confidence from our clustering algorithm. Low-confidence clusters are rejected.
Minimum Sources
At least 3 articles from 3 different outlets are required. Single-source stories don't become trends.
Coherence Test
GPT checks whether the articles in a cluster could plausibly share a magazine cover. Unrelated articles are filtered out.
Title Specificity
Generic titles like 'AI Industry Update' are rejected. Trends need specific actors, events, or themes.
Banned Phrases
Marketing buzzwords like 'game-changing', 'paradigm shift', or 'reaches critical mass' are blocked.
Deduplication
Coverage of the same event by multiple outlets doesn't create duplicate trends; new articles are merged into the existing trend.
These gates mean we reject approximately 40% of detected clusters. Quality over quantity.
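The gates above can be sketched as a single filter function. This is an illustrative sketch, not our production code: the function name, the cluster dictionary shape, and the generic-title list are assumptions; only the 0.6 confidence threshold, the 3-outlet minimum, and the banned phrases come from the methodology itself.

```python
# Hypothetical sketch of the quality gates; field names are assumptions.
BANNED_PHRASES = {"game-changing", "paradigm shift", "reaches critical mass"}
GENERIC_TITLES = {"ai industry update"}  # illustrative, not the full list

def passes_quality_gates(cluster):
    """Return True only if a detected cluster clears every gate."""
    if cluster["confidence"] < 0.6:                  # confidence threshold
        return False
    if len(set(cluster["outlets"])) < 3:             # 3+ distinct outlets
        return False
    title = cluster["title"].lower()
    if title in GENERIC_TITLES:                      # title specificity
        return False
    if any(phrase in title for phrase in BANNED_PHRASES):  # banned phrases
        return False
    return True

cluster = {"confidence": 0.8,
           "outlets": ["TechCrunch", "The Verge", "Wired"],
           "title": "OpenAI releases GPT-5"}
print(passes_quality_gates(cluster))  # → True
```

A cluster failing any single check is rejected outright; the coherence test and deduplication happen in separate steps not shown here.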
Trend Detection
We use machine learning to automatically identify emerging trends and group related content into coherent stories. This includes news, research papers, GitHub repos, and developer blogs, all unified by topic.
How It Works
- Embedding Generation: Each piece of content is converted into a semantic vector using OpenAI's text-embedding-3-small model (1536 dimensions).
- Clustering: We apply DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to group content with similar embeddings. Content with a cosine similarity of 0.75 or higher is considered related.
- Story Synthesis: For each cluster, GPT analyzes all content types to generate a synthesized narrative - "The Story So Far" - that connects news, research, and code.
- Velocity Tracking: We track how fast trends are growing (articles per day) and classify them as Emerging, Growing, Peaking, or Declining.
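The first two steps above can be sketched with scikit-learn. In production the 1536-dimensional vectors come from OpenAI's text-embedding-3-small; here random vectors stand in for real embeddings, and the 0.75 similarity threshold maps to a cosine *distance* of 0.25 for DBSCAN's `eps`:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Stand-ins for embeddings: three near-duplicates of one article plus
# one unrelated article. Real vectors would come from the embedding API.
rng = np.random.default_rng(0)
base = rng.normal(size=1536)
embeddings = np.vstack([
    base + rng.normal(scale=0.01, size=1536),
    base + rng.normal(scale=0.01, size=1536),
    base + rng.normal(scale=0.01, size=1536),
    rng.normal(size=1536),  # unrelated content
])

# Cosine similarity >= 0.75 is equivalent to cosine distance <= 0.25.
labels = DBSCAN(eps=0.25, min_samples=3, metric="cosine").fit_predict(embeddings)
print(labels)  # the three related items share a cluster; the outlier is -1
```

DBSCAN's noise label (-1) is what makes it a good fit here: content that doesn't belong to any dense cluster is simply left unclustered rather than forced into a trend.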
Trend Lifecycle
Emerging: New pattern detected
Growing: Gaining momentum
Peaking: Maximum coverage
Declining: Fading attention
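A minimal sketch of how velocity (articles per day) could map to these four stages. The stage names come from this page; the growth heuristic and the low-volume cutoff of 5 articles/day are illustrative assumptions, not our production thresholds:

```python
def classify_stage(daily_counts, low_volume=5):
    """Classify a trend's lifecycle stage from recent daily article counts.

    daily_counts: list of articles-per-day values, oldest first.
    low_volume: assumed cutoff separating Emerging from Growing trends.
    """
    today, yesterday = daily_counts[-1], daily_counts[-2]
    growth = today - yesterday
    if growth < 0:
        return "Declining"   # fading attention
    if today < low_volume:
        return "Emerging"    # new pattern detected, volume still low
    if growth > 0:
        return "Growing"     # gaining momentum
    return "Peaking"         # maximum coverage, growth has leveled off

print(classify_stage([1, 2]))    # → Emerging
print(classify_stage([5, 10]))   # → Growing
print(classify_stage([10, 10]))  # → Peaking
print(classify_stage([10, 6]))   # → Declining
```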
Content Sources
We aggregate content from four types of sources, unified into trends:
News
Company blogs (OpenAI, Anthropic, Google DeepMind, Meta AI, etc.) and tech publications (TechCrunch, The Verge, Wired).
Research Papers
Daily papers from ArXiv (cs.AI, cs.LG, cs.CL) and HuggingFace featured papers.
GitHub Repos
Trending AI/ML repositories with significant star counts and active development.
Dev Blogs
Technical posts from NVIDIA, Microsoft, and other AI labs explaining implementation details.
Collection Schedule
News is collected 3x daily, research papers daily at 6pm ET, and GitHub/dev blogs are scanned continuously. All content is deduplicated and matched to existing trends.
AI Transparency
We use GPT to generate summaries and synthesize "The Story So Far" narratives. This is clearly disclosed - our AI is a tool for synthesis, not a replacement for reading the original sources.