Our Methodology
How we detect trends and synthesize AI news into coherent stories
Transparency is core to our mission. This page explains exactly how we detect trends, validate quality, and synthesize content from multiple sources. Our AI is a tool for synthesis, not a replacement for human judgment.
Quality Gates
Not every cluster of articles becomes a trend. We enforce strict quality gates to ensure only coherent, specific, and newsworthy trends are published. Here are the gates every cluster must clear:
Confidence Threshold
Trends must score 0.6+ confidence from our clustering algorithm. Low-confidence clusters are rejected.
Minimum Sources
At least 3 articles from 3 different outlets are required. Single-source stories don't become trends.
Coherence Test
GPT checks whether the articles in a cluster could plausibly share a magazine cover. Unrelated articles are filtered out.
Title Specificity
Generic titles like 'AI Industry Update' are rejected. Trends need specific actors, events, or themes.
Banned Phrases
Marketing buzzwords like 'game-changing', 'paradigm shift', or 'reaches critical mass' are blocked.
Deduplication
Coverage of the same event by multiple outlets doesn't create duplicate trends; new articles are merged into the existing trend.
These gates mean we reject approximately 40% of detected clusters. Quality over quantity.
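The gates above can be sketched as a single filter function. This is an illustrative sketch, not our production code: the function name, the cluster dictionary shape, and the generic-title list are assumptions; only the 0.6 confidence threshold, the 3-outlet minimum, and the banned phrases come from the methodology itself.

```python
# Hypothetical sketch of the quality gates; field names are assumptions.
BANNED_PHRASES = {"game-changing", "paradigm shift", "reaches critical mass"}
GENERIC_TITLES = {"ai industry update"}  # illustrative, not the full list

def passes_quality_gates(cluster):
    """Return True only if a detected cluster clears every gate."""
    if cluster["confidence"] < 0.6:                  # confidence threshold
        return False
    if len(set(cluster["outlets"])) < 3:             # 3+ distinct outlets
        return False
    title = cluster["title"].lower()
    if title in GENERIC_TITLES:                      # title specificity
        return False
    if any(phrase in title for phrase in BANNED_PHRASES):  # banned phrases
        return False
    return True

cluster = {"confidence": 0.8,
           "outlets": ["TechCrunch", "The Verge", "Wired"],
           "title": "OpenAI releases GPT-5"}
print(passes_quality_gates(cluster))  # → True
```

A cluster failing any single check is rejected outright; the coherence test and deduplication happen in separate steps not shown here.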
Trend Detection
We use machine learning to automatically identify emerging trends and group related content into coherent stories. This includes news, research papers, GitHub repos, and developer blogs, all unified by topic.
How It Works
- Embedding Generation: Each piece of content is converted into a semantic vector using OpenAI's text-embedding-3-small model (1536 dimensions).
- Clustering: We apply DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to group content with similar embeddings. Content with a cosine similarity of 0.75 or higher is considered related.
- Story Synthesis: For each cluster, GPT analyzes all content types to generate a synthesized narrative - "The Story So Far" - that connects news, research, and code.
- Velocity Tracking: We track how fast trends are growing (articles per day) and classify them as Emerging, Growing, Peaking, or Declining.
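The first two steps above can be sketched with scikit-learn. In production the 1536-dimensional vectors come from OpenAI's text-embedding-3-small; here random vectors stand in for real embeddings, and the 0.75 similarity threshold maps to a cosine *distance* of 0.25 for DBSCAN's `eps`:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Stand-ins for embeddings: three near-duplicates of one article plus
# one unrelated article. Real vectors would come from the embedding API.
rng = np.random.default_rng(0)
base = rng.normal(size=1536)
embeddings = np.vstack([
    base + rng.normal(scale=0.01, size=1536),
    base + rng.normal(scale=0.01, size=1536),
    base + rng.normal(scale=0.01, size=1536),
    rng.normal(size=1536),  # unrelated content
])

# Cosine similarity >= 0.75 is equivalent to cosine distance <= 0.25.
labels = DBSCAN(eps=0.25, min_samples=3, metric="cosine").fit_predict(embeddings)
print(labels)  # the three related items share a cluster; the outlier is -1
```

DBSCAN's noise label (-1) is what makes it a good fit here: content that doesn't belong to any dense cluster is simply left unclustered rather than forced into a trend.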
Trend Lifecycle
Emerging: New pattern detected
Growing: Gaining momentum
Peaking: Maximum coverage
Declining: Fading attention
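A minimal sketch of how velocity (articles per day) could map to these four stages. The stage names come from this page; the growth heuristic and the low-volume cutoff of 5 articles/day are illustrative assumptions, not our production thresholds:

```python
def classify_stage(daily_counts, low_volume=5):
    """Classify a trend's lifecycle stage from recent daily article counts.

    daily_counts: list of articles-per-day values, oldest first.
    low_volume: assumed cutoff separating Emerging from Growing trends.
    """
    today, yesterday = daily_counts[-1], daily_counts[-2]
    growth = today - yesterday
    if growth < 0:
        return "Declining"   # fading attention
    if today < low_volume:
        return "Emerging"    # new pattern detected, volume still low
    if growth > 0:
        return "Growing"     # gaining momentum
    return "Peaking"         # maximum coverage, growth has leveled off

print(classify_stage([1, 2]))    # → Emerging
print(classify_stage([5, 10]))   # → Growing
print(classify_stage([10, 10]))  # → Peaking
print(classify_stage([10, 6]))   # → Declining
```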
Content Sources
We aggregate content from four types of sources, unified into trends:
News
Company blogs (OpenAI, Anthropic, Google DeepMind, Meta AI, etc.) and tech publications (TechCrunch, The Verge, Wired).
Research Papers
Daily papers from ArXiv (cs.AI, cs.LG, cs.CL) and HuggingFace featured papers.
GitHub Repos
Trending AI/ML repositories with significant star counts and active development.
Dev Blogs
Technical posts from NVIDIA, Microsoft, and other AI labs explaining implementation details.
Collection Schedule
News is collected 3x daily, research papers daily at 6pm ET, and GitHub/dev blogs are scanned continuously. All content is deduplicated and matched to existing trends.
AI Transparency
We use GPT to generate summaries and synthesize "The Story So Far" narratives. This is clearly disclosed - our AI is a tool for synthesis, not a replacement for reading the original sources.