Agentic AI Evaluation Landscape
Comprehensive taxonomy of benchmarks across 17 capability dimensions (2024 – March 2026)
Benchmarks by Category
Growth by Year
Benchmark Growth Over Time
Capability Coverage
Latest Additions
Benchmark Registry
All benchmarks searchable and filterable
| # | Benchmark | Publisher | Date | Venue | Tasks | Top Score | Category | Imported | Analysis |
|---|---|---|---|---|---|---|---|---|---|
Capability Matrix
Benchmarks × 17 capability dimensions. ● = Primary, ◑ = Secondary
Gap Analysis
Over-evaluated, under-evaluated, and zero-coverage capability dimensions
Complete Gaps (Zero Coverage)
Under-Evaluated
Over-Evaluated
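As a rough illustration of how the three gap buckets could be derived from the capability matrix, the sketch below counts ● (primary) and ◑ (secondary) marks per dimension. The dashboard does not state its actual rules, so the thresholds `low` and `high`, the function name `classify_coverage`, and the benchmark and dimension names in the example are all hypothetical.

```python
from collections import Counter

PRIMARY, SECONDARY = "●", "◑"  # symbols from the capability matrix legend


def classify_coverage(matrix, dimensions, low=1, high=3):
    """Bucket each dimension by benchmark coverage.

    matrix maps benchmark name -> {dimension: PRIMARY or SECONDARY}.
    low and high are assumed thresholds on the primary-mark count;
    they are not taken from the dashboard.
    """
    primary_counts = Counter(
        dim
        for marks in matrix.values()
        for dim, mark in marks.items()
        if mark == PRIMARY
    )
    any_counts = Counter(dim for marks in matrix.values() for dim in marks)

    buckets = {"zero": [], "under": [], "over": [], "adequate": []}
    for dim in dimensions:
        if any_counts[dim] == 0:
            buckets["zero"].append(dim)       # no benchmark touches it at all
        elif primary_counts[dim] <= low:
            buckets["under"].append(dim)      # few or no primary evaluations
        elif primary_counts[dim] >= high:
            buckets["over"].append(dim)       # many primary evaluations
        else:
            buckets["adequate"].append(dim)
    return buckets


# Hypothetical data, for illustration only.
example_matrix = {
    "bench_a": {"tool_use": PRIMARY, "planning": SECONDARY},
    "bench_b": {"tool_use": PRIMARY, "planning": PRIMARY},
    "bench_c": {"tool_use": PRIMARY, "memory": SECONDARY},
}
example_dimensions = ["tool_use", "planning", "memory", "safety"]
example_buckets = classify_coverage(example_matrix, example_dimensions)
```

Counting only primary marks for the under/over split is one possible design choice; a weighted count that gives secondary marks partial credit would be an equally plausible alternative.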
Benchmark Saturation Timeline
| Benchmark | Introduced | Top Score | Status |
|---|---|---|---|
Source Reader
Browse source summaries across arXiv, announcements, blog posts, and Twitter threads
Citation Rankings
Benchmarks ranked by composite relevance score (citations + recency + diversity + specificity)
| Rank | Benchmark | Year | Citations | Score | Tier | Primary Capabilities |
|---|---|---|---|---|---|---|
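The composite relevance score is described only as citations + recency + diversity + specificity. A minimal sketch of one way to compute and rank by such a score follows. It assumes each component has already been normalized to the range 0 to 1 and that the four components are weighted equally; neither assumption comes from the dashboard, and the names `Components`, `composite_score`, `rank`, and the example benchmarks are hypothetical.

```python
from dataclasses import dataclass

# Assumed equal weights; the dashboard does not state the actual weighting.
WEIGHTS = {
    "citations": 0.25,
    "recency": 0.25,
    "diversity": 0.25,
    "specificity": 0.25,
}


@dataclass(frozen=True)
class Components:
    """Per-benchmark score components, each assumed pre-normalized to [0, 1]."""
    citations: float
    recency: float
    diversity: float
    specificity: float


def composite_score(c, weights=WEIGHTS):
    """Weighted sum of the four components."""
    return sum(weight * getattr(c, name) for name, weight in weights.items())


def rank(benchmarks):
    """Return (name, score) pairs sorted from highest to lowest score."""
    scored = [(name, composite_score(c)) for name, c in benchmarks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical inputs, for illustration only.
example_ranking = rank({
    "hypothetical_a": Components(1.0, 0.5, 0.5, 0.0),
    "hypothetical_b": Components(0.5, 1.0, 1.0, 0.5),
})
```

With equal weights the ordering matches a plain sum of the components; the weights dictionary is kept separate so that a different weighting can be swapped in without changing the ranking code.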
Trends & Insights
Key patterns and emerging frontiers in agentic evaluation
Publisher Distribution (Top 16)
Monthly Publication Velocity
Category Growth Over Time
Key Researcher Movements (2024–2026)
| Researcher | From | To | When |
|---|---|---|---|
Month-by-Month Progression
Chronological view of benchmark publications and key developments
Special Reports
In-depth analyses of specific topics in the agentic evaluation landscape
Feature Requests
Track ideas and improvements for the dashboard