Agentic AI Evaluation Landscape
Comprehensive taxonomy of 144 benchmarks across 17 capability dimensions (2024 -- March 2026)
| Metric | Value | Detail |
|---|---|---|
| Total Benchmarks | 144 | Across 16 categories |
| Sources Analyzed | 83 | Papers, blogs, threads |
| New in 2026 | 24 | Jan -- Mar 2026 |
| Capability Gaps | 6 | Zero-coverage dimensions |
| Researchers Tracked | 18 | Tier 1 & 2 |
| Citation Graph | 47 | Papers, 67 edges |
Benchmarks by Category
Growth by Year
Capability Coverage
Top 10 Benchmarks by Relevance Score
| Rank | Benchmark | Year | Publisher | Citations | Score | Category |
|---|---|---|---|---|---|---|
Benchmark Registry
All 144 benchmarks searchable and filterable
| # | Benchmark | Publisher | Date | Venue | Tasks | Top Score | Category |
|---|---|---|---|---|---|---|---|
Capability Matrix
Top 40 benchmarks × 17 capability dimensions. Legend: ● = primary, ◑ = secondary.
Gap Analysis
Over-evaluated, under-evaluated, and zero-coverage capability dimensions
Complete Gaps (Zero Coverage)
Under-Evaluated
Over-Evaluated
Benchmark Saturation Timeline
| Benchmark | Introduced | Top Score | Status |
|---|---|---|---|
Source Reader
Browse 81 source summaries across arXiv papers, announcements, blog posts, and Twitter threads
Citation Rankings
38 benchmarks ranked by composite relevance score (citations + recency + diversity + specificity)
| Rank | Benchmark | Year | Citations | Score | Tier | Primary Capabilities |
|---|---|---|---|---|---|---|
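
The composite relevance score is described only as "citations + recency + diversity + specificity". A minimal sketch of how such a ranking could be computed follows, assuming an unweighted sum and assuming each component has already been scaled to the range [0, 1]. The `BenchmarkSignals` type, the function names, and the `bench_a` / `bench_b` entries are hypothetical placeholders, not entries or weights from the registry.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkSignals:
    """Hypothetical per-benchmark inputs, each assumed pre-scaled to [0, 1]."""
    name: str
    citations: float
    recency: float
    diversity: float
    specificity: float


def composite_score(b: BenchmarkSignals) -> float:
    """Unweighted sum of the four components (assumption: no weights are given)."""
    return b.citations + b.recency + b.diversity + b.specificity


def rank(benchmarks: list[BenchmarkSignals]) -> list[tuple[int, str, float]]:
    """Return (rank, name, score) tuples ordered by descending composite score."""
    ordered = sorted(benchmarks, key=composite_score, reverse=True)
    return [(i, b.name, composite_score(b)) for i, b in enumerate(ordered, start=1)]


# Placeholder data for illustration only.
ranked = rank([
    BenchmarkSignals("bench_b", citations=1.0, recency=0.25, diversity=0.25, specificity=0.25),
    BenchmarkSignals("bench_a", citations=0.5, recency=0.5, diversity=0.5, specificity=0.5),
])
for position, name, score in ranked:
    print(position, name, score)
```

If the components carry different weights in practice, `composite_score` is the only function that would need to change; the ranking step stays the same.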
Trends & Insights
Key patterns and emerging frontiers in agentic evaluation
Publisher Distribution (Top 16)
Key Researcher Movements (2024-2026)
| Researcher | From | To | When |
|---|---|---|---|