Agentic Eval Taxonomy



Agentic AI Evaluation Landscape

Comprehensive taxonomy of 144 benchmarks across 17 capability dimensions (2024 -- March 2026)

Total Benchmarks: 144 (across 16 categories)
Sources Analyzed: 83 (papers, blogs, threads)
New in 2026: 24 (Jan -- Mar 2026)
Capability Gaps: 6 (zero-coverage dimensions)
Researchers Tracked: 18 (Tier 1 & 2)
Citation Graph: 47 papers, 67 edges

Benchmarks by Category

Growth by Year

Capability Coverage

Top 10 Benchmarks by Relevance Score

Rank | Benchmark | Year | Publisher | Citations | Score | Category

Benchmark Registry

All 144 benchmarks searchable and filterable

# | Benchmark | Publisher | Date | Venue | Tasks | Top Score | Category

Capability Matrix

Top 40 benchmarks x 17 capability dimensions. Each cell marks a capability as primary or secondary for that benchmark.

Gap Analysis

Over-evaluated, under-evaluated, and zero-coverage capability dimensions

Complete Gaps (Zero Coverage)

Under-Evaluated

Over-Evaluated

Benchmark Saturation Timeline

Benchmark | Introduced | Top Score | Status

Source Reader

Browse 81 source summaries across arXiv papers, announcements, blog posts, and Twitter threads

Sources by type: arXiv (15), Announcements (15), Blogs (25), Twitter (26); All (81)

Citation Rankings

38 benchmarks ranked by composite relevance score (citations + recency + diversity + specificity)
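
The page names the four components of the composite relevance score but not how they are weighted or normalized. As a minimal sketch only, the ranking could be computed as below, assuming each component is already scaled to [0, 1] and all four are weighted equally. The `Benchmark` class, the `WEIGHTS` table, and the function names are hypothetical and are not taken from the dashboard.

```python
from dataclasses import dataclass

# Assumption: equal weights. The page lists the components as a plain sum
# (citations + recency + diversity + specificity) and gives no weights.
WEIGHTS = {"citations": 1.0, "recency": 1.0, "diversity": 1.0, "specificity": 1.0}


@dataclass
class Benchmark:
    """Hypothetical record; each component is assumed pre-normalized to [0, 1]."""
    name: str
    citations: float
    recency: float
    diversity: float
    specificity: float


def relevance_score(b: Benchmark, weights: dict = WEIGHTS) -> float:
    """Weighted sum of the four components named on the page."""
    return sum(w * getattr(b, field) for field, w in weights.items())


def rank(benchmarks: list) -> list:
    """Return benchmarks sorted from highest to lowest composite score."""
    return sorted(benchmarks, key=relevance_score, reverse=True)
```

Keeping the weights in a single table makes it easy to see how sensitive the ranking is to them: a benchmark that leads on citations alone can fall behind one that is moderately strong on all four components.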

Rank | Benchmark | Year | Citations | Score | Tier | Primary Capabilities