Agentic AI Evaluation Taxonomy

Agentic AI Evaluation Landscape

Comprehensive taxonomy of benchmarks across 17 capability dimensions (2024 – March 2026)

Total Benchmarks

--

Across 17 categories

Sources Analyzed

--

Papers, blogs, threads

New in 2026

--

Jan – Mar 2026

Capability Gaps

--

Zero-coverage dimensions

Researchers Tracked

18

Tier 1 & 2

Citation Graph

--

Papers traced

Last Updated

--


Benchmarks by Category

Growth by Year

Benchmark Growth Over Time

Capability Coverage

Latest Additions

Benchmark Registry

All benchmarks searchable and filterable

#	Benchmark	Publisher	Date	Venue	Tasks	Top Score	Category	Imported	Analysis

Capability Matrix

Benchmarks × 17 capability dimensions. ● = primary, ◑ = secondary

Gap Analysis

Over-evaluated, under-evaluated, and zero-coverage capability dimensions
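The three buckets named above can be illustrated with a minimal sketch that sorts capability dimensions by how many benchmarks cover them. The function name `classify_coverage`, the thresholds `under_max` and `over_min`, and the placeholder dimension names are all assumptions made for illustration; the dashboard's actual cut-offs are not stated here.

```python
def classify_coverage(counts, under_max=2, over_min=10):
    """Bucket capability dimensions by benchmark count.

    `counts` maps a dimension name to the number of benchmarks covering it.
    The thresholds are illustrative assumptions, not the dashboard's values.
    """
    if under_max >= over_min:
        raise ValueError("under_max must be smaller than over_min")

    buckets = {"zero": [], "under": [], "balanced": [], "over": []}
    for dimension, n in counts.items():
        if n < 0:
            raise ValueError(f"negative count for {dimension!r}")
        if n == 0:
            buckets["zero"].append(dimension)      # complete gap
        elif n <= under_max:
            buckets["under"].append(dimension)     # under-evaluated
        elif n >= over_min:
            buckets["over"].append(dimension)      # over-evaluated
        else:
            buckets["balanced"].append(dimension)  # neither flagged
    return buckets


# Placeholder dimension names and counts, for illustration only.
example = classify_coverage({"dim_a": 0, "dim_b": 1, "dim_c": 5, "dim_d": 12})
```

Fixed thresholds keep the sketch simple; a relative rule, such as comparing each count against the median across dimensions, would be an equally plausible choice.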

Complete Gaps (Zero Coverage)

Under-Evaluated

Over-Evaluated

Benchmark Saturation Timeline

Benchmark	Introduced	Top Score	Status

Source Reader

Browse source summaries across arXiv papers, announcements, blog posts, and Twitter threads

Citation Rankings

Benchmarks ranked by composite relevance score (citations + recency + diversity + specificity)

Rank	Benchmark	Year	Citations	Score	Tier	Primary Capabilities
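The composite relevance score is described only as a combination of citations, recency, diversity, and specificity. One way such a score could be computed is sketched below. The equal weights, the log scaling of citations, the five-year recency horizon, and every name in the code (`BenchmarkEntry`, `composite_score`, `rank_benchmarks`) are assumptions for illustration, not the dashboard's actual formula.

```python
import math
from dataclasses import dataclass


@dataclass
class BenchmarkEntry:
    """Hypothetical record; field names are illustrative, not the dashboard's schema."""
    name: str
    year: int
    citations: int
    diversity: float    # assumed pre-normalized to [0, 1]
    specificity: float  # assumed pre-normalized to [0, 1]


def composite_score(entry, max_citations, current_year=2026, horizon_years=5,
                    weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of four components, each scaled to [0, 1] (assumed scheme)."""
    # Log-scale citations so a few heavily cited benchmarks do not dominate.
    if max_citations > 0:
        citation_part = math.log1p(entry.citations) / math.log1p(max_citations)
    else:
        citation_part = 0.0
    # Linear decay: 1.0 for the current year, 0.0 at horizon_years or older.
    age = current_year - entry.year
    recency_part = min(1.0, max(0.0, 1.0 - age / horizon_years))
    parts = (citation_part, recency_part, entry.diversity, entry.specificity)
    return sum(w * p for w, p in zip(weights, parts))


def rank_benchmarks(entries, **kwargs):
    """Return (rank, name, score) tuples, highest score first."""
    if not entries:
        return []
    max_citations = max(e.citations for e in entries)
    scored = sorted(
        ((composite_score(e, max_citations, **kwargs), e.name) for e in entries),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [(i + 1, name, score) for i, (score, name) in enumerate(scored)]
```

Calling `rank_benchmarks(entries)` on a list of `BenchmarkEntry` objects returns rows in the same order as the Rank and Score columns. Changing `weights` shifts the emphasis between components without touching the rest of the code.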

Trends & Insights

Key patterns and emerging frontiers in agentic evaluation

Publisher Distribution (Top 16)

Monthly Publication Velocity

Category Growth Over Time

Key Researcher Movements (2024-2026)

Researcher	From	To	When

Month-by-Month Progression

Chronological view of benchmark publications and key developments

Special Reports

In-depth analyses of specific topics in the agentic evaluation landscape

Feature Requests

Track ideas and improvements for the dashboard

