ACCESS FRONTIER CODE DATASETS

Turbocharge your LLM development program by requesting high-quality datasets curated by Human Data experts at the frontier of code generation — the same experts trusted by leading AI labs to advance their model capabilities.

Once you submit the form, we’ll connect you directly with our team to align on requirements, share samples, and adapt formatting if needed.

Request Dataset

APPLICATIONS OF REVELO CODE DATASETS & EVALS

Across every application — from training to evaluation — Revelo acts as your thought partner. We adapt to your team’s needs with flexible workflows that can run on your platform or ours, ensuring seamless integration and consistent quality at scale.

SWE-Bench Focused

Purpose-built datasets and traces optimized for SWE-Bench and derivative tasks, enabling precise evaluation of reasoning, bug-fixing, and code repair capabilities.
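
To make this concrete, here is a minimal sketch of what a SWE-Bench-style instance can look like. The field names follow the public SWE-Bench schema; the repository, commit, patches, and test IDs are hypothetical placeholders, not real data.

# A minimal SWE-Bench-style task instance (hypothetical values throughout).
swe_instance = {
    "instance_id": "example__repo-1234",    # unique task ID
    "repo": "example/repo",                 # source repository
    "base_commit": "abc1234",               # commit the candidate patch is applied to
    "problem_statement": "Parser crashes on empty config files.",
    "patch": "diff --git a/config.py b/config.py\n...",          # gold fix (truncated)
    "test_patch": "diff --git a/tests/test_config.py ...\n",     # tests added for the fix
    "FAIL_TO_PASS": ["tests/test_config.py::test_empty_file"],   # must pass after the fix
    "PASS_TO_PASS": ["tests/test_config.py::test_basic_load"],   # must keep passing
}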

Terminal Bench

End-to-end coding environments that capture tool use, shell interactions, and real execution feedback — measuring true agentic performance, not just static output.
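
For illustration, one hypothetical shape for such an environment trace: each step records the command the agent issued, the output it observed, and the exit code, so the episode can be replayed and scored against real execution results rather than static text.

# A hypothetical agentic terminal trace (format and values are illustrative).
terminal_trace = {
    "task": "Archive all .log files under /var/app into logs.tar.gz",
    "steps": [
        {"command": "ls /var/app/*.log",
         "stdout": "a.log\nb.log\n", "exit_code": 0},
        {"command": "tar -czf logs.tar.gz -C /var/app a.log b.log",
         "stdout": "", "exit_code": 0},
    ],
    # Shell predicate executed after the episode to verify success.
    "success_check": "tar -tzf logs.tar.gz | grep -q a.log",
}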

Tau Bench

Comprehensive multi-domain evaluations targeting long-context reasoning, compositional logic, and adaptive problem-solving across diverse code types.
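
As a sketch, one simple way to score a tool-use episode of this kind is order-sensitive matching of the model’s tool calls against a reference trajectory. The record format and helper below are hypothetical, shown only to illustrate the idea.

# Hypothetical tool-call matching: compare the model's calls, in order,
# against a reference trajectory of (tool name, arguments) pairs.
def tool_calls_match(predicted, reference):
    normalize = lambda calls: [(c["name"], c["args"]) for c in calls]
    return normalize(predicted) == normalize(reference)

reference = [
    {"name": "lookup_order", "args": {"order_id": "A-42"}},
    {"name": "issue_refund", "args": {"order_id": "A-42", "amount": 19.99}},
]
predicted = [
    {"name": "lookup_order", "args": {"order_id": "A-42"}},
    {"name": "issue_refund", "args": {"order_id": "A-42", "amount": 19.99}},
]
assert tool_calls_match(predicted, reference)  # this episode matches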

Cross-Model Evals

Standardized, high-agreement frameworks to benchmark reasoning fidelity and regression across models, releases, and fine-tuning iterations.
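
To give a flavor of what such a framework computes, here is a minimal sketch, assuming per-task boolean results keyed by task ID; it reports each model’s pass rate and the tasks a new release regressed on. Names and data are illustrative.

# Minimal cross-model comparison (illustrative data and helper names).
def pass_rate(results):
    # results maps task_id -> True if the model solved the task.
    return sum(results.values()) / len(results)

def regressions(old, new):
    # Tasks the previous release solved that the new release fails.
    return sorted(t for t in old if old[t] and not new.get(t, False))

v1 = {"t1": True, "t2": True, "t3": False}
v2 = {"t1": True, "t2": False, "t3": True}
print(f"v1: {pass_rate(v1):.0%}, v2: {pass_rate(v2):.0%}")  # v1: 67%, v2: 67%
print("regressed:", regressions(v1, v2))                    # regressed: ['t2']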

Front-End (UI)

UI-grounded code generation tasks that assess a model’s ability to translate prompts, Figma specs, or component logic into functional, production-grade interfaces.
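
As an illustration, a UI task in such a suite might pair a natural-language prompt and a spec with structural checks run against the rendered output. The fields below are hypothetical, not a fixed schema.

# A hypothetical front-end generation task (fields are illustrative).
ui_task = {
    "prompt": "Build a login form with email and password fields and a submit button.",
    "spec": {"framework": "React", "source": "figma"},  # e.g., a Figma export
    # CSS selectors the rendered page must contain to count as correct.
    "checks": [
        "form input[type=email]",
        "form input[type=password]",
        "form button[type=submit]",
    ],
}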

Custom

Custom-built datasets and evaluation suites for specialized architectures, modalities, or domains — co-designed with your research team to push the frontier of model capability.