GOOD RESEARCH REQUIRES GREAT DATA
Most researchers still collect and label their own datasets. It’s slow, messy, and often done by overworked grad students. We think you deserve better.
At Revelo, we already build high-quality code and reasoning datasets for frontier labs.

Now we’re opening that same pipeline to the research community — for free — exclusively for code-related research.
WHY WE'RE DOING THIS
Innovation shouldn’t require a billion‑dollar data budget. If your research pushes the frontier — code generation, reasoning, alignment, SWE‑bench‑style evaluation — we’ll help you design and build the dataset you need.
All we ask?
If your paper gets published, cite us for dataset support. That’s it. No contracts. No fine print. Just science and good manners.
WHY RESEARCHERS TRUST REVELO
We are not another labeling vendor — we are a technical data partner built by engineers.
Built by Practitioners
Curated by engineers who’ve shipped production‑grade AI code — with build logs, reasoning traces, validation harnesses, and error metadata.
Precision > Volume
Rubric‑based annotation and automated diff validation ensure each example improves reasoning fidelity — not just dataset size.
Reproducibility at Scale
Versioning with hashing, prompt templates, and deterministic sampling scripts — your ablations survive peer review.
Quality Metrics, Not Vibes
Inter‑annotator agreement, error distributions, and benchmark validation reports — cite empirical quality, not anecdotes.
Full Ownership
We don’t reuse your data or mix it into commercial training sets. Your dataset remains yours — from schema to sample.
Engineer‑Grade Delivery
Datasets come in JSONL / Parquet / HF‑ready formats with tests and version control.
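For the curious, the reproducibility ideas above (content hashing and seeded, deterministic sampling) can be sketched in a few lines of Python. This is an illustrative toy, not our actual pipeline; the record fields and function names are made up:

```python
import hashlib
import json
import random

def content_hash(records):
    """Hash the canonical JSON of every record, so any change to the
    dataset changes its version fingerprint."""
    h = hashlib.sha256()
    for rec in records:
        h.update(json.dumps(rec, sort_keys=True).encode("utf-8"))
    return h.hexdigest()

def deterministic_sample(records, k, seed=0):
    """Seeded sampling, so an ablation split is identical run to run."""
    rng = random.Random(seed)
    return rng.sample(records, k)

# Toy records with a hypothetical schema.
records = [{"prompt": f"task {i}", "completion": f"answer {i}"} for i in range(10)]
version = content_hash(records)
split = deterministic_sample(records, k=3, seed=42)
```

Pinning a dataset to its content hash, rather than a filename, is what lets a reviewer confirm they ran the exact bytes you did.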
WHAT YOU'LL GET
A custom dataset built by engineers, fully tested and version-controlled. It’s clean, reproducible, and delivered in standard formats like JSONL, Parquet, or Hugging Face Datasets.
We tailor it to your research goals—fine-tuning, evaluation, or ablations—and you own it completely. All we ask is a citation or acknowledgment when you publish.
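If you haven’t worked with JSONL before, a delivered dataset is just one JSON record per line, which makes it trivial to validate. A minimal sketch, using a hypothetical record schema (the field names are illustrative, not our actual delivery format):

```python
import json

# Hypothetical schema for a delivered JSONL dataset.
REQUIRED_FIELDS = {"prompt", "completion", "metadata"}

def validate_jsonl(lines):
    """Parse JSONL lines and check each record has the required fields."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        rec = json.loads(line)
        missing = REQUIRED_FIELDS - rec.keys()
        if missing:
            raise ValueError(f"line {lineno}: missing fields {sorted(missing)}")
        records.append(rec)
    return records

sample = [
    json.dumps({"prompt": "fix the bug", "completion": "patch",
                "metadata": {"lang": "py"}}),
]
records = validate_jsonl(sample)
```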
WHO IT'S FOR
Anyone pushing the frontier without an Anthropic‑sized GPU cluster
ML researchers working on code or reasoning tasks
Grad students writing their first (or fifth) paper
Independent researchers with big ideas and small budgets
HOW IT WORKS
You tell us your idea in detail
If it’s ethical, feasible, and interesting, we’ll reach out
We help you design and deliver the dataset
You publish your work and cite us
ABOUT THIS PROGRAM (FAQ)
-
Wait, so it’s actually free?
Yep. Totally free.
We believe good research deserves good data — and not everyone has the budget of a frontier lab.
All we ask is that you cite Revelo in your publication or acknowledgments section.
That helps us justify doing more of this.
-
What kind of projects do you accept?
We focus on:
- Code generation and reasoning
- SWE-bench–style evaluations
- DPO/RLHF preference datasets
- Front-end or IaC (Infrastructure-as-Code) reasoning tasks
Basically, if it helps models reason about real code, we’re interested.
-
Who owns the data?
You do.
You get full ownership and control of your dataset.
We don’t reuse it, sell it, or train on it.
It’s yours — we just help you make it solid enough to publish.
-
How should I cite Revelo?
Something like this works:
“This research was supported by Revelo’s Human Data team, who provided dataset design and annotation assistance.”
(You can phrase it your way, as long as it gives credit where it’s due.)
-
Why do you review ideas first?
Because we want to focus on projects that make an impact.
We look for:
- Research value (new, not redundant)
- Technical feasibility
- Ethical use of data
- Potential for publication
If it checks those boxes, we’ll help you build it.
-
Why are you doing this?
Because we build datasets for big labs every day — this is our way of helping the open research community too.
If great papers come out of it and we get cited? Everybody wins.