SCALING AGENTIC AI PERFORMANCE IN WEEKS WITH FULL-TRACE DATA
How a thought-partner approach and bespoke workflow solved a novel data collection challenge for a hyper-scale AI leader.
AGENTIC TRAINING BREAKTHROUGH
TLDR
A hyperscaler client developing a sophisticated coding agent needed a way to collect highly detailed, trace-based data to advance its model's capabilities, particularly on benchmarks like SWE-bench. The challenge was threefold: designing a novel data collection workflow from scratch, sourcing elite developers with niche repository experience, and keeping them engaged through complex, time-consuming tasks.
Revelo acted as a dedicated thought partner, co-designing a multi-platform solution to capture not just code, but the developer's step-by-step reasoning and actions. By combining our proprietary HData platform, a tailored incentive structure, and access to a network of 400,000+ vetted engineers, we delivered a scalable, efficient program that produced meaningful model improvements.
The results included a 10x increase in daily trace collections in just two months, a 48% reduction in the average time to complete a task, and a 20% increase in data quality. The project's success led the client to expand the partnership with three new initiatives.
THE CHALLENGE
Capturing Real World Data at Scale
To push the boundaries of AI code generation, the client needed to train its model on more than finished code: the model had to learn the entire problem-solving process. The objective was to obtain "trace data," a complete record of a developer's live actions and thought processes while resolving real-world GitHub issues.
This presented several core challenges:
- Workflow Design: The collection method was entirely new and had to be built from the ground up, requiring close collaboration to find the optimal process for capturing high-quality data without frustrating developers.
- Specialized Talent: The project required senior Python developers with experience not only in widely used open-source repositories such as pandas but also in niche ones such as Qiskit. The ideal candidate profile was initially unclear.
- Complex Tooling: Developers had to navigate a multi-platform environment, using the client's proprietary tools, Revelo's HData platform, and their own local coding environment simultaneously.
- Sustaining Engagement: The tasks were technically demanding and had a high Average Handle Time (AHT), which made it difficult to keep top-performing developers engaged. Their sustained engagement was critical for delivering quality data at scale.
OUR APPROACH
Partner-Driven Innovation
Revelo moved beyond the role of a typical vendor to become a true thought partner. Our approach was built on a foundation of collaborative problem-solving and deep expertise in software engineering. We worked hand-in-hand with the client to design, test, and refine a bespoke collection program tailored to their unique needs.
Our strategy was centered on four key principles:
- Iterative Process Development: We embraced experimentation, working with the client to iterate on the workflow and tooling to achieve the ideal balance between data quality, task completion rates, and developer engagement.
- Targeted Talent Sourcing: We leveraged our network of 400,000+ pre-vetted, Latin America-based software engineers to pinpoint the exact talent needed. This allowed us to find senior developers with experience in the specific, and often niche, GitHub repositories required for the project.
- Incentive Engineering: Recognizing the complexity of the tasks, we designed and A/B tested a multi-layered incentive structure. The goal was to reward top performers for quality, efficiency, and focus on priority tasks, ensuring their continued engagement.
- Integrated Feedback Loops: We established a seamless, multi-platform system that allowed us to collect data, track progress, and gather feedback without interrupting the developer's flow.
THE SOLUTION
An End-to-End System for High-Fidelity Data
The solution was a comprehensive, multi-platform workflow meticulously designed to capture reasoning-rich trace data. Revelo’s HData platform served as the central hub, tracking labeler engagement and synchronizing data across all systems via unique identifiers.
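To make "synchronizing data across all systems via unique identifiers" concrete, here is a minimal sketch of how records from separate tools could be merged on a shared task ID. Everything in it is an assumption for illustration: the `TaskRecord` schema, the `merge_by_task_id` helper, and the field names (`task_id`, `kind`, `quality`, `developer_id`) are hypothetical and do not describe HData's actual data model or API.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class TaskRecord:
    """All data for one task, merged across platforms (hypothetical schema)."""
    task_id: str
    trace_events: list[dict] = field(default_factory=list)  # actions and reasoning notes from the IDE extension
    review: dict = field(default_factory=dict)               # quality review from the client's tooling
    engagement: dict = field(default_factory=dict)           # developer and timing data from the tracking hub


def merge_by_task_id(trace_events, reviews, engagement_rows):
    """Join rows from three hypothetical sources on their shared task_id."""
    merged: dict[str, TaskRecord] = {}

    def record_for(task_id: str) -> TaskRecord:
        # Create the merged record the first time any source mentions this ID.
        if task_id not in merged:
            merged[task_id] = TaskRecord(task_id=task_id)
        return merged[task_id]

    for event in trace_events:
        record_for(event["task_id"]).trace_events.append(event)
    for review in reviews:
        record_for(review["task_id"]).review = review
    for row in engagement_rows:
        record_for(row["task_id"]).engagement = row
    return merged


# Example data (invented): one task with an action and a reasoning note,
# plus a second task that has engagement data but no trace yet.
events = [
    {"task_id": "T1", "kind": "action", "detail": "ran the failing test"},
    {"task_id": "T1", "kind": "thought", "detail": "the index is misaligned after the merge"},
]
reviews = [{"task_id": "T1", "quality": 0.85}]
engagement = [
    {"task_id": "T1", "developer_id": "D7", "minutes": 172},
    {"task_id": "T2", "developer_id": "D9", "minutes": 40},
]
records = merge_by_task_id(events, reviews, engagement)
```

Keying every source on one ID means a task can be tracked end to end even when its pieces arrive from different tools at different times; records with missing pieces (like `T2` above) simply stay partially filled until the rest arrives.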
The developer workflow included:
- Environment Setup: Participants received Dockerfiles to reproduce each library's state, but often had to fix them before they could proceed. Revelo streamlined this step by providing a library of pre-revised Dockerfiles.
- Test-Driven Development: Developers started from failing tests that reproduced each issue, then fixed the repository code, following Test-Driven Development (TDD) to ensure all tests passed before submission.
- Trace & Thought Capture: Using a custom VSCode extension, developers recorded their actions while simultaneously documenting their reasoning in natural language at key moments in the process.
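The fail-first, pass-after pattern in the TDD step lends itself to an automated gate before a submission is accepted. The sketch below assumes test results are available as simple name-to-pass/fail maps from a run before and a run after the fix; the function name `validate_submission` and its inputs are hypothetical, not part of the client's or Revelo's tooling.

```python
from __future__ import annotations


def validate_submission(before: dict[str, bool], after: dict[str, bool],
                        reproducing_tests: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the submission passes the gate.

    `before` and `after` map test names to True (passed) or False (failed),
    captured before and after the developer's fix.
    """
    problems = []
    for name in sorted(reproducing_tests):
        # A reproducing test must fail on the unfixed code, otherwise it
        # does not demonstrate the issue.
        if before.get(name, False):
            problems.append(f"{name}: passed before the fix, so it does not reproduce the issue")
        if name not in after:
            problems.append(f"{name}: missing from the post-fix run")
    # Every test in the post-fix run must pass before submission.
    for name, passed in sorted(after.items()):
        if not passed:
            problems.append(f"{name}: fails after the fix")
    return problems


# Usage: the reproducing test flips from failing to passing and nothing regresses.
before = {"test_issue": False, "test_existing": True}
after = {"test_issue": True, "test_existing": True}
print(validate_submission(before, after, {"test_issue"}))  # prints []
```

Returning a list of readable problems, rather than a single boolean, lets the developer see exactly why a submission was held back.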
To ensure the program could scale effectively, we implemented a sophisticated incentive structure that aligned payment with performance. This included:
- Quality-Based Rate Adjustments: Pay scaled with the quality of the submitted work, ensuring the client only paid for high-value data.
- Targeted Rate Increases: We offered higher rates for priority repositories and for faster, high-quality task completion to steer efforts where they were needed most.
- Engagement Bonuses: Top-performing contributors earned additional incentives based on the volume of tasks completed, rewarding their commitment.
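The three incentive layers above can be combined into a single pay calculation. The sketch below is illustrative only: every rate, multiplier, threshold, and tier is a hypothetical placeholder, and the actual structure the program A/B tested is not described here.

```python
from __future__ import annotations
from dataclasses import dataclass

# All values below are hypothetical placeholders, not the program's real rates.
BASE_RATE = 100.0             # pay per accepted task, in arbitrary currency units
PRIORITY_MULTIPLIER = 1.25    # uplift for tasks in priority repositories
SPEED_BONUS = 0.10            # +10% for finishing under the target time...
SPEED_QUALITY_FLOOR = 0.8     # ...but only with at least this quality score
VOLUME_QUALITY_FLOOR = 0.8    # minimum average quality to earn a volume bonus
VOLUME_TIERS = [(40, 0.10), (20, 0.05)]  # (min tasks per period, bonus), highest tier first


@dataclass
class Task:
    quality: float       # reviewer score in [0, 1]
    minutes: float       # handle time for the task
    priority_repo: bool  # whether the task belongs to a priority repository


def task_pay(task: Task, target_minutes: float) -> float:
    """Pay for one task: quality-scaled base, then priority and speed uplifts."""
    pay = BASE_RATE * task.quality  # quality-based rate adjustment
    if task.priority_repo:
        pay *= PRIORITY_MULTIPLIER
    if task.minutes <= target_minutes and task.quality >= SPEED_QUALITY_FLOOR:
        pay *= 1 + SPEED_BONUS
    return pay


def period_pay(tasks: list[Task], target_minutes: float) -> float:
    """Pay for a period, with a volume bonus for consistently high-quality contributors."""
    if not tasks:
        return 0.0
    subtotal = sum(task_pay(t, target_minutes) for t in tasks)
    avg_quality = sum(t.quality for t in tasks) / len(tasks)
    if avg_quality >= VOLUME_QUALITY_FLOOR:
        for min_tasks, bonus in VOLUME_TIERS:
            if len(tasks) >= min_tasks:
                return subtotal * (1 + bonus)
    return subtotal
```

Gating both the speed and volume bonuses on a quality floor keeps the incentives from rewarding fast or high-volume work that the client would not want to pay for.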
THE RESULTS
Scaling Quality and Efficiency
Our partnership-driven approach delivered significant, measurable results that exceeded the client's expectations. The program successfully scaled from a proof-of-concept to a large-scale collection effort in a matter of months.
- 10x Growth in Volume: Daily trace collections scaled from 10 to 100 in just two months.
- 48% Reduction in AHT: Process improvements and tooling enhancements implemented after the initial phase cut the Average Handle Time nearly in half, from 330 minutes to 172 minutes.
- 20% Increase in Quality: The average quality rating for trace data increased from 70% to over 82% as the program scaled.
- Expanded Technical Scope: The number of active repositories grew from 4 to 30, demonstrating the breadth of expertise within Revelo's talent network.
The project's overwhelming success solidified Revelo's position as a trusted thought partner, leading the client to deepen the relationship and initiate three new human data projects with us.
LET'S LEVEL UP YOUR LLM TODAY.
Improve your model's code generation with high-quality, code-focused human data.