SOLVING SUBJECTIVE QUALITY IN AI-GENERATED DESIGN
Turning 'make it look good' into a repeatable process for AI-generated interface quality.
UI GENERATION BREAKTHROUGH
TLDR
A leading hyperscaler needed to improve its LLM's ability to generate visually appealing user interfaces, but faced a unique challenge: quality was subjective and assessed by a rotating pool of crowdsourced reviewers.
Revelo redesigned the workflow, introduced design validation checkpoints, and reduced average handling time (AHT) by 32% while increasing approval rates by 76%, scaling from about 75 to 356 weekly tasks with 94% final quality scores.
THE CHALLENGE
When "visually appealing" means different things to every reviewer
Our client set out to evaluate their LLM's ability to generate and refine front-end web applications, with one critical requirement: the UIs had to be "visually appealing." This seemingly simple goal revealed three major obstacles:
Subjective Design Criteria: With quality assessed by a rotating pool of crowdsourced QA reviewers, "visually appealing" meant different things to different people. This lack of clear standards led to inconsistent feedback, frequent rework, and frustrated annotators.
Inflated Average Handling Time: The original workflow had annotators building fully functional React applications, only to have them rejected at the final stage for aesthetic reasons. Tasks that should have taken 4 hours were stretching to 7 hours due to multiple revision cycles.
Missing Design Alignment: The lean workflow skipped upfront design validation—a critical oversight. Developers were coding in the dark, guessing at aesthetic requirements that wouldn't be evaluated until the very end of the process.
OUR APPROACH
Beyond talent: Re-engineering the entire workflow for subjective quality
Revelo recognized that traditional approaches wouldn't work for subjective quality assessment. Instead of just providing talent, we became strategic partners in reimagining the entire workflow:
Pattern Recognition: We analyzed QA feedback to identify what was actually getting approved, creating internal guidance documents that decoded the unwritten aesthetic rules.
Workflow Experimentation: We ran three different workflows in parallel, testing various approaches to find the optimal balance between efficiency and quality.
Proactive Design Validation: We proposed adding a design review layer at the beginning of the process—getting aesthetic approval before any code was written.
Continuous Calibration: We built flexible review systems that could adapt in real time to evolving QA patterns, turning ambiguity into actionable insights.
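To make the pattern-recognition step concrete, here is a minimal sketch of how approval rates could be tallied per design trait from QA feedback. It is an illustration only: the trait names, the sample records, and the `approval_rate_by_trait` helper are all hypothetical and are not project data or the tooling we used.

```python
from collections import defaultdict

# Hypothetical QA feedback records: (design traits seen in the task, reviewer verdict).
# These traits and verdicts are invented for illustration, not project data.
feedback = [
    ({"generous_whitespace", "muted_palette"}, True),
    ({"dense_layout", "muted_palette"}, False),
    ({"generous_whitespace", "high_contrast_cta"}, True),
    ({"dense_layout", "high_contrast_cta"}, False),
    ({"generous_whitespace", "dense_layout"}, True),
]


def approval_rate_by_trait(records):
    """Return {trait: share of tasks carrying that trait that were approved}."""
    seen = defaultdict(int)
    approved = defaultdict(int)
    for traits, ok in records:
        for trait in traits:
            seen[trait] += 1
            approved[trait] += int(ok)
    return {trait: approved[trait] / seen[trait] for trait in seen}


rates = approval_rate_by_trait(feedback)
# List traits from most to least often approved.
for trait, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{trait}: {rate:.0%}")
```

A ranking like this is only a starting point for a guidance document. With a rotating reviewer pool, the tallies would need to be refreshed as new feedback arrives.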
THE SOLUTION
A design-first workflow that turned aesthetic chaos into systematic success
Working closely with the client, we implemented a complete workflow transformation:
The Optimized Workflow:
- Designer Sourcing: Senior designers source and modify high-quality mockups aligned with emerging quality patterns
- Client Pre-approval: Designs are validated before any development begins, eliminating late-stage aesthetic rejections
- Focused Development: Developers build from pre-approved designs, working from a shared repository with clear specifications
- Layered Review: Separate design and functionality reviews ensure both aesthetic and technical quality
- Continuous Feedback Loop: Weekly calibration sessions between Revelo and client teams to refine standards
Key Innovations:
- Introduced Figma file support to improve developer efficiency
- Created internal calibration documents for consistent aesthetic standards
- Implemented peer checkpoints to maintain alignment without formal rubrics
- Built a feedback system that helped the client refine their own quality definitions
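The design-first ordering described above can be pictured as a small gated pipeline in which development cannot begin until the design checkpoint has passed. The sketch below is an assumption-laden illustration: the `Stage` names, the `Task` class, and the `advance` method are hypothetical and do not describe the client's or Revelo's actual tooling.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Stage(Enum):
    DESIGN_SOURCED = auto()
    DESIGN_APPROVED = auto()    # client pre-approval
    DEVELOPED = auto()
    DESIGN_REVIEWED = auto()    # layered review, part 1
    FUNCTION_REVIEWED = auto()  # layered review, part 2


# Each stage may only advance to the next one; there is no shortcut to DEVELOPED.
NEXT_STAGE = {
    Stage.DESIGN_SOURCED: Stage.DESIGN_APPROVED,
    Stage.DESIGN_APPROVED: Stage.DEVELOPED,
    Stage.DEVELOPED: Stage.DESIGN_REVIEWED,
    Stage.DESIGN_REVIEWED: Stage.FUNCTION_REVIEWED,
}


@dataclass
class Task:
    name: str
    stage: Stage = Stage.DESIGN_SOURCED
    history: list = field(default_factory=list)

    def advance(self, to: Stage, passed: bool = True) -> None:
        """Move to `to` only if it is the next stage and the checkpoint passed."""
        if NEXT_STAGE.get(self.stage) is not to:
            raise ValueError(
                f"{self.name}: cannot go from {self.stage.name} to {to.name}"
            )
        if not passed:
            # A failed checkpoint leaves the task in place, so rework happens early.
            self.history.append(f"rejected at {to.name}")
            return
        self.stage = to
        self.history.append(to.name)


task = Task("pricing-page")
try:
    task.advance(Stage.DEVELOPED)  # skipping pre-approval is refused
except ValueError as err:
    print(err)
task.advance(Stage.DESIGN_APPROVED)
task.advance(Stage.DEVELOPED)
print(task.history)
```

The point of the gate is that an aesthetic rejection costs a mockup revision rather than a finished React build.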
THE RESULTS
Dramatic improvements across efficiency, quality, and scale
The impact of our approach was measurable:
Efficiency Gains:
- 32% reduction in AHT (from 420 to 231 minutes)
- 4.7x increase in weekly output (from 75.5 to 356 tasks)
- 2.3x growth in active specialists (from 70 to 158)
Quality Improvements:
- 76% relative increase in approval rates (from 54% to 94%, a gain of 40 percentage points)
- Consistent quality at scale across 12 weeks of production
- Sustainable workflow that the client could replicate for future projects
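For readers who want to see how figures of this kind are derived, the sketch below recomputes the ratio-style metrics for output, headcount, and approval rate from the before/after pairs quoted above. The helper names are hypothetical. Recomputed values may differ slightly from headline figures, possibly because the quoted inputs are themselves rounded.

```python
def fold_change(before: float, after: float) -> float:
    """How many times larger `after` is than `before`."""
    return after / before


def relative_change_pct(before: float, after: float) -> float:
    """Signed change as a percentage of the starting value."""
    return (after - before) / before * 100


def point_change(before_pct: float, after_pct: float) -> float:
    """Difference between two percentages, in percentage points."""
    return after_pct - before_pct


# Before/after pairs as quoted in the results above.
weekly_tasks = (75.5, 356)
specialists = (70, 158)
approval_rate = (54, 94)

print(f"weekly output: {fold_change(*weekly_tasks):.1f}x")
print(f"active specialists: {fold_change(*specialists):.1f}x")
print(
    f"approval rate: +{point_change(*approval_rate)} points, "
    f"{relative_change_pct(*approval_rate):+.0f}% relative"
)
```

Keeping percentage points and relative percentages as separate helpers avoids a common ambiguity when reporting changes in a rate.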
Long-term Impact: As a result of this success, the client expanded the partnership with Revelo, launching three additional UI generation initiatives. We had transformed a problematic project with subjective quality requirements into a scalable, efficient operation—proving that with the right approach, even the most ambiguous challenges can be systematically solved.
LET'S LEVEL UP YOUR LLM TODAY.
Improve your model's code generation with high-quality, code-focused human data.