Neal Desai

Product & Engineering · builds things

01 EscalationBench · Research preview
Every agent benchmark rewards autonomous completion, but production needs the opposite: agents that recognize when context, authority, or policy is insufficient and stop to ask, verify, or hand off instead of proceeding unsafely. EscalationBench scores both over-asking and under-asking across real business workflows, beyond code and SQL.
Evals · Agents · LLMs · Python

02 Generating Eval Data with an Agentic Reflection Loop · Coming soon
How EscalationBench's tasks get made: an agentic loop, built with Claude Code, that drafts a task, critiques its own output, and repairs it until it clears each quality gate, with human review as the final gate. The methodology, the tradeoffs, and what I'd change.
Essay · Synthetic Data · Agents · Claude Code

03 AI Transformation & the Context Problem · Coming soon
Claude Cowork, TextQL, and the new wave of enterprise copilots make it easy to ask questions of your data, until you hit the real bottleneck: messy, ungoverned, context-poor data. A field guide to enterprise AI enablement that treats the data-and-context problem, not the demo, as the main event.
Essay · Enterprise · Enablement · Data

04 State of the Startup Union: Reading the Cohorts · Coming soon
Hours of long-form startup video from YC, Dwarkesh, and other accelerator and VC channels bury the signal. This piece uses the Offtake platform to transcribe that footage and surface the most relevant clips, then maps YC's cohorts over time: which companies broke out, bucketed by outcome and category; what that says about where the industry is heading; and a VC-forward read on what founders will need next.
Essay · Video Intelligence · Offtake · YC Cohorts · Trends

05 How Offtake Clips Video by Transcript · Coming soon
OpusClip does this too. Here's how we built it on Offtake: drop in a video, transcribe it, and cut clips straight from the transcript. The design decisions behind transcript-driven clipping, and how it powers the startup-video analysis above.
Essay · Offtake · Video · Systems

I'm a Staff AI Architect at Scale AI, a pre-sales role working with research teams at frontier AI labs on reinforcement-learning environments, fine-tuning data strategy, and model evaluations. Much of the work is scoping and building data products, along with the insights that come from studying where models fall short and how to close the gap.

Before Scale I was a startup co-founder and CTO, built production ML at Beyond Limits for enterprise customers in energy, finance, and healthcare, and started out at Epic Systems shipping predictive models for hospitals. I studied biological engineering and applied math at Caltech and machine learning at UW-Madison. I'm based in New York. Outside of work I mentor high school students through their first real research projects in healthcare and tech, which is some of the most rewarding work I do. The rest of the time you'll usually find me traveling or outdoors.

Have an idea, a question, or just want to say hi? I'd love to hear from you. Let's build something.