When You Run Out of Requirements: What Happens When You Go All-In on Agents
Six months ago, our engineering team went all-in on agents. Claude Code for autonomous builds. Cursor for daily development. Custom skills tuned for HIPAA and clinical edge cases. Agent-first architecture from day one. We invested heavily in harness engineering — the scaffolding around the agents that makes them actually trustworthy in production.
The result: feature cycle time dropped 99%, from days to minutes. PR throughput rose 4–5x. Sustained, in production, over a full quarter. That part went exactly as we'd hoped.
What we didn't expect: the bottleneck moved. Past testing. Past development. Past design. All the way upstream to product and UX. Our agents could build features faster than our product team could define them. The constraint that used to live in QA now lives in two questions: what should we build, and how should it feel?
This talk is about everything that comes after that realization. The agentic stack we actually run in production healthcare. The framework we use to build LLM-powered features and ship them safely. What QA looks like when humans don't execute tests anymore — and why that's the most strategic version of the role we've ever had. What it means to pair with AI at every step of the SDLC, not just inside the IDE. And the messy reality of what it takes to get a team — and a company — to operate this way.
No theater. Production numbers. The mistakes we made and the playbook we ended up with. Walk away knowing exactly what to try Monday.
Key Takeaways:
The agentic stack, end to end — what we actually run: Claude Code, Cursor, custom skills, harness engineering, agent-first architecture. Why each piece is there and what it would cost you to skip it.
The bottleneck shift is real — when agents handle the build, product and UX become the constraint. Why a 99% cycle-time drop only buys 4–5x throughput, and what teams need to do upstream so engineering doesn't sit idle.
Building LLM products without theater — the framework for shipping AI-powered features in a regulated environment: context architecture, risk-tiered verification, evaluation as a first-class discipline.
QA's most strategic moment — from test execution to quality system architecture. The four pillars humans own when agents do the writing.
Pair with AI at every step — a tactical playbook for getting a team to days-to-minutes cycle time without losing quality, judgment, or your engineers' brains.
About the speaker
Rashi Agrawal:
Rashi Agrawal is the Head of Agentic AI at Hinge Health, where she leads the strategic engineering of high-stakes AI systems within complex regulatory landscapes. By navigating the rigorous requirements of clinical safety, HIPAA, and global regulations, she ensures that disruptive technology remains both secure and compliant. Focused on the frontier of generative technology, Rashi is architecting state-of-the-art frameworks for agentic AI that move beyond simple automation to solve critical member problems and drive dramatic business growth. Formerly the Head of AI at GoodLeap, she spearheaded enterprise-wide AI transformation initiatives, and earlier led engineering teams at Yahoo. A global explorer who has traveled to over 45 countries, she is an active thought leader in the engineering leadership and AI communities, holds a Master's in Computer Science from San Jose State University, and is the founder of Women In Tech AI (WIT AI).
About
TestMu Conf
Testμ (TestMu) is the world’s largest virtual conference on agentic engineering and quality, built by the community, for the community. As AI reshapes how we build, test, and ship software, Testμ Conf is where you connect, grow, and lead: agentic workflows, autonomous quality, battle-tested AI playbooks, hands-on workshops, and the engineering culture driving it all.