SESSION

Can You Trust Your ChatBot: Techniques for Testing LLM Responses

Like everyone and their mother, you now have a chatbot on your site. But can you trust it to give the right answers? Not insult the users? Not give them ingredients for a homemade exploding salad?

Testing something that gives you a whole lot of text, and never the same way twice? That's tough. But not impossible.

The never - What you never, ever want to see

The must - The beginning of a good answer

Golden Datasets - What good answers look like

Tone and bias detection - What proper answers look like

Scorecards - What the new "pass/fail" looks like

The AI Judge - Ask a smarter bot to settle the argument between you and your bot. What modern delegation looks like. Unfortunately.
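To make the first items on this list a little more concrete, here is a minimal sketch of how "the never", "the must", and a scorecard could fit together. It is not the speaker's implementation. The names `Scorecard`, `check_response`, and `pass_rate`, the regex patterns, and the sample responses are all hypothetical, chosen only for illustration. The assumption is that you can sample several responses to the same prompt and grade each one, since the bot never answers the same way twice.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Scorecard:
    """Result of grading one response (hypothetical structure)."""
    never_hits: list = field(default_factory=list)   # forbidden patterns that appeared
    must_misses: list = field(default_factory=list)  # required patterns that were absent
    must_total: int = 0

    @property
    def must_score(self) -> float:
        # Fraction of required patterns that were found.
        if self.must_total == 0:
            return 1.0
        return 1.0 - len(self.must_misses) / self.must_total

    def passed(self, threshold: float = 1.0) -> bool:
        # Any "never" hit fails outright; "must" coverage has to meet the threshold.
        return not self.never_hits and self.must_score >= threshold


def check_response(text: str, never: list, must: list) -> Scorecard:
    """Grade one response against "never" and "must" regex patterns."""
    card = Scorecard(must_total=len(must))
    for pattern in never:
        if re.search(pattern, text, re.IGNORECASE):
            card.never_hits.append(pattern)
    for pattern in must:
        if not re.search(pattern, text, re.IGNORECASE):
            card.must_misses.append(pattern)
    return card


def pass_rate(responses: list, never: list, must: list, threshold: float = 1.0) -> float:
    """Share of sampled responses that pass; responses must be non-empty."""
    results = [check_response(r, never, must).passed(threshold) for r in responses]
    return sum(results) / len(results)


# Illustrative data only: two sampled answers to the same refund question.
never = [r"\bidiot\b"]
must = [r"\brefund", r"\b30 days\b"]
responses = [
    "You can request a refund within 30 days.",
    "Refunds take 30 days, you idiot.",
]
rate = pass_rate(responses, never, must)
```

A pass rate over many samples, rather than a single green or red run, is one way to turn non-repeatable output into a number a CI pipeline can gate on. Pattern matching like this only covers the simplest cases; tone, bias, and judge-based checks need more than regexes.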

All with live examples. If you're testing chatbots, AI agents, or just want to know if somebody's prompt will crash your system - this one's for you.

Yes, you can run a couple of examples and see if your chatbot behaves. But trust? That we need to build. So, let's make sure that bot doesn't get us on the news.

Key Takeaways:

  • Because results are non-deterministic, we need better testing.

  • New techniques for automation and CI.

  • We need to cover risks we're not used to dealing with (e.g. safety, security).

About the speaker

Gil Zilberfeld:

Gil Zilberfeld has been writing code and building software for over 25 years, starting with his trusty Sinclair ZX81. His career has taken him from developer to team leader and consultant, giving him a deep, holistic understanding of the software lifecycle. Today, he operates as an independent consultant and trainer under the brand TestinGil, where he helps R&D and QA professionals build better software from the inside out. Gil's core philosophy is that quality is a team sport. He is a benevolent skeptic of industry dogma, consistently challenging ineffective "best practices" in favor of pragmatic, context-driven solutions that work in the real world. He is a frequent speaker at international conferences, where he shares his expertise on topics ranging from TDD and clean code to the modern complexities of web automation. He is currently focused on creating actionable, engineering-led frameworks for ensuring the quality and long-term resilience of AI-powered systems. You can find his articles, videos, and courses at testingil.com. In his spare time, he shoots zombies for fun.

TESTMU-CONF 2026


About TestMu Conf

Testμ (TestMu) is the world’s largest virtual conference on agentic engineering and quality, built by the community, for the community. As AI reshapes how we build, test, and ship software, Testμ Conf is where you connect, grow, and lead: agentic workflows, autonomous quality, battle-tested AI playbooks, hands-on workshops, and the engineering culture driving it all.