CODING JAG - Issue 292

Welcome to the 292nd edition of Coding Jag, brought to you by TestMu AI!

Testing AI agents is not like testing regular software. You cannot just check if input X produces output Y. Agents make decisions, call tools, and chain steps together, which means the same prompt can produce a different result every single time. That makes evaluation harder, slower, and a lot more important to get right.

This week's edition covers all of that and more. See how Google rebuilds a fragile monolithic agent using ADK. Get the latest from Anthropic as they redesign Claude Code for parallel work. Learn how TestMu AI helped Bajaj Finserv cut test execution time by 60% using HyperExecute and bring escaped defects below 3%, and why HyperExecute earned a 5/5 rating on Microsoft Marketplace.

You will also find out how MCP cuts CI triage time from an hour to seconds, why evaluating AI agents needs more than one method, how vibe testing with Playwright and Claude is changing the way teams validate UX, and more.

📬 Come across something useful or interesting? Just reply and let's exchange ideas.

News

TestMu AI HyperExecute Earns a 5/5 on Microsoft Marketplace

08 min | techcommunity.microsoft.com

Abhijeet Teware, Head of QA at Bajaj Finserv, rates HyperExecute 5/5 on the Microsoft Marketplace blog, calling out real device coverage, SmartUI for visual regression, and proactive support as standout reasons. His advice: don't start small and scale later, start at scale.

Production-Ready AI Agents: 5 Lessons from Refactoring a Monolith

08 min | developers.googleblog.com

๐Ÿ—๏ธ Luis Sala, Jacob Badish, and Frank Guan at Google tear apart "Titanium," a slow, fragile sales research agent, and rebuild it using Google's ADK. Big takeaways: break big tasks into smaller agents, drop hardcoded JSON for Pydantic, use dynamic RAG instead of fixed data, and add OpenTelemetry so you can actually see what's going wrong.

Announcing an Open-Source Collaboration: TestMu AI × Kan

08 min | github.com

TestMu AI (formerly LambdaTest) has announced a collaboration with Kan, a modern open-source project management alternative to Trello. Through this partnership, TestMu AI will bring AI-powered testing capabilities to enhance Kan's reliability, streamline end-to-end and API testing workflows, and enable faster, more confident development for its growing community.

How Bajaj Finserv Cut Test Execution Time by 60% and Brought Escaped Defects Below 3%

12 min | testmuai.com

๐Ÿฆ Bajaj Finserv teamed up with TestMu AI to fix slow, manual testing across their mobile platform. Using HyperExecute and a real device cloud of 10,000 plus devices, they cut test execution time by 60%, scaled automation 40x, and pushed escaped defects below 3%, going from ad-hoc releases to confident weekly deployments.

AI infrastructure at Next '26

09 min | cloud.google.com

โ˜๏ธ Amin Vahdat and Mark Lohmeyer at Google Cloud unveil a big set of AI infrastructure upgrades at Next '26, including 8th-gen TPUs, NVIDIA Vera Rubin-powered A5X instances, a new data center network called Virgo, and GKE updates built for agent workloads. All designed to run AI agents faster, more cost-effectively, and at massive scale.

Azure SDK Release (April 2026)

11 min | devblogs.microsoft.com

๐Ÿ› ๏ธ Ronnie Geraghty at Microsoft rounds up April's Azure SDK updates. Highlights include a critical security fix in Cosmos DB 4.79.0 that patches a remote code execution vulnerability, plus the stable launch of AI Foundry 2.0.0 with a cleaner architecture and separate namespaces for evaluations and memory operations.

Redesigning Claude Code on Desktop for Parallel Agents

07 min | claude.com

⚡ Anthropic gives the Claude Code desktop app a full makeover, built for running multiple tasks at once. The update brings a new session sidebar, drag-and-drop pane layout, built-in terminal, in-app file editor, and a faster diff viewer. Developers can now juggle bug fixes, refactors, and test runs across repos without switching windows.

AI

How To Evaluate the Performance of AI Agents?

09 min | blog.n8n.io

๐Ÿ“ Yulia Dmitrievna at n8n explains why testing AI agents is much harder than testing regular software. Since agents are non-deterministic and work across many steps, one method is not enough. The post walks through offline vs online evaluation, plus when to use deterministic checks, LLM-as-a-judge, and human review to catch what slips through.

From CI Failure to Root Cause in Seconds: MCP for QA Engineers

06 min | thegreenreport.blog

๐Ÿ” Irfan Mujagic at The Green Report shows how to build an MCP server that connects an AI agent directly to your CI pipeline. Instead of spending an hour reading logs across multiple tools, the agent calls five tools in sequence, spots patterns, and hands you a prioritized list of root causes and fixes in seconds.

Vibe Testing with Playwright MCP: Testing UX with AI Agents

07 min | testmuai.com

🧪 Mohammad Faisal Khatri at TestMu AI walks through how to do vibe testing using Playwright MCP and Claude. Instead of writing scripts manually, you describe a full user journey in plain English and let the AI agent run it live in the browser, making UX validation faster and much easier for QA teams to adopt.

Monitor OpenAI API Costs with OpenTelemetry

12 min | openobserve.ai

📊 Gorakhnath Yadav at OpenObserve shares a hands-on guide to tracking OpenAI API spend using OpenTelemetry. It covers how to capture token usage, break down costs by user and feature, build a cost dashboard in OpenObserve, and set up alerts before your API bill gets out of hand. A practical read for any team running LLMs in production.

Automation

How to Build an Agentic Operating System Inside Claude Code

11 min | mindstudio.ai

🤖 The MindStudio team shows how to ditch tools like OpenClaw and Hermes and build your own agentic OS natively inside Claude Code. The setup gives your AI four key things: persistent memory, modular skills, a self-improvement loop, and scheduled execution so agents keep working even when you are not around.

Tools

The 18 Best AI Platforms in 2026 – Tested & Reviewed

11 min | lindy.ai

๐Ÿ—๏ธ Marvin Aziz and Jack Jundanian at Lindy test and review 18 AI platforms across categories like content generation, coding assistants, customer support, automation, and analytics. The guide breaks down pricing, flexibility, and real-world use cases to help individuals, startups, and enterprise teams pick the right platform for their needs.

GPT-5.3 Codex Spark vs Claude Opus 4.6: Which Coding AI Wins?

08 min | testmuai.com

โš™๏ธ Deepak Sharma at TestMu AI puts GPT-5.3 Codex Spark and Claude Opus 4.6 head-to-head. Codex Spark wins on speed, making it great for quick iterations and real-time coding. Opus 4.6 wins on depth, handling complex reasoning, long contexts, and multi-step workflows. The verdict: they are built for different jobs, not the same one.

Video & Podcast

ChatGPT App Store Is the New Apple App Store

12 min | youtube.com

๐ŸŽ™๏ธ In this podcast of the AI Agents Podcast, host Demetri Panici sits down with Jotform Founder and CEO Aytekin Tank to talk about the soft launch of the ChatGPT App Store. They dig into why it has not made big headlines yet, and why the opportunity for developers and businesses to reach hundreds of millions of users is bigger than it looks right now.

I Built an AI Agent in 20 Minutes - Here's How

11 min | youtube.com

📺 In this video, Tim from TechWithTim walks through how he set up MaxClaw, a MiniMax-powered AI agent platform, in just 20 minutes. He shows how pre-built experts handle tasks like trend tracking, multi-source research, and content generation without any complex setup, making it accessible enough for complete beginners to use daily from their phones.

Events

STAREAST 2026

07 min | stareast.techwell.com

🎤 STAREAST 2026 runs from April 26 to May 1 at the Rosen Center Hotel in Orlando, FL, with a virtual option too. The event features 75+ talks covering AI in testing, test automation, security, and quality leadership. It is one of the biggest gatherings for software testers and QA professionals in the industry.