Welcome to the 287th edition of Coding Jag brought to you by TestMu AI!👐
The gap between AI agent capability and engineering confidence has never been wider. Agents are writing tests, flagging regressions, and making coverage decisions, while the practices for evaluating their reliability and monitoring their production behavior are still catching up. This edition covers the full picture, from local agent setup to production observability.
The model layer also moved fast this week. Anthropic made 1M context available for Opus 4.6 and Sonnet 4.6 at flat pricing, with no long-context premium attached. OpenAI shipped GPT-5.4 mini and nano, optimized for subagent workloads.
Here is what practitioners are actively navigating right now: what to demand from AI agents before handing them your test suite, how to detect context overload in LLM pipelines before it becomes a failure, and much more.
📬 Found something useful or interesting? Hit reply and let's share perspectives.
News
08 min
devblogs.microsoft.com
🤖 PuiChee (PC) Chan and Travis Angevine introduce a new azd command that lets developers run and invoke AI agents locally before deploying to the cloud. The focus is on shortening the feedback loop for agent development. Test locally, catch failures early, ship with more confidence.
08 min
forbes.com
🛡️ This article by Margarita Simonova cuts through the hype and asks the harder question. Agentic QA is real, but auditability, explainability, and human-in-the-loop controls are still the conditions for trusting it. This is the checklist conversation engineering leaders need to have before handing over the keys.
11 min
grafana.com
📈 Trevor Jones presents results from Grafana's fourth annual Observability Survey. AI's role in observability is growing fast, but the concerns are real and measurable. Teams are adopting AI for anomaly detection and incident response, while trust and data quality gaps remain the top blockers. The potential is not theoretical. Neither are the hesitations.
07 min
claude.com
🧠 Anthropic makes 1M context GA for Claude Opus 4.6 and Sonnet 4.6 at standard pricing. No long-context premium, no beta header required. For teams running long agent sessions, full codebase analysis, or multi-document workflows, this removes the architectural workarounds that context limits forced on production systems.
08 min
openai.com
⚡ OpenAI releases two new models explicitly built for subagent workloads. GPT-5.4 mini runs more than 2x faster than GPT-5 mini while approaching GPT-5.4 benchmarks on coding tasks. Nano goes further down the cost curve for classification, data extraction, and lightweight coding support.
AI
11 min
srini.codes
📚 Srinivasan Sekar has a book out. The MCP Standard is a developer-focused guide to building with the Model Context Protocol, with a foreword by Angie Jones, VP of Engineering at Block. With MCP showing up across this entire edition, from agent debugging to agentic workflows to protocol guides, the timing is right. Worth adding to the list.
08 min
developers.googleblog.com
🗺️ Shubham Saboo and Kristopher Overholt map the landscape of AI agent protocols, covering how agents communicate, coordinate, and hand off context across multi-agent systems. As multi-agent systems become a production reality, protocol decisions made now will shape interoperability for years.
07 min
github.blog
♿ Carie Fisher outlines how GitHub uses AI to close the loop between user accessibility feedback and product changes. The model is closer to continuous testing than to periodic audits. Testers building accessibility coverage into pipelines will find the workflow logic transferable.
12 min
testmuai.com
🧪 Salman Khan lays out a structured learning path for software testers navigating the AI shift. Rather than a general overview, it maps specific skills, tools, and transitions by role and experience level. Concrete enough to act on, broad enough to orient the whole team.
Automation
07 min
testdino.com
🎭 Pratik Patel shows why AI agents write passable Playwright tests out of the box but fall apart on real-world sites with dynamic elements, complex selectors, and edge-case flows. Skills are reusable instruction sets that teach agents the patterns your codebase actually needs.
06 min
thegreenreport.blog
🔍 Irfan Mujagic describes a testing problem that produces no obvious failure signal. Feed an LLM too much context, and it quietly drops information. The response returns 200, the JSON parses fine, and the answer is wrong. This piece walks through a practical CI-ready automation test to catch context overload.
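To make the idea concrete, here is a minimal sketch of one way such a check could look. It is not the article's implementation. It plants a known fact at several positions in a growing context and measures how often the answer comes back. Every name here (build_context, recall_at_size, fake_model) is hypothetical, and fake_model is a stand-in that silently ignores everything except the tail of the prompt, so the example runs without any API key.

```python
from typing import Callable

# Filler sentence used to pad the context (45 characters including the space).
FILLER = "The quick brown fox jumps over the lazy dog. "
MARKER = "The secret code is "


def build_context(needle: str, position: float, total_sentences: int) -> str:
    """Place one needle sentence at a relative position inside filler text."""
    sentences = [FILLER] * total_sentences
    sentences.insert(int(position * total_sentences), needle + " ")
    return "".join(sentences)


def recall_at_size(ask: Callable[[str], str], total_sentences: int) -> float:
    """Fraction of planted facts the model returns for a given context size."""
    positions = [0.0, 0.25, 0.5, 0.75, 1.0]
    hits = 0
    for i, position in enumerate(positions):
        secret = f"code-{i}-{total_sentences}"
        needle = f"{MARKER}{secret}."
        prompt = (
            build_context(needle, position, total_sentences)
            + "\nWhat is the secret code?"
        )
        # The call "succeeds" either way; only the content tells us it failed.
        if secret in ask(prompt):
            hits += 1
    return hits / len(positions)


def fake_model(prompt: str, window: int = 2000) -> str:
    """Hypothetical stand-in for an LLM call.

    Assumption for illustration only: the model silently drops everything
    except the last `window` characters of the prompt.
    """
    visible = prompt[-window:]
    start = visible.find(MARKER)
    if start == -1:
        return "I could not find a code."  # still a well-formed response
    end = visible.find(".", start + len(MARKER))
    return visible[start + len(MARKER):end]


small = recall_at_size(fake_model, total_sentences=10)
large = recall_at_size(fake_model, total_sentences=400)
print(f"recall with small context: {small}")
print(f"recall with large context: {large}")
```

In a real pipeline, you would swap fake_model for your own client call and have CI fail when recall at the context sizes you actually ship drops below a threshold you choose.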
11 min
sonarsource.com
🛠️ Ekaterina Okuneva looks at what breaks when AI-generated code enters production pipelines at scale. Static analysis assumptions change when generation volume goes up. The goal is not to slow AI down. It is to make AI output trustworthy.
11 min
langwatch.ai
👁️ Rogerio Chaves makes a practical case for AI-assisted agent debugging. When a customer reports an issue, instead of manually searching traces, you ask your bot. LangWatch exposes an MCP server that lets AI assistants query trace data directly. The setup covers search, correlation, and pattern recognition across production runs.
Tools
09 min
medium.com
🧠 This article by unicodeveloper breaks down the skill layer for coding agents, covering what Claude, Cursor, Gemini CLI, and similar tools need to perform reliably in real codebases. The list moves past generic advice into specific, installable capabilities. Useful context for anyone building with or evaluating AI coding agents this year.
07 min
blog.n8n.io
🔗 Mihai Farcas addresses the gap between local MCP demos and production agent systems. MCP connections that work in your IDE disappear when you close your laptop. This guide categorizes the top 20 MCP servers and shows how to orchestrate them with n8n to build persistent, reliable agentic workflows.
Video & Podcast
12 min
ministryoftesting.com
🎧 Simon Tomes and Demi Van Malcot return for another episode of This Week in Testing, covering what quality actually looks like in practice and how testers navigate conversations with developers that require both precision and tact.
08 min
youtube.com
🎥 Part 2 of this tutorial by The Testing Academy goes deeper into building reliable retrieval-augmented generation pipelines with quality assurance in mind. It covers how to evaluate retrieval accuracy, structure test cases for RAG outputs, and catch failure modes before they reach production.
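As a rough illustration of what "evaluating retrieval accuracy" can mean in a test, here is a small sketch of recall@k, one common retrieval metric. This is an assumption about the kind of check involved, not the tutorial's own code. The function name and document IDs are made up for the example.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant documents that appear in the top-k retrieved results."""
    if not relevant:
        raise ValueError("relevant must contain at least one document ID")
    top_k = retrieved[:k]
    return len(relevant.intersection(top_k)) / len(relevant)


# Hypothetical test case: the retriever's ranked output and the labeled answers.
retrieved_ids = ["d3", "d1", "d7", "d2"]
relevant_ids = {"d1", "d2"}

print(recall_at_k(retrieved_ids, relevant_ids, k=3))  # only d1 is in the top 3
print(recall_at_k(retrieved_ids, relevant_ids, k=4))  # both d1 and d2 are found
```

A test suite built this way would keep a small labeled set of questions with known relevant documents and assert a minimum recall before changes to the retriever ship.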
Events
06 min
testingmind.com
🎟️ Join Test Automation Summit Portland on March 25, 2026. The theme this year is Human-Centric Quality: AI-Driven, Sustainable, and Secure Testing for the Digital Era. A full-day in-person event bringing together testing and automation practitioners to share real-world lessons on building QA practices that scale alongside AI.