Welcome to the 290th edition of Coding Jag brought to you by TestMu AI!👐
AI promised to save us hours, yet many teams feel busier than ever. Why? Writing prompts, reviewing outputs, and fixing mistakes add new layers of work. The real shift is not just efficiency. AI is unlocking new capabilities, letting individuals and small teams do work that once required entire departments.
This edition explores what is changing across AI agents, tools, and developer workflows. Google introduced Gemma 4 for on-device agents, GitHub is opening Copilot for Eclipse, and Deep Agents released major updates worth exploring. Builder 2.0 is rethinking collaborative AI development, while new ideas around multi-agent systems show where AI building is heading next.
You will also explore AI agents built for software testing, new approaches to Salesforce testing with AI, benchmarks comparing leading LLMs, the growing need for AI agent observability, and new thinking around the tools used to build agentic systems.
📬 Found something useful or interesting? Hit reply and let's share perspectives.
News
06 min
developers.googleblog.com
🤖 What if your phone could run a full AI agent without touching the cloud? The Google AI Edge Team announces Gemma 4, a family of open-weight models built for on-device agentic AI, supporting multi-step planning, native function calling, and 140+ languages without cloud dependency. Released under Apache 2.0, edge variants process 4,000 tokens in under 3 seconds, with Agent Skills enabling fully on-device agentic workflows on mobile for the first time.
09 min
cloud.google.com
☁️ What happens when powerful AI meets enterprise-grade cloud infrastructure? Richard Seroter, Chief Evangelist at Google Cloud, announces Gemma 4 is now available on Google Cloud, bringing reasoning, function calling, and 256K context windows to Vertex AI, Cloud Run, and GKE. Built on Apache 2.0, it supports sovereign cloud deployments and pairs with the Agent Development Kit to help enterprises build and deploy AI agents securely.
11 min
devblogs.microsoft.com
🔓 Could open source be the key to making AI coding tools work for everyone? Jialuo Gan, Program Manager at Microsoft, announces that the GitHub Copilot for Eclipse plugin is going open source under the MIT license, hosted under the Microsoft organization on GitHub. The move invites the community to contribute, extend, and innovate, with plans to continue evolving Copilot features in Eclipse.
12 min
github.blog
🦆 Ever wish your AI coding agent could get a second opinion before making a mistake? Nick McKenna and Bartek Perz, Applied Researchers at GitHub, introduce Rubber Duck, an experimental feature in GitHub Copilot CLI that uses a second model from a different AI family to review the primary agent's plans and code. Pairing Claude Sonnet 4.6 with Rubber Duck running GPT-5.4 closes 74.7% of the performance gap between Sonnet and Opus, catching architectural flaws, silent bugs, and cross-file conflicts before they compound.
06 min
blog.langchain.com
🔗 Is blocking execution still the biggest bottleneck in multi-agent systems? The LangChain Team releases Deep Agents v0.5, introducing async non-blocking subagents that run independently in the background via the Agent Protocol, so long-running tasks no longer block the main agent. The update also expands multimodal filesystem support to PDFs, audio, and video, with file types detected automatically and content passed natively to the underlying model.
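To make the idea of a non-blocking subagent concrete, here is a minimal sketch using only Python's standard `asyncio` module. It does not use the Deep Agents API or the Agent Protocol; the names `subagent` and `supervisor` are hypothetical and only illustrate the general pattern of starting a long-running task in the background while the caller keeps working.

```python
import asyncio


async def subagent(name: str, seconds: float) -> str:
    """Hypothetical long-running subagent, simulated with a sleep."""
    await asyncio.sleep(seconds)
    return f"{name} finished"


async def supervisor() -> list[str]:
    """Start a subagent in the background and keep working meanwhile."""
    log: list[str] = []

    # create_task schedules the coroutine without waiting for its result.
    task = asyncio.create_task(subagent("research", 0.05))

    log.append("supervisor continues other work")

    # Yield control once so the background task gets a chance to start.
    await asyncio.sleep(0)
    log.append(f"subagent done yet? {task.done()}")

    # Collect the result only at the point where it is actually needed.
    log.append(await task)
    return log
```

Running `asyncio.run(supervisor())` returns the three log entries in order. The supervisor records its own progress before the subagent completes, which is the difference from a blocking call that would wait for the result first.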
11 min
builder.io
🚀 Builder 2.0 is here! Steve Sewell from Builder.io announces the first collaborative AI development platform where engineers, designers, PMs, and QA all build together on real code in real time. Teams can trigger tasks from Slack or Jira, run hundreds of parallel agents, and ship through existing CI/CD pipelines, turning individual AI productivity into team-wide velocity.
AI
07 min
testmuai.com
Exploring AI agents for modern software quality? TestMu AI’s agentic ecosystem is built to support both QA and development teams. It brings together purpose-built agents that cover different layers of quality, from test creation and execution to self-healing and validation, helping maintain consistency across the entire software development lifecycle.
12 min
blog.agent.ai
⏳ Why does everyone feel busier after adopting AI? Whitney Hathcock from Agent.ai shares that efficiency gains get absorbed by rising expectations, while prompting, reviewing, and correcting outputs adds new overhead. She reframes AI's true value as a capability unlock, helping individuals and small businesses do work that was previously out of reach.
08 min
kanerika.com
🔍 Your AI agent passed every health check, and still got it wrong. Mangal Dwivedi shares that traditional monitoring fails AI agents because clean API responses can hide hallucinated outputs. He breaks down a four-layer observability stack covering tracing, logging, metrics, and evaluation, with Gartner predicting LLM observability will reach 50% of GenAI deployments by 2028.
08 min
claude.com
🤖 What if your coding agent could research, build, and review all at once? The Anthropic Team shares a practical guide on using subagents in Claude Code, isolated instances with their own context windows that handle parallel edits, research, and independent reviews without cluttering the main session. They cover four invocation methods, from conversational prompts to automated hooks, helping teams delegate deliberately.
06 min
databricks.com
🎯 What if advertisers could find audiences they didn't know existed? Bradley Munday and Tyler Hickey from Databricks share how a multi-agent system using Genie, an Affinity Agent, and Agent Bricks helps advertisers build audience segments in natural language, then automatically surfaces statistically validated behavioral patterns, collapsing traditional days-long planning cycles into a single conversation.
Automation
10 min
thegreenreport.blog
🧪 Still writing deterministic tests for non-deterministic AI features? Irfan Mujagic shares a practical framework for QA automation engineers, covering when to automate, when to use eval tools like PromptFoo and DeepEval with quality thresholds, when to test manually, and when to simply monitor production. The key shift is moving from binary assertions to graduated evaluations that define what "good enough" actually means.
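The shift from binary assertions to graduated evaluations can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the article's framework and not the PromptFoo or DeepEval API: `keyword_coverage` is a hypothetical stand-in for a real scoring metric, and the 0.8 threshold is an arbitrary example of a "good enough" bar.

```python
from statistics import mean


def keyword_coverage(output: str, expected_keywords: list[str]) -> float:
    """Hypothetical metric: fraction of expected keywords found in the output."""
    if not expected_keywords:
        return 1.0
    text = output.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)


def passes_threshold(
    outputs: list[str],
    expected_keywords: list[str],
    threshold: float = 0.8,  # assumed quality bar, tune per feature
) -> tuple[bool, list[float]]:
    """Score several sampled outputs and compare the average to a threshold."""
    scores = [keyword_coverage(o, expected_keywords) for o in outputs]
    return mean(scores) >= threshold, scores


# Two sampled responses from a non-deterministic feature (made-up examples).
samples = [
    "Refunds are issued within 30 days to the original payment method.",
    "You can get a refund in 30 days.",
]
keywords = ["refund", "30 days", "payment method"]
ok, scores = passes_threshold(samples, keywords)
```

An exact-match assertion would fail both samples against any single expected string. Here the second sample misses one keyword, yet the suite still passes because the average score stays above the chosen threshold, and raising the threshold tightens the definition of "good enough".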
08 min
testmuai.com
☁️ Testing Salesforce just got a whole lot smarter! Salman Khan from TestMu AI shares how to test Salesforce with KaneAI, which lets you plan, author, and evolve test cases in plain English with no coding required. The article also covers key concepts like Apex, custom objects, and data integrity, along with best practices for handling rapid seasonal releases.
10 min
digital.ai
⚙️ Building an app developers love to test? Pradeep Kumar, Principal Technical Solutions Engineer at Digital.ai, shares a practical framework for designing automation-ready applications from day one, covering stable element identifiers with data-testid attributes, semantic HTML, observable loading states, predictable API responses, and test-friendly configuration modes that make automation faster and more reliable.
Tools
08 min
aimultiple.com
🏆 Which LLM actually performs best as an autonomous coding agent? Nazlı Şipi and Cem Dilmegani from AIMultiple share a benchmark of 13 LLMs across 10 real-world software development tasks, executing 300 automated validation steps per model. Claude Sonnet 4.5 and GPT-5.2 topped the rankings, with Claude achieving the highest UI success rate and most consistent results across both API logic and frontend integration.
07 min
testmuai.com
🛠️ Looking for the right generative AI tool but not sure where to begin? Saniya Gazala from TestMu AI carefully curates a ranked breakdown of 16 top generative AI tools by use case, from ChatGPT, Claude, and Gemini for text and code, to Midjourney and DALL-E 3 for visuals, covering key architectures like LLMs, diffusion models, and multimodal models to help you pick the right fit.
10 min
blog.n8n.io
🔄 Everything you knew about AI agent development tools needs an update. Andrew Green from n8n shares how capabilities like RAG, memory, and web search have become commoditized across most LLM services, making the old evaluation framework obsolete. He proposes a new framework focused on enterprise-readiness, coding flexibility, and purpose-built agentic logic rather than basic building blocks.
Video & Podcast
08 min
aiforhumans.show
🎧 In this episode of AI for Humans, Kevin Pereira and Gavin Purcell break down Anthropic's Claude Mythos, a powerful new model accidentally leaked in March 2026 that found thousands of critical vulnerabilities in every major OS and browser. Rather than a public release, Anthropic launched Project Glasswing, sharing Mythos with Apple, Google, Microsoft, and 40+ partners for defensive cybersecurity purposes only.
07 min
youtube.com
🎥 In this video, the AI For Humans channel walks through the 7 core capabilities of Claude Cowork, Anthropic's desktop app that turns Claude from a chatbot into a full local productivity system. From reading hundreds of local files and building persistent memory, to connecting Gmail, Notion, and Google Drive, creating reusable skills, and scheduling automated tasks like daily inbox triage, it is the practical starting point for anyone ready to go beyond chat.
Events
11 min
testinguy.org
📅 TestingUy 2026, LATAM's most important software testing and quality event, returns on April 15 and 16 in Montevideo, Uruguay. Day 1 is fully online, and Day 2 brings in-person talks, workshops, open spaces, and networking at the Telecommunications Tower, with a focus on AI in testing, back-to-basics approaches, and leadership. Free to attend with prior registration.