CODING JAG - Issue 290

News
AI
Automation
Tools
Video & Podcast
Events

Welcome to the 290th edition of Coding Jag, brought to you by TestMu AI! 👐

AI promised to save us hours, yet many teams feel busier than ever. Why? Writing prompts, reviewing outputs, and fixing mistakes add new layers of work. The real shift is not just efficiency: AI is unlocking new capabilities, letting individuals and small teams do work that once required entire departments.

This edition explores what is changing across AI agents, tools, and developer workflows. Google introduced Gemma 4 for on-device agents, GitHub is open-sourcing Copilot for Eclipse, and Deep Agents shipped major updates worth exploring. Builder 2.0 is rethinking collaborative AI development, while new ideas around multi-agent systems show where AI building is heading next.

You will also explore AI agents built for software testing, new approaches to Salesforce testing with AI, benchmarks comparing leading LLMs, the growing need for AI agent observability, and new thinking around the tools used to build agentic systems.

📬 Found something useful or interesting? Hit reply and let's share perspectives.

News

Bring State-of-the-Art Agentic Skills to the Edge With Gemma 4

06 min · developers.googleblog.com

🤖 What if your phone could run a full AI agent without touching the cloud? The Google AI Edge Team announces Gemma 4, a family of open-weight models built for on-device agentic AI, supporting multi-step planning, native function calling, and 140+ languages without cloud dependency. Released under Apache 2.0, the edge variants process 4,000 tokens in under 3 seconds, with Agent Skills enabling fully on-device agentic workflows on mobile for the first time.

Gemma 4 Available on Google Cloud

09 min · cloud.google.com

☁️ What happens when powerful AI meets enterprise-grade cloud infrastructure? Richard Seroter, Chief Evangelist at Google Cloud, announces Gemma 4 is now available on Google Cloud, bringing reasoning, function calling, and 256K context windows to Vertex AI, Cloud Run, and GKE. Released under Apache 2.0, it supports sovereign cloud deployments and pairs with the Agent Development Kit to help enterprises build and deploy AI agents securely.

GitHub Copilot for Eclipse Is Going Open Source

11 min · devblogs.microsoft.com

🔓 Could open source be the key to making AI coding tools work for everyone? Jialuo Gan, Program Manager at Microsoft, announces that the GitHub Copilot for Eclipse plugin is going open source under the MIT license, hosted under the Microsoft organization on GitHub. The move invites the community to contribute, extend, and innovate, with plans to continue evolving Copilot features in Eclipse.

GitHub Copilot CLI Combines Model Families for a Second Opinion

12 min · github.blog

🦆 Ever wish your AI coding agent could get a second opinion before making a mistake? Nick McKenna and Bartek Perz, Applied Researchers at GitHub, introduce Rubber Duck, an experimental feature in GitHub Copilot CLI that uses a second model from a different AI family to review the primary agent's plans and code. Pairing Claude Sonnet 4.6 with Rubber Duck running GPT-5.4 closes 74.7% of the performance gap between Sonnet and Opus, catching architectural flaws, silent bugs, and cross-file conflicts before they compound.
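The "second opinion" idea, where a reviewer from a different model family critiques the primary agent's plan before it acts, can be sketched as a simple review loop. This is a minimal illustration under stated assumptions, not GitHub's implementation: `primary_plan` and `reviewer_critique` are hypothetical stand-ins for calls to two different model families, stubbed here with plain functions so the control flow runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Review:
    approved: bool
    issues: list[str] = field(default_factory=list)


def second_opinion_loop(
    task: str,
    primary_plan: Callable[[str, list[str]], str],
    reviewer_critique: Callable[[str, str], Review],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Ask a reviewer from another model family to critique each plan before acting.

    Returns the final plan and the number of review rounds used.
    """
    feedback: list[str] = []
    plan = primary_plan(task, feedback)
    for round_no in range(1, max_rounds + 1):
        review = reviewer_critique(task, plan)
        if review.approved:
            return plan, round_no
        # Feed the reviewer's findings back to the primary model and re-plan.
        feedback.extend(review.issues)
        plan = primary_plan(task, feedback)
    # Out of review rounds: return the latest revision, which was not re-reviewed.
    return plan, max_rounds


# Hypothetical stubs standing in for two different model families.
def fake_primary(task: str, feedback: list[str]) -> str:
    plan = f"plan for {task}"
    if feedback:
        plan += " (addresses: " + "; ".join(feedback) + ")"
    return plan


def fake_reviewer(task: str, plan: str) -> Review:
    if "addresses" in plan:
        return Review(approved=True)
    return Review(approved=False, issues=["missing null check in parser"])


if __name__ == "__main__":
    final_plan, rounds = second_opinion_loop("refactor parser", fake_primary, fake_reviewer)
    print(rounds, final_plan)
```

The design choice worth noting is that the reviewer only critiques; the primary agent stays responsible for revising, so the reviewer's findings become extra context rather than a competing plan.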

Deep Agents v0.5

06 min · blog.langchain.com

🔗 Is blocking execution still the biggest bottleneck in multi-agent systems? The LangChain Team releases Deep Agents v0.5, introducing async, non-blocking subagents that run independently in the background via the Agent Protocol, so long-running tasks no longer hold up the main agent. The update also expands multimodal filesystem support to PDFs, audio, and video, detecting file types automatically and passing content natively to the underlying model.
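The difference between blocking and non-blocking subagents is easy to see with Python's `asyncio`. This sketch is an illustration only, not the Deep Agents or Agent Protocol API: `run_subagent` is a hypothetical coroutine that simulates a long-running task with `asyncio.sleep`, and the supervisor launches it with `asyncio.create_task` so it can keep working instead of waiting.

```python
import asyncio


async def run_subagent(name: str, seconds: float) -> str:
    """Hypothetical long-running subagent; the sleep stands in for real work."""
    await asyncio.sleep(seconds)
    return f"{name} done"


async def supervisor() -> list[str]:
    events: list[str] = []
    # Launch subagents in the background; create_task schedules them and returns immediately.
    tasks = [
        asyncio.create_task(run_subagent("research", 0.2)),
        asyncio.create_task(run_subagent("tests", 0.1)),
    ]
    # The supervisor keeps doing its own work while the subagents run.
    events.append("supervisor continued")
    # Collect results later, in the order the subagents finish.
    for finished in asyncio.as_completed(tasks):
        events.append(await finished)
    return events


if __name__ == "__main__":
    print(asyncio.run(supervisor()))
```

With a blocking design, the supervisor would `await` each subagent in turn and could not record anything until both had returned.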

Announcing Builder 2.0: Collaborative Coding with Claude and Codex

11 min · builder.io

🚀 Builder 2.0 is here! Steve Sewell from Builder.io announces the first collaborative AI development platform where engineers, designers, PMs, and QA all build together on real code in real time. Teams can trigger tasks from Slack or Jira, run hundreds of parallel agents, and ship through existing CI/CD pipelines, turning individual AI productivity into team-wide velocity.

AI Agents by TestMu

07 min · testmuai.com

Exploring AI agents for modern software quality? TestMu AI’s agentic ecosystem is built to support both QA and development teams. It brings together purpose-built agents that cover different layers of quality, from test creation and execution to self-healing and validation, keeping quality consistent across the entire software development lifecycle.

AI Was Supposed to Save You Time. So, Why Are You Busier Than Ever?

12 min · blog.agent.ai

⏳ Why does everyone feel busier after adopting AI? Whitney Hathcock from Agent.ai shares that efficiency gains get absorbed by rising expectations, while prompting, reviewing, and correcting outputs adds new overhead. She reframes AI's true value as a capability unlock, helping individuals and small businesses do work that was previously out of reach.

What Is AI Agent Observability?

08 min · kanerika.com

🔍 Your AI agent passed every health check and still got it wrong. Mangal Dwivedi shares that traditional monitoring fails AI agents because clean API responses can hide hallucinated outputs. He breaks down a four-layer observability stack covering tracing, logging, metrics, and evaluation, with Gartner predicting LLM observability will reach 50% of GenAI deployments by 2028.
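A clean response that is still wrong is exactly why observability layers evaluation on top of tracing and metrics. Here is a minimal sketch of two of those layers, assuming a hypothetical `agent_answer` function and a toy grounding check; a production stack would export spans to a tracing backend (OpenTelemetry, for example) rather than keep them in an in-memory list.

```python
import time
from dataclasses import dataclass


@dataclass
class Span:
    name: str
    duration_ms: float
    status: str          # transport-level health: did the call succeed?
    eval_passed: bool    # quality-level check: is the answer supported by the sources?


TRACE: list[Span] = []


def agent_answer(question: str) -> str:
    """Hypothetical agent call that returns a well-formed but ungrounded answer."""
    return "The refund window is 90 days."


def grounded(answer: str, source_docs: list[str]) -> bool:
    """Toy evaluation: every number in the answer must appear in some source document."""
    tokens = [tok.strip(".,") for tok in answer.split()]
    numbers = [tok for tok in tokens if tok.isdigit()]
    return all(any(n in doc for doc in source_docs) for n in numbers)


def traced_call(question: str, source_docs: list[str]) -> str:
    start = time.perf_counter()
    answer = agent_answer(question)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Record both the health signal and the evaluation result on the same span.
    TRACE.append(Span("agent_answer", elapsed_ms, "ok", grounded(answer, source_docs)))
    return answer


if __name__ == "__main__":
    traced_call("What is the refund window?", ["Refunds are accepted within 30 days."])
    span = TRACE[-1]
    print(span.status, span.eval_passed)  # prints: ok False
```

A health check alone would see only `status == "ok"`; the evaluation field is what surfaces the hallucinated "90 days".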

How and When To Use Subagents in Claude Code

08 min · claude.com

🤖 What if your coding agent could research, build, and review all at once? The Anthropic Team shares a practical guide on using subagents in Claude Code, isolated instances with their own context windows that handle parallel edits, research, and independent reviews without cluttering the main session. They cover four invocation methods, from conversational prompts to automated hooks, helping teams delegate deliberately.

A Multi-Agent Approach to Audience Intelligence

06 min · databricks.com

🎯 What if advertisers could find audiences they didn't know existed? Bradley Munday and Tyler Hickey from Databricks share how a multi-agent system using Genie, an Affinity Agent, and Agent Bricks helps advertisers build audience segments in natural language, then automatically surfaces statistically validated behavioral patterns, collapsing traditional days-long planning cycles into a single conversation.

Automation

Rethinking Your Test Strategy for AI-Powered Features

10 min · thegreenreport.blog

🧪 Still writing deterministic tests for non-deterministic AI features? Irfan Mujagic shares a practical framework for QA automation engineers, covering when to automate, when to use eval tools like PromptFoo and DeepEval with quality thresholds, when to test manually, and when to simply monitor production. The key shift is moving from binary assertions to graduated evaluations that define what "good enough" actually means.
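The move from binary assertions to graduated evaluations can be illustrated without any eval framework. This is a minimal sketch, not PromptFoo or DeepEval code: `keyword_coverage` is a hypothetical scoring function, and the 0.8 threshold is an illustrative stand-in for whatever "good enough" your team defines.

```python
def exact_match(output: str, expected: str) -> bool:
    """Binary assertion: brittle for non-deterministic AI output."""
    return output == expected


def keyword_coverage(output: str, required: list[str]) -> float:
    """Graduated score in [0, 1]: fraction of required facts the output mentions."""
    text = output.lower()
    hits = sum(1 for kw in required if kw.lower() in text)
    return hits / len(required) if required else 1.0


def passes(output: str, required: list[str], threshold: float = 0.8) -> bool:
    """Pass when the graduated score clears the agreed quality threshold."""
    return keyword_coverage(output, required) >= threshold


if __name__ == "__main__":
    expected = "Your order ships in 2 days via express courier."
    output = "Good news: the order will ship within 2 days by express courier."
    required = ["order", "2 days", "express"]
    print(exact_match(output, expected))                 # False
    print(round(keyword_coverage(output, required), 2))  # 1.0
    print(passes(output, required))                      # True
```

The rephrased answer fails the exact-match check but passes the graduated one, while an answer such as "Your order ships soon." scores about 0.33 and still fails.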

Salesforce Testing with AI: Types, Tools & Automation 2026

08 min · testmuai.com

☁️ Testing Salesforce just got a whole lot smarter! Salman Khan from TestMu AI shows how KaneAI lets you plan, author, and evolve Salesforce test cases in plain English with no coding required, and covers key concepts like Apex, custom objects, data integrity, and best practices for handling rapid seasonal releases.

Automation First App Design Framework & Best Practices

10 min · digital.ai

⚙️ Building an app developers love to test? Pradeep Kumar, Principal Technical Solutions Engineer at Digital.ai, shares a practical framework for designing automation-ready applications from day one, covering stable element identifiers with data-testid attributes, semantic HTML, observable loading states, predictable API responses, and test-friendly configuration modes that make automation faster and more reliable.
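Why `data-testid` attributes make automation more stable can be shown with Python's standard-library `html.parser`: the test locates elements by a dedicated attribute instead of CSS classes that change with every restyle. A minimal sketch; the markup and the `DataTestIdFinder` helper are hypothetical, and a real suite would rely on its browser driver's own selector support.

```python
from html.parser import HTMLParser
from typing import Optional


class DataTestIdFinder(HTMLParser):
    """Maps each data-testid value to the tag name of the element that carries it."""

    def __init__(self) -> None:
        super().__init__()
        self.found: dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        for key, value in attrs:
            if key == "data-testid" and value:
                self.found[value] = tag


def find_test_ids(html: str) -> dict[str, str]:
    finder = DataTestIdFinder()
    finder.feed(html)
    return finder.found


# The same button before and after a visual redesign: classes change, the test id does not.
BEFORE = '<button class="btn btn-blue" data-testid="checkout-submit">Pay</button>'
AFTER = '<button class="cta-primary rounded" data-testid="checkout-submit">Pay now</button>'

if __name__ == "__main__":
    print(find_test_ids(BEFORE) == find_test_ids(AFTER))  # True
```

A selector built on the class names or the button text would break after this redesign; one built on `data-testid="checkout-submit"` keeps working.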

Tools

Agentic LLM Benchmark: Top 13 LLMs Compared

08 min · aimultiple.com

🏆 Which LLM actually performs best as an autonomous coding agent? Nazlı Şipi and Cem Dilmegani from AIMultiple share a benchmark of 13 LLMs across 10 real-world software development tasks, executing 300 automated validation steps per model. Claude Sonnet 4.5 and GPT-5.2 topped the rankings, with Claude achieving the highest UI success rate and most consistent results across both API logic and frontend integration.

16 Best Generative AI Tools in 2026: Ranked by Use Case

07 min · testmuai.com

🛠️ Looking for the right generative AI tool but not sure where to begin? Saniya Gazala from TestMu AI carefully curates a ranked breakdown of 16 top generative AI tools by use case, from ChatGPT, Claude, and Gemini for text and code, to Midjourney and DALL-E 3 for visuals, covering key architectures like LLMs, diffusion models, and multimodal models to help you pick the right fit.

We Need to Re-Learn What AI Agent Development Tools Are in 2026

10 min · blog.n8n.io

🔄 Everything you knew about AI agent development tools needs an update. Andrew Green from n8n shares how capabilities like RAG, memory, and web search have become commoditized across most LLM services, making the old evaluation framework obsolete. He proposes a new framework focused on enterprise-readiness, coding flexibility, and purpose-built agentic logic rather than basic building blocks.

Video & Podcast

Anthropic's Mythos AI Is Too Dangerous to Release. They're Using It Anyway.

08 min · aiforhumans.show

🎧 In this episode of AI for Humans, Kevin Pereira and Gavin Purcell break down Anthropic's Claude Mythos, a powerful new model accidentally leaked in March 2026 that found thousands of critical vulnerabilities in every major OS and browser. Rather than a public release, Anthropic launched Project Glasswing, sharing Mythos with Apple, Google, Microsoft, and 40+ partners for defensive cybersecurity purposes only.

Learn 80% of Claude Cowork in Under 20 Minutes

07 min · youtube.com

🎥 In this video, the AI For Humans channel walks through the 7 core capabilities of Claude Cowork, Anthropic's desktop app that turns Claude from a chatbot into a full local productivity system. From reading hundreds of local files and building persistent memory, to connecting Gmail, Notion, and Google Drive, creating reusable skills, and scheduling automated tasks like daily inbox triage, it is a practical starting point for anyone ready to go beyond chat.

Events

TestingUy 2026

11 min · testinguy.org

📅 TestingUy 2026, LATAM's most important software testing and quality event, returns on April 15 and 16 in Montevideo, Uruguay. Day 1 is fully online, and Day 2 brings in-person talks, workshops, open spaces, and networking at the Telecommunications Tower, with a focus on AI in testing, back-to-basics approaches, and leadership. Free to attend with prior registration.
