Learn how to perform vibe testing with Playwright MCP and Claude to validate user experience. Run AI-driven browser tests, generate scripts, and use Kane AI.

Faisal Khatri
April 15, 2026
With the rapid adoption of AI coding agents like Claude and GitHub Copilot, modern AI test automation is no longer just about writing scripts. It is about working alongside AI to generate, maintain, and optimize tests faster, with quick feedback loops, smart test generation, and self-healing capabilities that keep AI-generated scripts stable and reliable.
As vibe coding reshapes how teams build software using AI, vibe testing with Playwright has emerged as its natural counterpart. With the rise of MCP and AI agents, AI testing tools can now connect directly with AI systems, allowing tests to be generated, analyzed, and refined through natural language interactions.
This approach helps teams validate not just what was built, but how it feels to use, including visual consistency, layout alignment, accessibility, performance feel, and even small UI details that impact user perception.
Overview
What Does a Visual Testing AI Agent Do?
Playwright MCP enables MCP AI agents to control browsers via Playwright, executing actions and returning structured results using snapshot or vision modes.
How Can You Perform Vibe Testing with Playwright MCP with Claude?
Vibe testing with Playwright MCP allows Claude to simulate real user journeys and evaluate interaction flow, UI behavior, and feedback. Instead of writing scripts, testers describe scenarios in natural language and let the AI execute and validate the experience.
How to Run Vibe Tests with Kane AI?
KaneAI by TestMu AI uses natural language prompts to generate browser workflows, simplifying AI test automation for end-to-end testing.
Vibe testing evaluates how an app feels to users, not just if it works. With Claude and Playwright MCP, you describe a user journey and AI executes and validates it in a real browser.
Vibe testing with Playwright focuses on evaluating the overall user experience, not just functional correctness. Rather than only checking that the app works as expected, it looks at the experience from a user's point of view and asks questions like: Does the flow feel intuitive? Is the layout visually consistent and well aligned? Does the app respond quickly enough to feel smooth?
When paired with an AI agent like Claude, vibe testing becomes conversational. You describe a user journey in plain English, the AI executes it in a real browser, validates the outcomes, and reports back. No scripting required.
Playwright MCP is a Model Context Protocol server that lets AI agents control a browser using Playwright, Microsoft's open-source automation framework.
The AI sends high-level instructions to the MCP server, which performs browser actions like opening pages, clicking elements, and filling forms, then returns structured results including pass/fail status, text output, or screenshots.
It runs in two modes:
| Mode | How it works | When to use it |
|---|---|---|
| Snapshot Mode (Default) | Reads the browser's accessibility tree, a structured map of page elements by role, name, and value | Most tasks, such as navigation, forms, and content extraction |
| Vision Mode | Uses screenshots and coordinate-based interactions | Edge cases where the accessibility tree is not enough |
Snapshot Mode covers the vast majority of vibe testing scenarios; Vision Mode is a fallback for pages where the accessibility tree does not expose enough information.
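If you do need Vision Mode, it is enabled with a startup flag on the MCP server. The exact flag has varied across `@playwright/mcp` releases (`--vision` in earlier versions, capability flags in later ones), so verify it with `npx @playwright/mcp@latest --help`; the configuration fragment below is a sketch under that assumption:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["-y", "@playwright/mcp@latest", "--vision"]
    }
  }
}
```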
Playwright MCP integrates with tools like Claude Desktop, Cursor IDE, and GitHub Copilot in Visual Studio Code, allowing AI assistants to interact with web applications in a structured, reliable way.
What Claude can do once connected:
Follow this step-by-step guide to set up Playwright MCP with Claude Desktop:
Make sure Node.js and npm are installed on your system. If you haven't set up Playwright yet, check out our Playwright Install guide before proceeding.
```shell
node -v
npm -v
```

Next, register the Playwright MCP server in the Claude Desktop configuration file, `claude_desktop_config.json` (located under `~/Library/Application Support/Claude/` on macOS or `%APPDATA%\Claude\` on Windows):

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": [
        "-y",
        "@playwright/mcp@latest"
      ]
    }
  }
}
```

This tells Claude Desktop how to start the Playwright MCP server. It uses npx to run @playwright/mcp@latest, with the -y flag skipping npx's confirmation prompt.

Open Claude Desktop and enter this prompt:
Claude will send the instructions to the Playwright MCP server, which runs them in a real browser and returns the output, logs, and screenshots. Once complete, Claude surfaces a summary of every step performed.
Test Execution:
After the command is executed, Claude returns a clear summary of all the steps performed, along with relevant details. This output can then be used for deeper analysis, validation, or reporting.
Vibe testing with Playwright MCP is an AI-driven testing approach that evaluates more than just functional correctness, focusing on the overall user experience and feel of the application.
Let us use Claude Desktop and Playwright MCP to perform vibe testing of the TestMu AI E-Commerce demo playground.
The prompt below describes a complete user journey in plain English. Copy it, paste it into a new Claude chat, and run it.
Test Scenario: Search and Add to Cart Flow
Goal: Verify that a user can successfully search for a product and add it to their shopping cart.
Preconditions: User is on the homepage of TestMu AI Playground (https://ecommerce-playground.lambdatest.io/)
Steps:
1. Navigate to the TestMu AI Playground page.
2. Enter the keyword "MacBook Air" into the search bar and submit the search.
3. Validate that the search results display products matching the search term "MacBook Air".
4. From the search results, hover over the first product with the name "MacBook Air",
and click on the "Cart" icon to add it to the cart, then click on the "View Cart" button
as soon as the pop-up is displayed.
5. Confirm that the "Cart icon" count is "1", verify that the message
"Products marked with *** are not available in the desired quantity or not in stock!"
is displayed on the "Shopping Cart" page.
6. Take a screenshot of the page.
Expected Results:
- Search results relevant to the keyword are shown
- The cart reflects the correct item count of 1
- The stock warning message is displayed on the Shopping Cart page

Claude integrates with the Playwright MCP server to perform the browser actions: navigating to the site, locating elements, clicking, hovering, pressing keys, and taking screenshots.
Test Execution
When it receives the Vibe Testing prompt, Claude intelligently executes each step sequentially, running the browser in headed mode so the actions performed are visible.
Claude automatically identifies web elements such as buttons, links, and text, choosing the best locator strategy and interaction method for each step. For example, to click the cart icon that only appears after hovering over a product, it may fall back to evaluating JavaScript in the page.
After executing all the steps, it provides a consolidated report with all the steps and pass/fail status.
Because the Expected Results were included in the prompt, Claude analyzes the actual outcomes, compares them against those expectations, and produces a clear validation report.
Claude can generate Playwright test code in any preferred language. Describe the scenario in the prompt and it produces the corresponding automation script.
Use the following prompt in Claude to generate the test script:
Test Scenario:
Claude-generated test script:
```typescript
import { test, expect } from '@playwright/test';

test.describe('LambdaTest E-Commerce Login - Invalid Credentials', () => {
  test('should display warning message for invalid email/password combination', async ({ page }) => {
    // Step 1: Navigate to LambdaTest E-Commerce Playground
    await page.goto('https://ecommerce-playground.lambdatest.io/');

    // Step 2: Navigate to the Login Page
    // Hover over "My Account" to reveal the dropdown, then click "Login"
    await page.locator('a:has-text("My account")').hover();
    await page.locator('a[href*="route=account/login"]').click();

    // Verify we're on the login page
    await expect(page).toHaveURL(/route=account\/login/);
    await expect(page.locator('h1, h2').filter({ hasText: /Returning Customer/i })).toBeVisible();

    // Step 3: Enter invalid email address
    await page.locator('#input-email').fill('[email protected]');

    // Step 4: Enter password
    await page.locator('#input-password').fill('Password@123');

    // Step 5: Click the Login button
    await page.locator('input[value="Login"], button:has-text("Login")').first().click();

    // Step 6: Verify the warning message is displayed
    const warningMessage = page.locator('.alert-danger, div[class*="alert"]');
    await expect(warningMessage).toBeVisible({ timeout: 10000 });
    await expect(warningMessage).toContainText(
      'Warning: No match for E-Mail Address and/or Password.'
    );
  });
});
```
Along with the script, Claude provides a breakdown of the test and the command to run it.
Before adding any AI-generated script to your test suite, review it against these checkpoints:
| Checkpoint | What to look for |
|---|---|
| Clean Code | Well-structured, readable code with unnecessary comments removed |
| Locators | Stable selectors that follow Playwright best practices |
| Assertions | Clearly defined assertions tied to expected business outcomes |
| Syntax | Correct async/await usage with valid TypeScript signatures |
| Error Handling | Edge cases covered for complex scenarios |
| Maintainability | Steps are logically organized and easy to update |
Overall, the Claude-generated script serves as a strong starting point but should be reviewed and refined before using it in the project test suite.
When reviewing AI-generated Playwright code, you should also:
Let's run the Claude-generated code on the local machine to verify if it works as expected.
Step 1: Save the generated script in the tests folder as lambdatest-ecommerce-login.spec.ts
Step 2: Run the following command in the terminal:
```shell
npx playwright test lambdatest-ecommerce-login.spec.ts
```

On executing the above command, the following error is generated in the console, and the test fails:
```
1) [Google Chrome] > specs/lambdatest-ecommerce-login.spec.ts:4:7 > LambdaTest E-Commerce Login - Invalid Credentials > should display warning message for invalid email/password combination

   Error: locator.hover: Error: strict mode violation: locator('a:has-text("My account")') resolved to 2 elements:
       1) <a class="icon-left both nav-link" href="https://ecommerce-playground.lambdatest.io/index.php?route=account/account">...</a> aka getByRole('link', { name: ' My account' })
       2) <a role="button" aria-haspopup="true" aria-expanded="false" data-toggle="dropdown" class="icon-left both nav-link dropdown-toggle" href="https://ecommerce-playground.lambdatest.io/index.php?route=account/account">...</a> aka getByRole('button', { name: ' My account' })

   Call log:
     - waiting for locator('a:has-text("My account")')

      8 |     // Step 2: Navigate to the Login Page
      9 |     // Hover over "My Account" to reveal the dropdown, then click "Login"
   > 10 |     await page.locator('a:has-text("My account")').hover();
        |                                                    ^
     11 |     await page.locator('a[href*="route=account/login"]').click();
     12 |
     13 |     // Verify we're on the login page
        at /Users/faisalkhatri/Github/playwright-ts-demo/tests/specs/playwright-claude-code.spec.ts:10:52
```
The test failed because the locator a:has-text("My account") matched two elements, causing a strict mode violation since Playwright expected only one unique element to interact with.
Let's fix the test by updating the locator as suggested by Playwright:
```typescript
await page.getByRole('button', { name: 'My account' }).hover();
```

On re-running the test after updating the locator for the My account button, it fails again with the following error:
```
Error: expect.toBeVisible: Error: strict mode violation: locator('.alert-danger, div[class*="alert"]') resolved to 3 elements:
    1) <div data-id="217845" id="entry_217845" class="entry-design design-alert flex-grow-0 flex-shrink-0">...</div> aka locator('#entry_217845')
    2) <div class="alert alert-info ">...</div> aka locator('#entry_217845 div')
    3) <div class="alert alert-danger alert-dismissible">...</div> aka getByText('Warning: No match for E-Mail')

Call log:
  - expect.toBeVisible with timeout 10000ms
  - waiting for locator('.alert-danger, div[class*="alert"]')
```

Let's update the locator for the warning message to the following, as suggested in the error log:
```typescript
const warningMessage = page.locator('.alert-danger');
```

Below is the final refactored test script with corrected locator strategies for the My account button and the alert message:
```typescript
import { test, expect } from '@playwright/test';

test.describe('LambdaTest E-Commerce Login - Invalid Credentials', () => {
  test('should display warning message for invalid email/password combination', async ({ page }) => {
    // Step 1: Navigate to LambdaTest E-Commerce Playground
    await page.goto('https://ecommerce-playground.lambdatest.io/');

    // Step 2: Navigate to the Login Page
    // Hover over "My Account" to reveal the dropdown, then click "Login"
    await page.getByRole('button', { name: 'My account' }).hover();
    await page.locator('a[href*="route=account/login"]').click();

    // Verify we're on the login page
    await expect(page).toHaveURL(/route=account\/login/);
    await expect(page.locator('h1, h2').filter({ hasText: /Returning Customer/i })).toBeVisible();

    // Step 3: Enter invalid email address
    await page.locator('#input-email').fill('[email protected]');

    // Step 4: Enter password
    await page.locator('#input-password').fill('Password@123');

    // Step 5: Click the Login button
    await page.locator('input[value="Login"], button:has-text("Login")').first().click();

    // Step 6: Verify the warning message is displayed
    const warningMessage = page.locator('.alert-danger');
    await expect(warningMessage).toBeVisible({ timeout: 10000 });
    await expect(warningMessage).toContainText(
      'Warning: No match for E-Mail Address and/or Password.'
    );
  });
});
```

On re-running the test, it passes.
Note: The initial Claude-generated code is intentionally kept as-is to illustrate what needs review. The refactoring walkthrough above is based on that original output.
What still needs attention in this script before production use:
Refactoring checklist:
If you are using Cursor AI as your primary IDE, our guide on how to use Playwright skills with Cursor AI covers how to set up Playwright skills and accelerate refactoring workflows directly inside the editor.
Playwright Codegen records real browser interactions and generates scripts automatically. AI generates tests from natural language prompts. Neither replaces the other; they serve different purposes.
Using AI does not mean completely replacing the traditional Playwright codegen command and handing over everything to your favorite LLM. Both approaches have their own strengths and limitations.
| Criteria | Playwright Codegen | Playwright with AI |
|---|---|---|
| How it works | Records user interactions and generates scripts based on browser actions. It can also automatically generate toBeVisible() assertions. | Generates and executes tests based on natural language prompts. |
| Input Method | Manual interaction with the browser (click, type, navigate). | Instructions are provided in plain English text describing the test scenario. |
| Understanding Context | Does not understand business logic, only records actions. | It can understand context, business rules, and expected outcomes from prompts. |
| Speed of Test Script Creation | Scripts are generated automatically after the user interacts with the website. | Script generation and execution begin as soon as the user submits the prompt. |
| Exploratory Testing | Not suitable for exploratory testing as it records actions performed by the user. Codegen does not automatically add validations and needs manual enhancements. | Supports vibe testing, exploratory scenarios, and UX-focused validation. |
| Dependency on Human Actions | Requires manual interaction during recording. | Fully prompt-driven. No manual browser interaction is required. |
| Flexibility | Limited to recorded steps. Requires manual refactoring for improvements. | Can generate optimized and reusable code directly. |
| Maintenance | Scripts often require cleanup and restructuring after recording. | Can regenerate or refactor scripts quickly based on updated prompts. |
| Use Case | Best for quick prototyping or basic UI automation. | Ideal for intelligent automation, test generation, and UX validation. |
| Learning Curve | Easy for beginners to start recording the tests. | Requires skill in writing clear and concise prompts. |
The choice between Codegen and AI ultimately depends on your workflow. If you're looking to go deeper on the AI side, this guide on Playwright with AI covers how teams are building smarter, more adaptive automation pipelines.
For fully autonomous test execution, Playwright Test Agents extend this approach by combining AI reasoning with browser automation to eliminate manual scripting entirely.
So far, you have used Claude Desktop with Playwright MCP to run vibe tests. In practice, this works well until the application starts changing frequently. Locators break, scripts need constant updates, and maintaining a stable local setup across the team becomes a time sink.
Auto-healing is often the first thing teams look for when test maintenance starts eating into sprint time.
You can use generative AI tools like KaneAI by TestMu AI (formerly LambdaTest), a full-stack Agentic AI Quality Engineering platform.
Kane AI addresses these challenges directly with a purpose-built testing agent that handles element detection, auto-healing, test evolution, and reporting without manual intervention.
Kane AI by TestMu AI is a GenAI-native testing agent. Run vibe tests from natural language prompts with auto-healing, live execution, and detailed reporting without scripting.
Kane AI by TestMu AI is a GenAI-native testing agent that allows teams to plan, author, and evolve tests using natural language. Built for high-speed quality engineering teams, it integrates seamlessly with test planning, execution, orchestration, and analysis workflows.
Unlike traditional functional testing, Kane AI evaluates the overall feel of the application, including flow, responsiveness, and visual feedback, making it particularly effective for exploratory testing and catching usability gaps that scripted tests often miss.
KaneAI powers Vibe Testing with intelligent, AI-driven capabilities that enhance both test creation and execution:
To get started, let's take the same test scenario we ran on Claude and execute it with KaneAI. For setup instructions and background on the platform, refer to the official KaneAI support documentation.
Vibe testing is inherently exploratory, so failures are expected while refining prompts and improving test coverage.
If you encounter failures during the refactoring cycle, Playwright's built-in Copy Prompt feature lets you copy the error details directly from the test report and paste them into AI assistants such as ChatGPT or Claude to quickly diagnose and resolve the issue, without manually writing a debugging prompt.
For scenarios where tests are authored and executed using Kane AI, you can refer to the official support documentation on Error Handling in Authoring with Kane AI. This explains how Kane AI identifies failures during prompt-based test creation and provides strategies for refining prompts, stabilizing tests, and resolving execution issues.
Beyond built-in debugging tools, TestMu AI also provides the TestMu AI MCP Server, which exposes a set of MCP tools that help AI agents analyze test results, debug failures, and interact with the testing infrastructure programmatically.
Vibe testing with Playwright MCP and Claude enables testers to validate not just functionality, but the overall user experience of an application. By providing clear and concise prompts, Claude can intelligently execute end-to-end flows in the browser, analyze outcomes, and summarize results. This approach makes exploratory and experience-focused testing faster, more accessible, and less script-dependent.
At the same time, KaneAI can be leveraged as a plug-and-play solution for vibe testing within modern QA workflows. It allows teams to generate, execute, and analyze tests using natural language, reducing the complexity of traditional automation setup. With built-in reporting and integration capabilities, it fits seamlessly into existing pipelines while enhancing experience-driven validation.