Next-Gen App & Browser Testing Cloud
Trusted by 2 Mn+ QAs & Devs to accelerate their release cycles

Want to find broken images on your website? Here's how you can do broken image testing on your website using Selenium in Java, Python, C#, and PHP.

Himanshu Sheth
December 24, 2025
A web product’s user experience is one of the key elements that help in user acquisition and user retention. Though immense focus should be given to the design & development of new product features, a continuous watch should be kept on the overall user experience. Like 404 pages (or dead links), broken images on a website (or web app) could also irk the end-users. Manual inspection and removal of broken images is not a feasible and scalable approach. Instead of using third-party tools to inspect broken images, you should leverage Selenium automation testing and see how to find broken images using Selenium WebDriver on your website.
In this part of the Selenium Tutorial, we look at how to find broken images on websites using Selenium WebDriver. From an end-user’s perspective, even a single broken image on a page could be an experience dampener – a prime reason to find broken images on websites. You can check out what WebDriver is, its features, how it works, best practices, and more in this WebDriver tutorial. If you are preparing for an interview you can learn more through Selenium interview questions.
By the end of this blog, you would be able to find broken images using Selenium WebDriver with Python, Java, C#, and PHP.
Starting your journey with Selenium WebDriver? Check out this step-by-step guide to perform Automation testing using Selenium WebDriver.
Broken image is a link/image that does not show up as a picture, clicking upon which takes the end-user to a defunct picture. The user encounters a 404 Error when clicked on the broken image. This error means that there is an issue with the image URL, and the image is not loaded properly (due to various reasons).
Shown below is an example of broken images on a website:

From an end-user experience and retention point of view, fixing broken images should be considered equally important as fixing broken links on websites. Selenium WebDriver can be used to find broken images on websites. The internal logic for locating broken images might vary based on how the images are fetched from the server.
Here are two ways in which images are read from the server:
Shown below is an example of usage of an absolute path in the ‘src’ attribute of the <img> tag:

The image shown above is fetched from an absolute location (i.e., the HostName is used in the <src> attribute):

For example, in <img src=”assets/img/image.jpg” alt=”some text”>; the path of image.jpg is relative to the root. If the website URL is https://www.someexample.com, the relative path of the image (image.jpg) will equate to https://www.someexample.com/assets/img/image.jpg
Here is a sample usage of relative path in the <src> attribute of the <img> tag:

You would be curious to know what leads to broken images on a website. Let’s look at the ‘why part’ of broken images?
Here are some of the prominent reasons that lead to broken images (i.e., file not found or 404 error for images) on a website (or web apps):
Like broken links, attention should be given to ensure that your web product is free from broken images.
Here are the two major reasons for checking for broken images on websites:
When a user visits a website, the user request is sent to the website’s server, which processes the request. In response to the browser’s request, the server sends a three-digit code referred to as the HTTP Status Code to the browser.
Some of the commonly used classes of HTTP Status Codes are 1xx, 2xx, 3xx, 4xx, and 5xx. To find broken images using the Selenium WebDriver, we would be using the 4xx class of status code, indicating that the particular page or the complete website is not reachable. The status code of class 2xx (particularly 200) suggests that the request sent by the web browser was successful, and the appropriate response was sent to the browser.
When an image is not available on the server, a response code 404 (Page Not Found) is sent to the web browser. You can refer to our earlier blog for detailed information on HTTP Status Codes and Status Codes presented on the detection of broken links/images.
Irrespective of the programming language being used to detect broken images, the basic principles remain the same. Here are some of the steps that can be followed to find broken images on websites:
The naturalWidth attribute returns the original width of the image, and it is zero for a broken image. For Selenium with Java, you could also check if the naturalWidth attribute of the image is zero or not.
In this Selenium Tutorial, we demonstrate how to find broken images using Selenium WebDriver in Java, Python, C#, and PHP. The tests are run on the latest version of the Chrome Browser on the Windows 10 platform. The execution is carried out on the cloud-based Selenium Grid provided by TestMu AI.
To get started with TestMu AI, you should create an account on the website and note the user-name & access-key from the profile section on TestMu AI. The browser capabilities are generated using TestMu AI Capabilities Generator.
Here is the test scenario to find broken images on the website:
Test Scenario
The URL under test https://the-internet.herokuapp.com/broken_images has two broken images and two proper images.
Shown below are the two broken images on the website:

Here are the two proper (or not broken) images on the website:

FileName – pom.xml
Implementation
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.brokenimages</groupId>
<artifactId>BrokenImages</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.testng</groupId>
<artifactId>testng</artifactId>
<version>6.9.10</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-nop</artifactId>
<version>1.7.28</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
<version>4.0.0-alpha-7</version>
</dependency>
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-remote-driver</artifactId>
<version>4.0.0-alpha-7</version>
</dependency>
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-chrome-driver</artifactId>
<version>4.0.0-alpha-7</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.13</version>
</dependency>
</dependencies>
<build>
<defaultGoal>install</defaultGoal>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.0</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>11</source>
<target>11</target>
</configuration>
</plugin>
</plugins>
</build>
</project>
FileName – testng.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd">
<suite name="Broken Images Detection Demo">
<test verbose="2" preserve-order="true" name="Detect broken images in Selenium WebDriver">
<classes>
<class name="brokenimages.test_brokenimages">
</class>
</classes>
</test>
</suite>
FileName – test_brokenimages.java
package brokenimages;
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClientBuilder;
import org.openqa.selenium.By;
import java.net.URL;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.remote.RemoteWebDriver;
import org.testng.annotations.AfterClass;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;
public class test_brokenimages
{
/* protected static ChromeDriver driver; */
WebDriver driver = null;
String URL = "https://the-internet.herokuapp.com/broken_images";
public static String status = "passed";
String username = "user-name";
String access_key = "access-key";
@BeforeClass
public void testSetUp() throws MalformedURLException {
DesiredCapabilities capabilities = new DesiredCapabilities();
capabilities.setCapability("build", "[Java] Finding broken images on a webpage using Selenium");
capabilities.setCapability("name", "[Java] Finding broken images on a webpage using Selenium");
capabilities.setCapability("platform", "Windows 10");
capabilities.setCapability("browserName", "Chrome");
capabilities.setCapability("version","latest");
capabilities.setCapability("tunnel",false);
capabilities.setCapability("network",true);
capabilities.setCapability("console",true);
capabilities.setCapability("visual",true);
driver = new RemoteWebDriver(new URL("http://" + username + ":" + access_key + "@hub.lambdatest.com/wd/hub"),
capabilities);
System.out.println("Started session");
}
@Test(description="Approach 1: Find broken images on a web page using Selenium WebDriver", enabled=true)
protected void test_selenium_broken_images_appr_1()
{
Integer iBrokenImageCount = 0;
driver.navigate().to(URL);
driver.manage().window().maximize();
try
{
iBrokenImageCount = 0;
List<WebElement> image_list = driver.findElements(By.tagName("img"));
/* Print the total number of images on the page */
System.out.println("The page under test has " + image_list.size() + " images");
for (WebElement img : image_list)
{
if (img != null)
{
HttpClient client = HttpClientBuilder.create().build();
HttpGet request = new HttpGet(img.getAttribute("src"));
HttpResponse response = client.execute(request);
/* For valid images, the HttpStatus will be 200 */
if (response.getStatusLine().getStatusCode() != 200)
{
System.out.println(img.getAttribute("outerHTML") + " is broken.");
iBrokenImageCount++;
}
}
}
}
catch (Exception e)
{
e.printStackTrace();
status = "failed";
System.out.println(e.getMessage());
}
status = "passed";
System.out.println("The page " + URL + " has " + iBrokenImageCount + " broken images");
}
@Test(description="Approach 2: Find broken images on a web page using Selenium WebDriver", enabled = true)
protected void test_selenium_broken_images_appr_2()
{
Integer iBrokenImageCount = 0;
driver.navigate().to(URL);
driver.manage().window().maximize();
try
{
iBrokenImageCount = 0;
List<WebElement> image_list = driver.findElements(By.tagName("img"));
/* Print the total number of images on the page */
System.out.println("The page under test has " + image_list.size() + " images");
for (WebElement img : image_list)
{
if (img != null)
{
if (img.getAttribute("naturalWidth").equals("0"))
{
System.out.println(img.getAttribute("outerHTML") + " is broken.");
iBrokenImageCount++;
}
}
}
}
catch (Exception e)
{
e.printStackTrace();
status = "failed";
System.out.println(e.getMessage());
}
status = "passed";
System.out.println("The page " + URL + " has " + iBrokenImageCount + " broken images");
}
@AfterClass
public void tearDown()
{
if (driver != null) {
((JavascriptExecutor) driver).executeScript("lambda-status=" + status);
driver.quit();
}
}
}
Code Walkthrough [Approach – 1]
1. Import the required packages
The Apache HttpClient library is used for handling the HTTP requests. To use the latest version of HttpClient library, the dependency for the library is added to the Maven Build file (pom.xml).
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.13</version>
</dependency>
To find the broken images on the page under test, the HttpClient library is used for checking the status codes of the images present on the page. The necessary packages are imported so that its methods can be used in the implementation.
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.HttpClientBuilder;
2. Find all the images on the page
The findElements method in Selenium is used for fetching the details of all the images present on the page. The images are located using the tagName ‘ img.’
The images are placed in a list, which will be further iterated to find broken images on the page.
List<WebElement> image_list = driver.findElements(By.tagName("img"));
System.out.println("The page under test has " + image_list.size() + " images");
3. Create a new instance of HttpClient
The HttpClient class offers an API that primarily consists of three core classes – HttpClient, HttpRequest, and HttpResponse. HttpResponse describes the result of an HttpRequest call. For reading the response body, we create a new instance of HttpClient and request the objects. The new instance of the class is created with the build() method of HttpClientBuilder class.
HttpClient client = HttpClientBuilder.create().build();
4. Create a new instance of HttpGet
CloseableHttpClient provides the execute method for sending and receiving the data. The execute method uses the parameter of type HttpUriRequest, which has many sub-classes, including HttpGet and HttpPost.
We first create a new HttpGet object with the HttpUriRequest set to path retrieved by reading the src attribute in the WebElement img.
HttpGet request = new HttpGet(img.getAttribute("src"));
For example – getAttribute(“src”) for the image “Fork me on GitHub” will return /img/forkme_right_green_007200.png.

5. Retrieve the response object
The execute method executes the HTTP request using the default context. It returns the response body (i.e. HttpResponse).
HttpResponse response = client.execute(request);
6. Read the Status Code
The getStatusLine method of the HttpResponse class obtains the status line of the response [obtained from step(5)]. The getStatusCode method returns the HttpStatus in an integer format. Response Code 200 (SC_OK) means that the HTTP request was executed successfully.
if (response.getStatusLine().getStatusCode() != 200)
{
System.out.println(img.getAttribute("outerHTML") + " is broken.");
iBrokenImageCount++;
}
If HttpStatus is 200, the concerned image is not broken, whereas HttpStatus for a broken image is 404. Steps (3) thru’ (6) are repeated for all the WebElement entries in the image list. The outerHTML attribute for the broken images is printed for reference on the terminal.
Code Walkthrough [Approach – 2]
1. Find all the images on the page
Similar to Step(2) in Approach – 1, the findElements method in Selenium is used to fetch the details of images present on the image. The tagName img is used with the findElements method to achieve the same.
List< WebElement > image_list = driver.findElements(By.tagName("img"));
2. Read the naturalWidth attribute
The naturalWidth attribute of the WebElements identified in Step(1) is read. For broken images, naturalWidth will be zero whereas it is non-zero for normal images.
if (img.getAttribute("naturalWidth").equals("0"))
{
System.out.println(img.getAttribute("outerHTML") + " is broken.");
iBrokenImageCount++;
}
Step (2) is repeated for all the WebElements in the list image_list, which was obtained in Step (1). The variable iBrokenImageCount indicates the number of broken images on the page.
Execution
Shown below are the execution snapshots of Approach – 1 and Approach – 2. As expected, we see that there are two broken images on the webpage under test.



Implementation
FileName – test_brokenimages.py
import requests
import urllib3
import pytest
from requests.exceptions import MissingSchema, InvalidSchema, InvalidURL
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
capabilities = {
"build": "[Python] Finding broken images on a webpage using Selenium",
"name": "[Python] Finding broken images on a webpage using Selenium",
"platform": "Windows 10",
"browserName": "Chrome",
"version": "latest"
}
user_name = "user-name"
app_key = "access-key"
URL = "https://the-internet.herokuapp.com/broken_images"
iBrokenImageCount = 0
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
remote_url = "http://" + user_name + ":" + app_key + "@hub.lambdatest.com/wd/hub"
driver = webdriver.Remote(command_executor=remote_url, desired_capabilities=capabilities)
driver.maximize_window()
driver.get(URL)
image_list = driver.find_elements(By.TAG_NAME, "img")
print('Total number of images on '+ URL + ' are ' + str(len(image_list)))
for img in image_list:
try:
response = requests.get(img.get_attribute('src'), stream=True)
if (response.status_code != 200):
print(img.get_attribute('outerHTML') + " is broken.")
iBrokenImageCount = (iBrokenImageCount + 1)
except requests.exceptions.MissingSchema:
print("Encountered MissingSchema Exception")
except requests.exceptions.InvalidSchema:
print("Encountered InvalidSchema Exception")
except:
print("Encountered Some other Exception")
driver.quit()
print('The page ' + URL + ' has ' + str(iBrokenImageCount) + ' broken images')
Code Walkthrough
1. Import Modules
The requests module is imported so that we can send HTTP requests to the target URL. In case the requests module is not installed on the dev machine, run the command pip install requests to install the same.

import requests
import urllib3
from requests.exceptions import MissingSchema, InvalidSchema, InvalidURL
2. Fetch details about the images present on the page
WebElements with the ‘img’ tag are read using the find_elements method in Selenium.
image_list = driver.find_elements(By.TAG_NAME, "img")
3. Send an HTTP request
The get() method in the requests module sends a GET request to the URL passed to it. The src attribute in the img tag contains the location of the image on the server. It is passed to the requests.get() method. Stream in the get() method is set to true, so the response to the HTTP request is immediately downloaded.
response = requests.get(img.get_attribute('src'), stream=True)
In return, we get requests.Response() object that contains the server’s response to the HTTP request.
4. Read the Status Code from the Response object
The status_code property in requests.Response() object indicates the status of the HTTP request. HTTP Status Code of 200 means that the image is not broken whereas the image is broken if the Status Code is 404.
if (response.status_code != 200):
print(img.get_attribute('outerHTML') + " is broken.")
iBrokenImageCount = (iBrokenImageCount + 1)
Repeat steps (3) through (4) for all the WebElement entries in the list (i.e., image_list).
Execution
We run the file by triggering the command python <file_name.py> on the terminal. As shown below, two broken images were found on the page under test.

Implementation
FileName – BrokenImageTest.cs
using OpenQA.Selenium;
using OpenQA.Selenium.Remote;
using OpenQA.Selenium.Chrome;
using NUnit.Framework;
using System.Threading;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using OpenQA.Selenium.Remote;
using System;
using System.Threading;
using System.Net.Http;
using System.Threading.Tasks;
namespace TestBrokenImages
{
[TestFixture("chrome", "latest", "Windows 10")]
public class TestBrokenImages
{
private String browser;
private String version;
private String os;
IWebDriver driver;
public TestBrokenImages(String browser, String version, String os)
{
this.browser = browser;
this.version = version;
this.os = os;
}
[SetUp]
public void Init()
{
String username = "user-name";
String accesskey = "access-key";
String gridURL = "@hub.lambdatest.com/wd/hub";
DesiredCapabilities capabilities = new DesiredCapabilities();
capabilities.SetCapability("user", username);
capabilities.SetCapability("accessKey", accesskey);
capabilities.SetCapability("browserName", browser);
capabilities.SetCapability("version", version);
capabilities.SetCapability("platform", os);
capabilities.SetCapability("build", "[C#] Finding broken images on a webpage using Selenium");
capabilities.SetCapability("name", "[C#] Finding broken images on a webpage using Selenium");
driver = new RemoteWebDriver(new Uri("https://" + username + ":" + accesskey + gridURL), capabilities, TimeSpan.FromSeconds(600));
System.Threading.Thread.Sleep(2000);
}
[Test]
public async Task LT_Broken_Images_Test()
{
int broken_images = 0;
String test_url = "https://the-internet.herokuapp.com/broken_images";
driver.Url = test_url;
using var client = new HttpClient();
var image_list = driver.FindElements(By.TagName("img"));
/* Loop through all the images */
foreach (var img in image_list)
{
try
{
/* Get the URI */
HttpResponseMessage response = await client.GetAsync(img.GetAttribute("src"));
/* Reference - https://docs.microsoft.com/en-us/dotnet/api/system.net.httpwebresponse.statuscode?view=netcore-3.1 */
if (response.StatusCode == HttpStatusCode.OK)
{
System.Console.WriteLine("Image at the link " + img.GetAttribute("src") + " is OK, status is "
+ response.StatusCode);
}
else
{
System.Console.WriteLine("Image at the link " + img.GetAttribute("src") + " is Broken, status is "
+ response.StatusCode);
broken_images++;
}
}
catch (Exception ex)
{
if ((ex is ArgumentNullException) ||
(ex is NotSupportedException))
{
System.Console.WriteLine("Exception occured
");
}
}
}
/* Perform wait to check the output */
System.Threading.Thread.Sleep(2000);
Console.WriteLine("
The page " + test_url + " has " + broken_images + " broken images");
}
[TearDown]
public void Cleanup()
{
if (driver != null)
driver.Quit();
}
}
}
Code Walkthrough
We have used the NUnit framework for demonstration. You can check out our earlier blog on NUnit Test automation with Selenium C# to get started with the NUnit framework.
1. Include HttpClient namespace
The HttpClient class provides the base class used for sending HTTP requests and receiving the corresponding response from the resource identified by URI.
It is recommended to use HttpClient instead of HttpWebRequest (of the System.Net.HttpWebRequest namespace) for detecting broken images using Selenium WebDriver.
using System.Net.Http;
using System.Threading.Tasks;
2. Create a method that returns an async task
The GetAsync method is used for sending a GET request to the specified URI as an asynchronous operation.
[Test]
public async Task LT_Broken_Images_Test()
{
3. Create an instance of HttpClient
An instance of the HttpClient is created. The methods offered by HttpClient class will be further used for fetching the details of images present on the page under test.
using var client = new HttpClient();
4. Read the images present on the page
The details of the images present on the page are fetched by locating the WebElements with TagName ‘img’ property.
var image_list = driver.FindElements(By.TagName("img"));
The findElements method returns a list that is iterated to check the broken images on the page.
5. Iterate through the image list to check for broken images
The GetAsync method in HttpClient class sends an async GETrequest to the corresponding URI. The value of the anchor’s ‘src’ attribute collected using the GetAttribute method is passed in the GetAsync method.
foreach (var img in image_list)
{
try
{
/* Get the URI */
HttpResponseMessage response=await client.GetAsync(img.GetAttribute("src"));
6. Read the HttpStatus Code
On completion of the Async operation in Step(5), HttpResponseMessage is returned. The response includes the data and status code. Response code HttpStatusCode.OK (i.e., 200) indicates that the image was located on the server, and it was read successfully. We keep a counter of the number of broken images on the page.
if (response.StatusCode == HttpStatusCode.OK)
{
System.Console.WriteLine("Image at the link " + img.GetAttribute("src") + " is OK, status is "+ response.StatusCode);
}
else
{
System.Console.WriteLine("Image at the link " + img.GetAttribute("src") + " is Broken, status is "+ response.StatusCode);
broken_images++;
}
The exceptions NotSupportedException and ArgumentNullException are handled as part of exception handling.
catch (Exception ex)
{
if ((ex is ArgumentNullException) || (ex is NotSupportedException))
{
System.Console.WriteLine("Exception occured
");
}
}
Execution
Here is the execution snapshot, which indicates that two broken images were present on the page under test.


Implementation
FileName – composer.json
{
"require":{
"php":">=7.1",
"phpunit/phpunit":"^9",
"phpunit/phpunit-selenium": "*",
"php-webdriver/webdriver":"*"
}
}
FileName – tests\BrokenImageTest.php
<?php
require 'vendor/autoload.php';
use PHPUnitFrameworkTestCase;
use FacebookWebDriverRemoteDesiredCapabilities;
use FacebookWebDriverRemoteRemoteWebDriver;
use FacebookWebDriverWebDriverBy;
$GLOBALS['LT_USERNAME'] = "user-name";
# accessKey: AccessKey can be generated from automation dashboard or profile section
$GLOBALS['LT_APPKEY'] = "access-key";
class BrokenImageTest extends TestCase
{
protected $webDriver;
public function build_browser_capabilities(){
/* $capabilities = DesiredCapabilities::chrome(); */
$capabilities = array(
"build" => "[PHP] Finding broken images on a webpage using Selenium",
"name" => "[PHP] Finding broken images on a webpage using Selenium",
"platform" => "macOS High Sierra",
"browserName" => "MicrosoftEdge",
"version" => "latest"
);
return $capabilities;
}
public function setUp(): void
{
$capabilities = $this->build_browser_capabilities();
/* Download the Selenium Server 3.141.59 from
https://selenium-release.storage.googleapis.com/3.141/selenium-server-standalone-3.141.59.jar
*/
$url = "https://". $GLOBALS['LT_USERNAME'] .":" . $GLOBALS['LT_APPKEY'] ."@hub.lambdatest.com/wd/hub";
$this->webDriver = RemoteWebDriver::create($url, $capabilities);
}
public function tearDown(): void
{
$this->webDriver->quit();
}
/*
* @test
*/
public function test_searchbrokenimages()
{
$test_url = "https://the-internet.herokuapp.com/broken_images";
# For site with absolute path
# $test_url = "https://www.lambdatest.com/blog";
# End - For site with absolute path
$base_url = "https://the-internet.herokuapp.com/";
$driver = $this->webDriver;
$driver->get($test_url);
$this->assertEquals("The Internet", $driver->getTitle());
# For site with absolute path
# $this->assertEquals("LambdaTest | A Cross Browser Testing Blog", $driver->getTitle());
# End - For site with absolute path
$driver->manage()->window()->maximize();
$iBrokenImageCount = 0;
/* file_get_contents is used to get the page's HTML source */
$html = file_get_contents($test_url);
/* Instantiate the DOMDocument class */
$htmlDom = new DOMDocument;
/* The HTML of the page is parsed using DOMDocument::loadHTML */
@$htmlDom->loadHTML($html);
/* Extract the links from the page */
$image_list = $htmlDom->getElementsByTagName('img');
/* The DOMNodeList object is traversed to check for its validity */
foreach($image_list as $img)
{
$img_path = $img->getAttribute('src');
# Convert relative path to absolute path
$search_path = "/" . $img_path;
$abs_path = relative2absolute($search_path, $base_url);
# When absolute path is used for fetching the images
# For site with absolute path
# $abs_path = $img_path;
# For site with absolute path
$response = @get_headers($abs_path);
if (preg_match("|200|", $response[0]))
{
print($abs_path . " is not broken
");
}
else
{
print($abs_path . " is broken
");
$iBrokenImageCount = $iBrokenImageCount + 1;
}
}
print("
The page " . $test_url . " has " . $iBrokenImageCount . " broken images");
}
}
?>
<?php
function relative2absolute($rel_path, $base_path)
{
/* return if already absolute URL */
if (parse_url($rel_path, PHP_URL_SCHEME) != '') return $rel_path;
/* queries and anchors */
if ($rel_path[0]=='#' || $rel_path[0]=='?') return $base_path.$rel_path;
/* parse base URL and convert to local variables:
$scheme, $host, $path */
extract(parse_url($base_path));
/* remove non-directory element from path */
$new_path = preg_replace('#/[^/]*$#', '', $base_path);
/* destroy path if relative url points to root */
if ($rel_path[0] == '/') $new_path = '';
/* dirty absolute URL */
$abs_path = "$host$new_path/$rel_path";
/* replace '//' or '/./' or '/foo/../' with '/' */
$repl = array('#(/.?/)#', '#/(?!..)[^/]+/../#');
for($counter=1; $counter>0; $abs_path=preg_replace($repl, '/', $abs_path, -1, $counter)) {}
/* absolute URL is ready! */
return $scheme.'://'.$abs_path;
}
?>
Code Walkthrough
To find broken images with Selenium PHP, we would be using the PHPUnit framework with Selenium. Refer to our detailed Selenium PHP tutorial for a quick recap on Selenium with PHPUnit.
Run the command composer require on the terminal for installing the packages mentioned in composer.json.

Here is the overall walkthrough of the source code:
1. Read the page source
HTML source of the page under test (i.e. https://the-internet.herokuapp.com/) is read using the file_get_contents function in PHP. HTML source is read in a local String variable $html.
$html = file_get_contents($test_url);
2. Instantiate the DOMDocument class
The entire HTML document is represented in the DOMDocument class. It also serves as the root of the source tree.
$htmlDom = new DOMDocument;
3. Parse HTML source of the page
DOMDocument::loadHTML() function in PHP parses the HTML source available in the String variable $html. The function returns a DOMDocument object when executed successfully.
@$htmlDom->loadHTML($html);
4. Extract the Images using ‘img’ tag
Entries in the <img> HTML tag are read using the getElementsByTagName method of the DOMDocument class. As we are looking for broken images, search is based on the <img> tag from the parsed HTML source.
$image_list = $htmlDom->getElementsByTagName('img');
5. Read the entries enclosed in ‘src’ attribute
The values of the ‘src’ attribute are read from the <img> entries extracted in Step(4).
$img_path = $img->getAttribute('src');
For example – The ‘src’ attribute in <img src=”img/avatar-blank.jpg”> is “img/avatar-blank.jpg”.
6. Convert the relative path to absolute path
This step is only applicable if the ‘src’ attribute in the <img> tag returns a relative path from the root of the document.
In the case of http://the-internet.herokuapp.com/broken_images, the images are read using the relative path.

Take the TestMu AI blog case, the images in the blogs are read using the absolute path of the images on the server. Shown below is an example of how the absolute path of the image is used in the ‘src’ attribute of the <img> tag:

We created a new function relative2absolute() that takes the following arguments – relative path obtained from the <src> attribute and root document of the URL under test.
<?php
function relative2absolute($rel_path, $base_path)
{
/* return if already absolute URL */
if (parse_url($rel_path, PHP_URL_SCHEME) != '') return $rel_path;
/* queries and anchors */
if ($rel_path[0]=='#' || $rel_path[0]=='?') return $base_path.$rel_path;
/* parse base URL and convert to local variables:
$scheme, $host, $path */
extract(parse_url($base_path));
.............................................
.............................................
.............................................
}
Relative Path (Sample)
For http://the-internet.herokuapp.com/broken_images, the relative path would equate to /$img_path. If the $img_path is img/avatar-blank.jpg; the final relative path used by the relative2absolute function would be /img/avatar-blank.jpg The base URL is set to https://the-internet.herokuapp.com/
$test_url = "https://the-internet.herokuapp.com/broken_images";
..........................................................
..........................................................
..........................................................
# Convert relative path to absolute path
$search_path = "/" . $img_path;
$abs_path = relative2absolute($search_path, $base_url);
Absolute path (Sample)
If an absolute path is used in the <src> attribute, the absolute path and the relative path would be the same. In such a scenario, Step (6) becomes optional.
$test_url = "https://www.lambdatest.com/blog";
..........................................................
..........................................................
..........................................................
$img_path = $img->getAttribute('src');
$abs_path = $img_path;
We came up with the relative2absolute function with support from the StackOverflow Community ☺.
7. Convert the relative path to absolute path
The get_headers() function is used to fetch all the headers sent by the server in response to the HTTP request. For a broken image, the HTTP status code is 404, whereas the status code is 200 if the image is present on the server.
The preg_match() function in PHP does a case-insensitive search for “200” (HTTP Status Code if the request is completed successfully) in the response code. The local variable iBrokenImageCount is incremented when a broken image is present on the page.
$response = @get_headers($abs_path);
if (preg_match("|200|", $response[0]))
{
print($abs_path . " is not broken
");
}
else
{
print($abs_path . " is broken
");
$iBrokenImageCount = $iBrokenImageCount + 1;
}
Execution
To run the test that is using the PHPUnit framework, run the following command from the root folder:
vendorinphpunit --debug test
When the test is run against https://the-internet.herokuapp.com/broken_images, it shows that the page has two broken images.

We executed the same test against the TestMu AI blog after doing the minimal changes in the code under the ‘For a site with absolute path’ comment.


The site uses the absolute path in <src> attribute of img tag. As seen below, there are zero broken images on the TestMu AI blog.

Like broken links on web pages, broken images could also hinder the overall user experience. It also creates a negative impact on the search rankings, thereby hampering your SEO efforts. Instead of relying on third-party tools where you are putting the privacy & data at stake, you should find broken images using Selenium WebDriver. In this Selenium tutorial, we had a look at how to find broken images using Selenium WebDriver with Java, Python, C#, and PHP languages.
What strategy do you follow for finding broken images on webpage(s)? Do leave your thoughts in the comments section…
Happy Testing ☺

Deliver immersive digital experiences with Next-Generation Mobile Apps and Cross Browser Testing Cloud
Did you find this page helpful?
More Related Hubs
TestMu AI forEnterprise
Get access to solutions built on Enterprise
grade security, privacy, & compliance