Speech Synthesis API: Browser Support, Voices, Limitations

The Speech Synthesis API works in Chrome 33+, Edge 14+, Firefox 49+, Safari 7+, Opera 27+, and Samsung Internet 5+. Learn about browser support, voices, and quirks.

Author

Prince Dewani

May 6, 2026

The Speech Synthesis API is the Web Speech API's JavaScript interface for converting text into spoken audio through the device's text-to-speech engine. It is supported in Chrome 33+, Edge 14+, Firefox 49+, Safari 7+, Opera 27+, and Samsung Internet 5+, while the legacy Android Browser, Opera Mobile, and Internet Explorer never added support.

This guide covers what the Speech Synthesis API is, the browsers that support it, its key features, how to use it, and the known issues to plan around.

What is the Speech Synthesis API?

The Speech Synthesis API is the controller half of the Web Speech API, exposed on every page as window.speechSynthesis. It uses SpeechSynthesisUtterance objects to queue text for playback, then reads the text aloud through the operating system's installed voices. The Web Platform Incubator Community Group (WICG) maintains the specification.
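
In code, the two halves fit together like this. A minimal sketch: sayHello is a hypothetical helper, not part of the API, and it degrades to a no-op (returning false) when run outside a supporting browser:

```javascript
// Hypothetical helper: queue one utterance on the default voice.
// Returns false when the Speech Synthesis API is unavailable
// (e.g. when run outside a browser), so it degrades to a no-op.
function sayHello(text) {
  if (typeof window === "undefined" || !("speechSynthesis" in window)) {
    return false;
  }
  const utterance = new SpeechSynthesisUtterance(text); // one queued block of text
  window.speechSynthesis.speak(utterance);              // controller queues and plays it
  return true;
}

sayHello("Hello from the Web Speech API.");
```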

Which browsers does the Speech Synthesis API support?

The Speech Synthesis API works in every major modern browser, with one gap: Firefox for Android ships without it. Chrome, Edge, desktop Firefox, Safari, Opera, and Samsung Internet all expose window.speechSynthesis, while Internet Explorer, Opera Mobile, and the legacy stock Android Browser never added support.


Speech Synthesis API compatibility in Chrome

Chrome supports the Speech Synthesis API from Chrome 33 on Windows, macOS, Linux, ChromeOS, and Android. Chrome 1 to 32 did not support it. Chrome on iOS routes through WebKit, so the voice list there matches Safari rather than the desktop Chrome list.

Speech Synthesis API compatibility in Edge

Microsoft Edge supports the Speech Synthesis API from Edge 14 on Windows 10. Edge 12 and 13 did not support it. Chromium-based Edge from Edge 79 inherits Chrome's full implementation across Windows, macOS, Linux, and Android, with access to the Microsoft Natural neural voices on Windows 11.

Speech Synthesis API compatibility in Firefox

Firefox supports the Speech Synthesis API by default from Firefox 49 on Windows, macOS, and Linux. Firefox 31 to 48 had it disabled by default behind the media.webspeech.synth.enabled preference in about:config. Firefox for Android still does not implement the synthesis half of the Web Speech API, so any Android site that needs TTS in Firefox needs a fallback.

Speech Synthesis API compatibility in Safari

Safari supports the Speech Synthesis API from Safari 7 on macOS and from Safari on iOS 7. The mobile build adds an extra rule: speak() only fires inside a user gesture handler such as a tap or click, otherwise WebKit drops the utterance silently. The desktop voice list comes from System Settings under Accessibility, Spoken Content.
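
The iOS gesture rule means speak() has to run synchronously inside the click or tap handler itself. A sketch of that wiring, where wireReadAloud is a hypothetical helper and "read-aloud-button" is an assumed element id:

```javascript
// Hypothetical helper: wire speak() to a click so iOS Safari accepts it.
// "read-aloud-button" is an assumed element id, not part of the API.
// Returns false when there is no DOM or no synthesis support.
function wireReadAloud(buttonId, text) {
  if (typeof document === "undefined"
      || typeof window === "undefined"
      || !("speechSynthesis" in window)) {
    return false;
  }
  const button = document.getElementById(buttonId);
  if (!button) return false;
  button.addEventListener("click", () => {
    // speak() runs synchronously inside the gesture handler, which is
    // what iOS Safari requires; deferring it via setTimeout gets the
    // utterance dropped silently.
    window.speechSynthesis.speak(new SpeechSynthesisUtterance(text));
  });
  return true;
}

wireReadAloud("read-aloud-button", "This plays after a real tap.");
```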

Speech Synthesis API compatibility in Opera

Opera supports the Speech Synthesis API from Opera 27 on every desktop OS. Opera 1 to 26 did not support it. Modern Opera is built on Chromium, so its speech engine and voice list track Chrome on each release. Opera Mobile on Android still does not expose window.speechSynthesis.

Speech Synthesis API compatibility in Samsung Internet

Samsung Internet supports the Speech Synthesis API from Samsung Internet 5.0 on Galaxy phones and tablets. The browser is built on Chromium and uses the Android speech service for voice playback, so the available voices come from the user's installed Google text-to-speech engine or Samsung TTS engine.

Speech Synthesis API compatibility in Android Browser

The legacy stock Android Browser does not support the Speech Synthesis API in any version. Modern Android phones get speech synthesis through Chrome for Android, Samsung Internet, or another Chromium-based browser. Firefox for Android is also missing the synthesis half of the Web Speech API today.

Speech Synthesis API compatibility in Internet Explorer

Internet Explorer does not support the Speech Synthesis API in any version. IE 5.5 through 11 never shipped a Web Speech API engine, and Microsoft has retired Internet Explorer. Sites that still serve IE 11 users need a server-rendered audio fallback or a third-party TTS service.

Note

The Speech Synthesis API behaves differently across Chrome, Safari, and Firefox, with voice lists that change per OS. Test it on real browsers and operating systems with TestMu AI. Try TestMu AI free!

What are the key features of the Speech Synthesis API?

The Speech Synthesis API offers a small but practical surface: a controller for playback, an utterance object for the text and its settings, and a list of installed voices to pick from. The most useful pieces are queue control, voice selection, and the lifecycle events.

  • SpeechSynthesisUtterance: The utterance object holds the text to read and the per-call settings for voice, language, pitch, rate, and volume. Each call to speak() takes one utterance and adds it to the queue.
  • Voice queue and playback control: The speechSynthesis controller exposes speak(), pause(), resume(), and cancel(), plus the speaking, pending, and paused boolean properties. A page can pause a long passage and resume it without losing position.
  • Voice selection through getVoices(): getVoices() returns the SpeechSynthesisVoice objects the operating system has installed, with name, lang, voiceURI, and localService fields. Pick a voice by language code or by name and assign it to the utterance.
  • Lifecycle events: Each utterance fires start, end, pause, resume, error, boundary, and mark events. The boundary event fires on each word or sentence break, which is what karaoke-style highlighters and read-along readers use.
  • voiceschanged event: Chrome, Edge, and Firefox load voices asynchronously from the OS. The voiceschanged event on speechSynthesis fires when the list is ready, so code that needs a specific voice should listen for it before calling getVoices().
  • Pitch, rate, and volume controls: Pitch ranges from 0 to 2 with a default of 1, rate ranges from 0.1 to 10 with a default of 1, and volume ranges from 0 to 1. These map to whatever the OS speech engine accepts.
  • Per-utterance language tag: The utterance lang property accepts a BCP 47 tag like en-US, fr-FR, or hi-IN. The browser picks a voice that matches the tag if the page does not set utterance.voice directly.
  • Mixed local and network voices: Some browsers expose cloud voices alongside the local OS voices. The localService boolean on each SpeechSynthesisVoice lets the page filter for offline-only voices when latency or privacy matters.
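
Several of these pieces combine naturally in a read-along reader. A sketch, where readWithProgress is a hypothetical helper that reports each word boundary to a callback and returns null outside a supporting browser:

```javascript
// Sketch: read a passage and report each word boundary, which is what
// karaoke-style highlighters key off. readWithProgress is a hypothetical
// helper; it returns null when the Speech Synthesis API is unavailable.
function readWithProgress(text, onWord) {
  if (typeof window === "undefined" || !("speechSynthesis" in window)) {
    return null;
  }
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = 1;    // 0.1 to 10, default 1
  utterance.pitch = 1;   // 0 to 2, default 1
  utterance.volume = 1;  // 0 to 1, default 1
  utterance.addEventListener("boundary", (event) => {
    if (event.name === "word") onWord(event.charIndex); // index into text
  });
  utterance.addEventListener("end", () => console.log("playback finished"));
  window.speechSynthesis.speak(utterance);
  return utterance;
}
```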

How do you use the Speech Synthesis API?

A working speech synthesis call is six short steps: confirm support, wait for the voice list, build an utterance, set the voice and rate, queue it with speak(), and listen for the end event so the next utterance can start.

  • Check support: Test typeof window.speechSynthesis === "object" before calling any API on it. If the check fails, fall back to a server-rendered audio file or a third-party text-to-speech service.
  • Wait for voices to load: Call window.speechSynthesis.getVoices() and, if the result is empty, attach a voiceschanged listener and read the list inside the handler. Chrome, Edge, and Firefox all populate the list asynchronously.
  • Pick a voice: Filter the voice list by lang or by name and store the chosen SpeechSynthesisVoice. For accessibility, prefer voices where localService is true so the audio works offline.
  • Build the utterance: Create a new SpeechSynthesisUtterance with the text, then assign voice, lang, rate, pitch, and volume. One utterance plays one continuous block of text.
  • Trigger speak() inside a user gesture: Call window.speechSynthesis.speak(utterance) from a click or tap handler. Safari on iOS rejects the call if it is not part of a user gesture, so a setTimeout or page-load call goes silent there.
  • Listen for the end event: Attach an end listener to the utterance to know when it finishes. Use the boundary event to highlight the current word in the UI as it plays back.

The snippet below confirms support and lists the first five installed voices, including whether each one runs locally or hits a network service:

// Run in the DevTools console of any browser to test Speech Synthesis support.
const supportsSynthesis = typeof window.speechSynthesis === "object";
console.log("Speech Synthesis:", supportsSynthesis ? "supported" : "not supported");

if (supportsSynthesis) {
  const loadVoices = () => {
    const voices = window.speechSynthesis.getVoices();
    console.log("Installed voices:", voices.length);
    voices.slice(0, 5).forEach(v => console.log(v.name, v.lang, v.localService ? "local" : "network"));
  };

  // Chrome, Edge, and Firefox load voices asynchronously.
  if (window.speechSynthesis.getVoices().length === 0) {
    window.speechSynthesis.addEventListener("voiceschanged", loadVoices, { once: true });
  } else {
    loadVoices();
  }
}
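
Continuing from that voice check, the remaining steps (pick a voice, build the utterance, speak, listen for end) can be sketched as follows. speakEnglish is a hypothetical helper; the en-US/local filter is illustrative and falls back to the browser default voice when no match is installed:

```javascript
// Sketch of steps 3 to 6: pick a voice, build the utterance, queue it.
// Returns null when the Speech Synthesis API is unavailable.
function speakEnglish(text) {
  if (typeof window === "undefined" || !("speechSynthesis" in window)) {
    return null;
  }
  const voices = window.speechSynthesis.getVoices();
  // Step 3: prefer an offline en-US voice when one is installed.
  const voice = voices.find(v => v.lang === "en-US" && v.localService) || null;
  // Step 4: build the utterance and assign the per-call settings.
  const utterance = new SpeechSynthesisUtterance(text);
  if (voice) utterance.voice = voice;
  utterance.rate = 1;
  // Step 6: the end event signals when the next utterance can start.
  utterance.addEventListener("end", () => console.log("finished:", text));
  // Step 5: on iOS, call speakEnglish() from a click or tap handler
  // so this speak() call counts as part of the user gesture.
  window.speechSynthesis.speak(utterance);
  return utterance;
}
```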

What are the known issues with the Speech Synthesis API?

The Speech Synthesis API has wide browser coverage on paper, but the cross-browser reality is messy. Voice lists vary per OS, autoplay rules block playback on iOS, and Chrome quietly cancels long utterances after about 15 seconds.

  • Voice list varies per OS and browser: The same site sees a US English Samantha voice on macOS, a Microsoft David voice on Windows, and a Google US English voice on Android. Code that hardcodes a voice name breaks on the next operating system.
  • Voices load asynchronously in Chrome, Edge, and Firefox: getVoices() returns an empty array on the first call. Always wire up the voiceschanged event before reading the list, otherwise the picker shows nothing on first load.
  • iOS Safari requires a user gesture: WebKit only lets speak() run inside a click, tap, or keypress handler. A page-load auto-greeting goes silent on every iPhone and iPad.
  • Chrome cancels long utterances after roughly 15 seconds: Single SpeechSynthesisUtterance objects longer than about 200 to 250 characters get cut off mid-sentence on desktop Chrome. Split text into shorter utterances and queue them in order.
  • Background tab playback is unreliable: Chrome and Safari throttle or stop synthesis when the tab loses focus. A long-form reader needs a Wake Lock or has to detect visibilitychange and re-queue.
  • Firefox for Android has no synthesis support: Firefox on Android exposes SpeechRecognition but not SpeechSynthesis. Detect missing support and route the user to Chrome or Samsung Internet for that page.
  • Network voices leak the spoken text: Cloud voices send the utterance text to the vendor's servers. Filter on localService when the content is sensitive, such as banking dictation or medical notes.
  • SSML support is partial: Most browsers ignore SpeechSynthesisUtterance SSML markup like prosody or break tags. Plan around plain text and use rate or pitch on the utterance for emphasis.
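
The long-utterance cutoff is usually worked around by chunking. A sketch, assuming sentence-boundary splitting is acceptable for the content (splitIntoChunks and speakLongText are hypothetical helpers; a single sentence longer than the budget stays as one chunk):

```javascript
// Work around Chrome's long-utterance cutoff: split text at sentence
// boundaries into chunks under a character budget.
function splitIntoChunks(text, maxLen = 200) {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) || [text];
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    if (current && (current + sentence).length > maxLen) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

// Browser-only: queue one utterance per chunk, in order, since speak()
// appends to the queue. Returns the number of chunks queued (0 when the
// Speech Synthesis API is unavailable).
function speakLongText(text) {
  if (typeof window === "undefined" || !("speechSynthesis" in window)) {
    return 0;
  }
  const chunks = splitIntoChunks(text);
  for (const chunk of chunks) {
    window.speechSynthesis.speak(new SpeechSynthesisUtterance(chunk));
  }
  return chunks.length;
}
```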

In my experience, the most surprising failure is the silent iOS drop. A speak() call that works on every desktop browser does nothing on iPhone unless it is wired straight to a click handler, and the bug rarely shows up in console logs. Always test the playback path on a real iPhone with a real tap, not just an emulator click.



Author

Prince Dewani is a Community Contributor at TestMu AI, where he manages content strategies around software testing, QA, and test automation. He is certified in Selenium, Cypress, Playwright, Appium, Automation Testing, and KaneAI. Prince has also presented academic research at the international conference PBCON-01. He further specializes in on-page SEO, bridging marketing with core testing technologies. On LinkedIn, he is followed by 4,300+ QA engineers, developers, DevOps experts, tech leaders, and AI-focused practitioners in the global testing community.
