When you make a VoIP call and it sounds like you’re talking through a tin can, it’s not just bad luck-it’s a measurable problem. Network engineers don’t guess why calls sound bad. They use MOS and PESQ to find out exactly how bad, and why. These aren’t buzzwords. They’re the standard tools used by telecom providers, enterprise IT teams, and hardware manufacturers to ensure your voice calls actually sound clear.
What MOS Really Means for Your Calls
MOS stands for Mean Opinion Score. It’s a number between 1 and 5 that tells you how good a voice call sounds to a human listener. A score of 1 means the call is barely understandable-think robotic, choppy, and full of dropouts. A 5 is perfect, crystal-clear, like you’re in the same room. But here’s the catch: in the real world, you’ll rarely see a 5. The highest practical MOS you’ll get from even the best VoIP codec is 4.5. Why? Because no network is perfect. Even if you’re on a fiber line with zero packet loss, the phone hardware, background noise, and how the codec processes your voice all chip away at quality. A MOS of 4.0 or higher is considered “toll quality”-the same standard as old-school landline phones. Anything below 3.5? Users will complain. They’ll say the call is “unusable,” even if the network metrics look fine. The most common codec in use today, G.711, hits a MOS of 4.4. That’s why it’s still the default in so many business systems. It doesn’t save bandwidth-it uses 64 kbps per call-but it sounds great. If you’re running a call center or a high-end video conferencing setup, you’re probably using G.711 because you can’t afford to sacrifice clarity.How PESQ Measures What Humans Hear
MOS sounds simple, but how do you get that number without gathering 100 people to listen to every call? That’s where PESQ comes in. PESQ, or Perceptual Evaluation of Speech Quality, is an algorithm that simulates human hearing. It doesn’t guess. It compares two audio files: the original clean voice recording and the degraded version that came out of your VoIP system. Think of it like a spellchecker for voice. You feed it the original sentence: “Can you hear me clearly?” Then you feed it the version that came through a bad network: “Ca-can you he-hear me-clear-ly?” PESQ analyzes the timing, pitch, volume, and distortion-and gives you a score that matches what real people would say. It works in two modes: narrowband (8 kHz, like traditional phones) and wideband (16 kHz, for HD voice). The standard, ITU-T P.862, was released in 2001 and still holds up today. But it has limits. It needs a clean reference file. That means you can’t use PESQ on live traffic-you have to record the call first. It’s a lab tool, not a real-time monitor.Codec Quality Compared: MOS Scores You Can Trust
Not all VoIP codecs are created equal. Here’s what actual MOS scores look like based on real PESQ testing:- G.711: 4.4 MOS - Best sound, highest bandwidth (64 kbps)
- G.722: 4.5 MOS - HD voice, wideband, 48-64 kbps
- G.729: 3.9 MOS - Popular for bandwidth savings, 8 kbps
- G.729A: 3.75-3.8 MOS - Lighter on CPU, slightly worse sound
- G.728: 3.8 MOS - Older, 16 kbps, used in legacy systems
- G.723.1: 3.5 MOS - Very low bandwidth (6.3 kbps), noticeable artifacts
- Opus: 3.5-4.5 MOS - Adaptive, adjusts based on network, best for mixed conditions
Why PESQ Isn’t Perfect-And When It Lies
PESQ is the gold standard, but it’s not magic. It has blind spots. For example:- It ignores echo. If your headset feedbacks, PESQ won’t care. It only compares the audio waveform. Human listeners hate echo-and will rate a call poorly even if PESQ says it’s 4.2.
- It struggles with burst packet loss. If 10 packets drop all at once, PESQ might still give you a 3.9. Real users hear a loud “pop” and hang up. Studies show PESQ can overrate quality by up to 0.7 MOS points in these cases.
- It’s biased toward male voices. Testing by GL Communications found female voices scored 0.26 MOS points lower than male voices on the same G.722 codec. That’s because the algorithm was trained mostly on male speech samples.
- It doesn’t work well with Opus. The original PESQ standard was designed for fixed-bitrate codecs. Opus changes bitrate constantly. Newer versions of PESQ (like in Audio Precision’s APx500 v8.0) now handle it better, but older tools still give inaccurate scores.
What You Need to Test PESQ in Practice
You can’t just download PESQ for free. It’s built into expensive test gear. Tools like Audio Precision’s APx500, Viavi’s OneExpert, or GL Communications’ AutoVQT are the only ones that do it right. These machines cost $25,000 or more. Most small businesses don’t have them. But here’s what you can do:- Record a clean call using G.711 as your reference.
- Send the same call through your VoIP system under test.
- Use a tool like Wireshark to capture the packets, then reconstruct the audio.
- Run both files through a licensed PESQ analyzer (many vendors offer cloud-based testing now).
The Future: POLQA and Beyond
PESQ is aging. For wideband and HD voice, the industry is moving to POLQA (ITU-T P.863). It supports up to 20 kHz audio, handles modern codecs better, and is more accurate with music and background noise. Cisco’s Webex Quality Engine already uses POLQA scores to auto-tune codecs in real time. But PESQ isn’t going away. Why? Because every telecom regulator, every service contract, every quality benchmark written in the last 20 years references PESQ. In 2025, 73% of global telecom regulators still require PESQ for certification. You can’t ignore it. The real trend? Blending both. Use PESQ for narrowband legacy systems. Use POLQA for HD voice and video calls. And always, always cross-check with real user feedback. Because at the end of the day, your customers don’t care about MOS scores. They care if they can hear their boss say, “Can you send me that report?” without repeating themselves.What to Do Next
If you manage VoIP systems:- Know your codecs. Don’t pick G.729 just because it’s “efficient.” Test it under real network stress.
- Use PESQ for pre-deployment testing. Even if you rent test equipment for a day.
- Monitor for echo and jitter separately. PESQ won’t catch them.
- Start collecting user feedback. A simple “How was your call?” survey after each meeting adds real-world context to your numbers.
- Plan for POLQA. If you’re upgrading to HD voice or cloud UC platforms, your next test tool should support it.
What is a good MOS score for VoIP calls?
A MOS score of 4.0 or higher is considered “toll quality” and is acceptable for business use. Scores between 3.5 and 3.9 are usable but may cause user complaints. Below 3.5, calls are generally considered poor or unusable. The highest achievable score is 4.5, typically seen with G.722 or Opus under ideal conditions.
Is PESQ the same as MOS?
No. MOS is the subjective score humans give to call quality (1-5). PESQ is an automated algorithm that predicts MOS by comparing a clean audio file to a degraded one. PESQ outputs a number that’s mapped to the MOS scale. So PESQ gives you an estimate of MOS without needing people to listen.
Can I use PESQ to test live VoIP traffic?
No. PESQ requires a clean reference signal and a degraded version of the same signal. This means you must record both before and after transmission. It’s designed for lab testing, not live network monitoring. For live networks, use tools like RTCP XR or crowd-sourced feedback instead.
Why does G.711 sound better than G.729 even though it uses more bandwidth?
G.711 uses uncompressed PCM audio-it sends every sample exactly as recorded. G.729 is a compressed codec that removes data it thinks humans won’t notice. That saves bandwidth but also removes subtle vocal tones and timing details. The result? G.711 scores 4.4 MOS, G.729 only 3.9. The trade-off is clear: bandwidth for clarity.
Should I switch from PESQ to POLQA?
If you’re using narrowband codecs (like G.711 or G.729) and your network is stable, stick with PESQ-it’s well-established and widely accepted. If you’re deploying wideband HD voice (like G.722, Opus, or EVS), switch to POLQA. It’s more accurate for higher frequencies and modern codecs. Many new test tools support both, so you can run them side by side.