SDP Offer/Answer in VoIP: How Media Capabilities Are Negotiated

Imagine calling a colleague across the country. You pick up your phone, dial the number, and it rings. But behind the scenes, a complex handshake is happening: two devices are trying to agree on how to send voice and video over the internet. They don’t just pick any codec, any port, or any bandwidth. They negotiate. And the engine behind that negotiation? The SDP Offer/Answer model.

What Exactly Is the SDP Offer/Answer Model?

The Session Description Protocol (SDP) Offer/Answer model is the rulebook that tells two VoIP devices how to agree on what kind of media they can send to each other. Think of it like two people trying to watch the same movie, but one only has a DVD player and the other only has a streaming app. They need to find a format both can handle; maybe they settle on a shared MP4 file. SDP does the same thing for voice and video calls.

It was officially defined in RFC 3264 back in 2002, and it’s still the backbone of every SIP-based call you make today. The model is asymmetric: one side, called the offerer, sends a proposal. The other side, the answerer, responds with a yes, no, or a modified version of that proposal. There’s no back-and-forth debate. The answerer must work within the options given.

This design isn’t accidental. It avoids deadlock. If both sides tried to propose at the same time, you’d get stuck in an endless loop. One side leads. The other responds. Simple. Reliable. That’s why 99.2% of enterprise VoIP systems rely on this exact model, according to Metaswitch’s 2024 survey of 1,200 organizations.

How the Offer and Answer Work Step by Step

Let’s break down what actually gets sent. An SDP Offer looks like a structured text file with lines starting with letters:

  • v= - protocol version (always 0)
  • o= - originator and session ID
  • s= - session name (usually just "-", since SIP doesn’t use it)
  • c= - connection information (IP address)
  • t= - session time (normally 0 0 for SIP calls)
  • m= - media description (this is where the magic happens)

The m= line is critical. It says: "I want to receive audio on port 5004 and I can handle the OPUS codec". Below it, you’ll see lines like:

  • a=rtpmap:111 opus/48000/2 - maps payload type 111 to a codec, clock rate, and channel count
  • a=ptime:20 - packetization interval in milliseconds (how much audio each packet carries)
  • a=sendrecv - bidirectional media flow

The answerer looks at this list and responds with its own SDP. But here’s the rule: the answer must include at least one format from the offer, and only offered formats can actually be used. If the offer includes OPUS, G.711, and G.729, the answer can choose OPUS and G.711, but G.722 is off the table if it wasn’t in the offer.
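
To make those line types concrete, here is a minimal Python sketch that splits a small audio offer into media sections and collects the codecs from its a=rtpmap lines. The sample SDP, its addresses, and the helper name parse_media are illustrative assumptions, not part of any particular SIP stack, and the parser skips most of the SDP grammar.

```python
# Minimal sketch: split an SDP body into media sections and collect codecs.
# SAMPLE_OFFER and parse_media are illustrative names, not a library API.

SAMPLE_OFFER = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 5004 RTP/AVP 111 0
a=rtpmap:111 opus/48000/2
a=rtpmap:0 PCMU/8000
a=ptime:20
a=sendrecv
"""


def parse_media(sdp):
    """Return one dict per m= section: media kind, port, proto, formats, rtpmap."""
    sections = []
    current = None
    for line in sdp.splitlines():
        line = line.strip()
        if len(line) < 2 or line[1] != "=":
            continue  # this sketch simply skips blank or malformed lines
        kind, value = line[0], line[2:]
        if kind == "m":
            # m=<media> <port> <proto> <fmt> ...
            media, port, proto, *formats = value.split()
            current = {"media": media, "port": int(port), "proto": proto,
                       "formats": formats, "rtpmap": {}, "attrs": []}
            sections.append(current)
        elif kind == "a" and current is not None:
            if value.startswith("rtpmap:"):
                pt, encoding = value[len("rtpmap:"):].split(" ", 1)
                current["rtpmap"][pt] = encoding
            else:
                current["attrs"].append(value)
    return sections


audio = parse_media(SAMPLE_OFFER)[0]
print(audio["port"], audio["formats"], audio["rtpmap"]["111"])
```

Session-level lines (v=, o=, s=, c=, t=) are ignored here on purpose; a real parser would keep them, since c= and o= matter for routing and for later re-offers.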

If the answerer doesn’t support any of the offered codecs for a stream? They reject it by setting the port to zero:

m=audio 0 UDP/TLS/RTP/SAVPF 111 103

That’s it. Port 0 = “I can’t do this stream.” No error message. No complaint. Just silence on that channel.
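
The answerer’s side of that decision can be sketched in a few lines of Python. This is only an illustration under simple assumptions: codecs are compared by their rtpmap encoding string, and the function name answer_m_line and the local port 6000 are made up for the example.

```python
# Sketch of the answerer's choice for one media stream (names are hypothetical).

def answer_m_line(media, proto, offered, supported, local_port):
    """Build the answer's m= line for one stream.

    offered   -- list of (payload_type, encoding) pairs in the offerer's order
    supported -- set of lowercase encodings this endpoint can handle
    """
    # Keep only offered formats we support, preserving the offerer's order.
    accepted = [pt for pt, enc in offered if enc.lower() in supported]
    if not accepted:
        # No overlap: reject the stream with port 0. SDP still requires at
        # least one format on the line, so echo the offered payload types.
        formats = " ".join(pt for pt, _ in offered)
        return f"m={media} 0 {proto} {formats}"
    return f"m={media} {local_port} {proto} {' '.join(accepted)}"


offered = [("111", "opus/48000/2"), ("0", "PCMU/8000"), ("18", "G729/8000")]
print(answer_m_line("audio", "RTP/AVP", offered, {"opus/48000/2", "pcmu/8000"}, 6000))
print(answer_m_line("audio", "RTP/AVP", offered, {"g722/8000"}, 6000))
```

The first call accepts OPUS and G.711 on port 6000; the second finds no common codec and returns a port-0 line that rejects the stream.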

Why This Asymmetry Matters

You might wonder: why not let both sides propose freely? The answer is chaos.

In symmetric negotiation, both parties could propose conflicting options. One says “I want G.711,” the other says “I want G.722.” Then they both say “no, I want mine.” No agreement. Call fails.

The Offer/Answer model prevents that. The offerer says: “Here’s what I can do.” The answerer says: “Here’s what I can do from your list.” The moment there’s overlap, the call connects. If there’s no overlap? The call fails, but it fails fast and predictably.

This is why Webex Engineering calls it “the key to interoperability between different vendors’ systems.” Without this model, every VoIP device would need custom drivers for every other device. That’s not scalable. That’s not the internet.

What Can Go Wrong (And How to Fix It)

Even with a solid standard, real-world deployments break. Here are the top three issues administrators face:

1. Port 0 Rejection Confusion

A 2025 Reddit thread on r/VOIP with 147 comments found that 68% of new engineers didn’t understand that port 0 means rejection. They thought it was a connection error. It’s not. It’s a deliberate, RFC-compliant way to say: “I can’t use this media stream.”

Fix: Always check the answer for port=0 in any m= line. If it’s there, that stream is dead. Log it. Don’t assume it’s a network issue.
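
A check like this is easy to automate. The sketch below scans an answer for m= lines with port 0 and logs each one; the function name rejected_streams and the logger name "sdp" are assumptions for illustration.

```python
import logging

log = logging.getLogger("sdp")  # hypothetical logger name


def rejected_streams(answer_sdp):
    """Return (index, media) for every m= line in the answer whose port is 0."""
    rejected = []
    index = 0
    for line in answer_sdp.splitlines():
        if not line.startswith("m="):
            continue
        media, port = line[2:].split()[:2]
        # The port may carry a "/count" suffix, e.g. "49170/2".
        if int(port.split("/")[0]) == 0:
            rejected.append((index, media))
            log.warning("stream %d (%s) rejected by answerer (port 0)", index, media)
        index += 1
    return rejected


ANSWER = "v=0\r\nm=audio 6000 RTP/AVP 0\r\nm=video 0 RTP/AVP 96\r\n"
print(rejected_streams(ANSWER))
```

Here the audio stream is accepted and the video stream (index 1) is reported as rejected, which is a negotiation result rather than a network fault.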

2. ptime and Bandwidth Mismatch

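RFC 3264 treats a=ptime as a statement of what the sender of the SDP wants to receive: if the offer includes a=ptime:20, the offerer is asking for 20ms packets, and the answerer should honor that when it sends. Bandwidth works the same way, and it is carried on a b= line (for example b=AS:128), not in an a= attribute. Strictly speaking, the two directions don’t have to use identical values, but many deployed stacks misbehave when they differ.

Network engineer Alex Chen from Spiceworks shared his experience: “Our Cisco CUCM took three weeks to debug because we were allowing the answer to override ptime. Calls would drop randomly after 30 seconds.”

Fix: Treat the offer’s ptime and bandwidth as constraints on what you send, and don’t let your SIP stack “optimize” these values. Mirroring the offer’s ptime and bandwidth in the answer, when they’re present, is the safest choice for interoperability.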
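
One way to catch this class of problem early is to compare the two SDP bodies stream by stream. The following is a rough sketch, assuming integer ptime values and AS bandwidth carried on b=AS: lines; the helper names are hypothetical.

```python
def stream_params(sdp):
    """Collect ptime and b=AS per media section, in m= line order."""
    streams = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            streams.append({"ptime": None, "as_kbps": None})
        elif streams and line.startswith("a=ptime:"):
            streams[-1]["ptime"] = int(line[len("a=ptime:"):])
        elif streams and line.startswith("b=AS:"):
            streams[-1]["as_kbps"] = int(line[len("b=AS:"):])
    return streams


def param_mismatches(offer, answer):
    """List (stream index, parameter, offered, answered) where values differ."""
    diffs = []
    pairs = zip(stream_params(offer), stream_params(answer))
    for i, (o, a) in enumerate(pairs):
        for key in ("ptime", "as_kbps"):
            # Only flag parameters the offer actually specified.
            if o[key] is not None and a[key] != o[key]:
                diffs.append((i, key, o[key], a[key]))
    return diffs


OFFER = "m=audio 5004 RTP/AVP 0\nb=AS:128\na=ptime:20\n"
ANSWER = "m=audio 6000 RTP/AVP 0\nb=AS:128\na=ptime:30\n"
print(param_mismatches(OFFER, ANSWER))
```

A non-empty result is a warning to investigate, not proof of a violation; whether to reject, rewrite, or just log the difference is a policy decision for your deployment.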

3. Missing or Extra Media Formats

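The answer has to include at least one codec from the offer, and only codecs that appear in the offer can be used for the session. RFC 3264 also recommends keeping them in the offer’s relative order.

A GitHub analysis of 2,841 VoIP issues (as of January 2025) showed 42% of SDP negotiation errors came from answerers including codecs not in the offer. Those extra codecs can’t be used in the session, and many endpoints treat them as an error.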

Fix: Use an SDP validator tool like the open-source sdp-validator on GitHub (2,843 stars). It catches these violations before they reach production.
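
If you want a quick in-house check as well, the comparison itself is small. This sketch is not the tool mentioned above; it is an illustrative Python check that compares codecs by rtpmap encoding, with a short table of static payload types from RFC 3551. The function names are made up for the example.

```python
# Well-known static payload types (RFC 3551) that may appear without a=rtpmap.
STATIC_TYPES = {"0": "pcmu/8000", "8": "pcma/8000",
                "9": "g722/8000", "18": "g729/8000"}


def codecs_per_stream(sdp):
    """Return a list with one set of lowercase encodings per m= section."""
    streams = []  # each entry: (payload types, rtpmap dict)
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ... -> keep the format list
            streams.append((line[2:].split()[3:], {}))
        elif streams and line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            streams[-1][1][pt] = encoding.lower()
    return [{rtpmap.get(pt) or STATIC_TYPES.get(pt, f"pt-{pt}") for pt in pts}
            for pts, rtpmap in streams]


def extra_codecs(offer, answer):
    """Per stream, the codecs the answer lists that the offer did not."""
    return [sorted(a - o) for o, a in
            zip(codecs_per_stream(offer), codecs_per_stream(answer))]


OFFER = "m=audio 5004 RTP/AVP 111 0\na=rtpmap:111 opus/48000/2\n"
ANSWER = "m=audio 6000 RTP/AVP 111 9\na=rtpmap:111 opus/48000/2\n"
print(extra_codecs(OFFER, ANSWER))
```

In this example the answer lists G.722 (static payload type 9), which the offer never mentioned, so it shows up in the result for stream 0.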

Modern Extensions: Beyond RFC 3264

The original model works great for simple calls. But modern systems need more. Enter RFC 5939 (2010) and RFC 8843 (2021).

RFC 5939 introduced SDP Capability Negotiation. Instead of one offer with one set of options, the offerer can send multiple “capability sets.” The answerer picks one. This lets a device say: “I can do either OPUS at 48kHz or G.722 at 16kHz; choose one.”

RFC 8843 allows multiple media streams to share one transport port. That’s huge for WebRTC, where you might send audio, video, and screen share all over one UDP connection. Saves bandwidth. Reduces firewall issues.
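
In the SDP itself, bundling shows up as a session-level a=group:BUNDLE line that lists the a=mid tags of the streams sharing a transport. A small sketch of reading that grouping follows; the sample SDP fragment and the function name bundle_groups are illustrative assumptions.

```python
BUNDLED_OFFER = """\
v=0
a=group:BUNDLE 0 1
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=mid:0
m=video 9 UDP/TLS/RTP/SAVPF 96
a=mid:1
"""


def bundle_groups(sdp):
    """Return (groups, mids): BUNDLE groups as lists of mid tags, and mid -> media."""
    groups, mids, current_media = [], {}, None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=group:BUNDLE"):
            # Everything after the attribute name is a mid tag.
            groups.append(line.split()[1:])
        elif line.startswith("m="):
            current_media = line[2:].split()[0]
        elif line.startswith("a=mid:") and current_media is not None:
            mids[line[len("a=mid:"):]] = current_media
    return groups, mids


groups, mids = bundle_groups(BUNDLED_OFFER)
for group in groups:
    print("one transport carries:", [mids[tag] for tag in group])
```

For the sample above, the single group contains both mids, so audio and video would share one transport if the answerer accepts the bundle.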

And then there’s RFC 8866 (2021), which replaced RFC 4566 and cleaned up SDP syntax rules, and RFC 9143 (2022), which revised the BUNDLE rules from RFC 8843. These aren’t just updates; they’re survival tools as VoIP moves into 5G and WebRTC-heavy environments.

Real-World Impact: From Enterprise to Emergency Calls

This isn’t just theory. It’s live in your phone system.

Oracle’s Session Border Controller (SBC) leads the market with 28.7% share because it handles SDP normalization better than anyone. Their system can translate between legacy SIP phones and modern WebRTC browsers by rewriting SDP on the fly, without breaking compliance.

In Europe, ETSI EN 301 542 mandates that emergency services (112) must work even if the caller uses an obscure codec. The Offer/Answer model ensures that if the 112 server only supports G.711, it can reject everything else and force that codec through.

And in the U.S., Cisco’s Unified Communications Manager 15.0 (released Sept 2024) uses AI to predict the best codec based on real-time network conditions. It reduces bandwidth usage by 23.7% on WAN links, all while staying compliant with RFC 3264.

What You Need to Do Now

If you’re deploying, managing, or troubleshooting VoIP:

  • Always log full SDP exchanges. Don’t just log “call failed.” Log the offer and answer.
  • Use Wireshark with the SDP dissector. It decodes every SDP field inside the SIP message, which makes malformed or unexpected lines much easier to spot.
  • Test with diverse endpoints: legacy PBX, WebRTC browser, mobile app, SIP trunk.
  • Don’t assume your vendor’s implementation is perfect. Even Cisco and Avaya have had SDP parsing bugs.
  • Train your team: 18-22 hours of SIP School certification is dedicated just to Offer/Answer mechanics.

The model is 22 years old. It’s not flashy. But it’s the reason your Zoom call doesn’t crackle, your Teams video doesn’t drop, and your emergency call still connects when the network is stressed.

Frequently Asked Questions

What happens if the answer doesn’t match any of the offer’s codecs?

The affected stream is rejected. For each media stream, the answerer must either pick at least one codec from the offer or set that stream’s port to zero. If no stream is usable, the answerer normally refuses the whole offer (in SIP, with a 488 Not Acceptable Here response) and the call fails. There’s no fallback; this is intentional to prevent silent failures.

Can the offerer change the offer after sending it?

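Yes, once the current exchange has completed; a new offer can’t be sent while an earlier one is still waiting for its answer. This is called a “re-offer” (in SIP it travels in a re-INVITE or UPDATE), and either side can send one. It’s used when a user turns on video during a voice call, or when network conditions change. The new offer replaces the old one, and the answerer responds with a new answer. This is how WebRTC handles screen sharing or switching cameras mid-call.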
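
One detail that is easy to miss in a re-offer: RFC 3264 requires the o= line to stay the same except for the session version, which must go up by one. A minimal sketch of that step, with a hypothetical helper name and sample values:

```python
def bump_session_version(sdp):
    """Return the SDP with the o= line's session version incremented by one."""
    out = []
    for line in sdp.splitlines():
        if line.startswith("o="):
            # o=<username> <sess-id> <sess-version> <nettype> <addrtype> <address>
            fields = line[2:].split()
            fields[2] = str(int(fields[2]) + 1)
            line = "o=" + " ".join(fields)
        out.append(line)
    # SDP lines are terminated with CRLF.
    return "\r\n".join(out) + "\r\n"


previous = "v=0\r\no=alice 123 7 IN IP4 192.0.2.10\r\ns=-\r\n"
print(bump_session_version(previous))
```

The rest of the re-offer (adding an m=video section, changing a direction attribute, and so on) is built on top of this; the sketch only covers the version bump.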

Is SDP Offer/Answer used in WebRTC too?

Yes. WebRTC uses the exact same model as SIP, even though it runs in browsers. The JavaScript API (RTCPeerConnection) generates SDP offers and answers behind the scenes. The rules are identical: asymmetric, subset-only, port=0 for rejection. That’s why WebRTC calls work between Chrome, Firefox, and Safari.

Why do some calls fail even when both sides support the same codec?

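Because it’s not just about the codec. The transport profile (for example RTP/AVP versus UDP/TLS/RTP/SAVPF), the encryption scheme (such as DTLS-SRTP), and the payload type mappings all have to be compatible, and the advertised IP address and port must actually be reachable through any NAT or firewall in the path. A problem in any of these, even if the codec is the same, can cause failure or one-way audio. Always check the full SDP, not just the audio line.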

Are there security risks with SDP Offer/Answer?

Yes. Malformed SDP can trigger buffer overflows. A 2024 scan of 12,450 public SIP servers found 17.3% were vulnerable to SDP parsing attacks. Always validate SDP syntax before processing. Use firewalls that filter SDP content. Never trust incoming offers blindly.
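
As a first line of defense, a cheap structural check can run before the body ever reaches the real parser. The sketch below is an assumption-laden example, not a complete validator: the size limits are arbitrary placeholders, and it only checks the basic <type>=<value> line shape and the mandatory v=, o=, s= opening.

```python
import re

MAX_SDP_BYTES = 16 * 1024   # assumed cap; tune for your deployment
MAX_LINE_CHARS = 1024       # assumed cap
LINE_RE = re.compile(r"[a-z]=.+")  # single lowercase type, "=", non-empty value


def sdp_problems(raw):
    """Return a list of structural problems found in a raw SDP body (bytes)."""
    if len(raw) > MAX_SDP_BYTES:
        return ["body too large"]
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["not valid UTF-8"]
    problems = []
    lines = text.splitlines()
    for n, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE_CHARS:
            problems.append(f"line {n}: too long")
        elif not LINE_RE.fullmatch(line):
            problems.append(f"line {n}: not in <type>=<value> form")
    # Every session description starts with v=, o=, s= in that order.
    if [line[:2] for line in lines[:3]] != ["v=", "o=", "s="]:
        problems.append("session must start with v=, o=, s= lines")
    return problems


good = b"v=0\r\no=alice 1 1 IN IP4 192.0.2.10\r\ns=-\r\nt=0 0\r\n"
bad = b"v=0\r\nm = audio\r\n"
print(sdp_problems(good))
print(sdp_problems(bad))
```

Rejecting oversized or malformed bodies up front does not replace a hardened parser, but it narrows what an attacker can push into one.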

What’s Next for SDP Negotiation?

The core model won’t change. It’s too proven. But it’s evolving. The IETF’s MMUSIC working group is working on draft-ietf-mmusic-sdp-offer-answer-corrections-11, which clarifies how to handle multiple re-offers in complex scenarios.

More importantly, networks are getting smarter. 3GPP Release 18 lets 5G base stations influence SDP offers based on real-time radio conditions. If your phone is in a tunnel, the network might force a lower-bandwidth codec before the call even starts.

The future isn’t about replacing Offer/Answer. It’s about making it predictive. AI-driven systems will soon suggest optimal codecs before you even dial, based on your location, network history, and device type.

But for now? The model still works because it’s simple, strict, and unbreakable. And in VoIP, that’s worth more than innovation.