Implementing Simulcast with Three Quality Layers in Chrome

This page is part of the Simulcast & SVC Implementation guide and solves one concrete problem: getting Chrome to emit three independent simulcast RTP streams (low, medium, high) with distinct SSRCs that a Selective Forwarding Unit can forward selectively. The two things that most often break it are codec choice and call timing, and both have deterministic fixes.

Context & Trade-offs

Chrome produces real three-layer simulcast only on VP8, VP9, or AV1. Chromium routes H.264 through its SVC path and caps it at two simulcast layers, so a session that looks correctly configured but negotiated H.264 will show one or two SSRCs and never a third. The fix is to set codec preferences to a VP-family codec before the first offer.
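One way to confirm which codec a description actually prefers is to read the first payload type on its m=video line and resolve it through the matching a=rtpmap line. The helper below is a minimal sketch, not part of any WebRTC API: the name firstVideoCodec is hypothetical, and it assumes a single video section and a well-formed SDP string such as pc.localDescription.sdp or pc.remoteDescription.sdp.

```javascript
// Hypothetical helper: returns the codec name behind the first payload type
// listed on the m=video line, or null if it cannot be determined.
function firstVideoCodec(sdp) {
  const lines = sdp.split(/\r?\n/);
  const start = lines.findIndex(l => l.startsWith('m=video '));
  if (start === -1) return null;

  // m=video <port> <proto> <pt> <pt> ...  (the first pt is the preferred codec)
  const firstPt = lines[start].split(' ')[3];
  if (firstPt === undefined) return null;

  // Only look at attribute lines that belong to this video section.
  for (let i = start + 1; i < lines.length; i++) {
    if (lines[i].startsWith('m=')) break; // next media section begins
    if (lines[i].startsWith(`a=rtpmap:${firstPt} `)) {
      // a=rtpmap:<pt> <name>/<clock rate>
      return lines[i].split(' ')[1].split('/')[0];
    }
  }
  return null;
}
```

Running it on the answer rather than the offer shows what the remote side accepted, which is the value that decides how many layers Chrome will send.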

The second constraint is timing. The simulcast configuration must be in place before the first frame is encoded: in practice, immediately after you attach the track and before createOffer(). Call setParameters() with a different layer set after that point and Chrome rejects the promise with InvalidModificationError, because the number of encodings and their rid values are frozen once negotiation locks in. Per-encoding fields such as active, maxBitrate, maxFramerate, and scaleResolutionDownBy stay mutable afterward.

Bitrate spacing is the third lever. Chrome’s bandwidth estimator drops the highest layer when adjacent ceilings sit too close together, because it cannot distinguish them as separate operating points. A useful rule is that each layer’s maxBitrate should be at least 2× the layer below it — 150 / 500 / 2000 kbps spaces cleanly, while 400 / 500 / 700 collapses. These ceilings are caps, not send rates: Google Congestion Control allocates the real bitrate underneath them, as detailed in Bandwidth Estimation & Congestion Control. The scaleResolutionDownBy factors of 1.0 / 2.0 / 4.0 matter just as much: omit them and all three encodings run at full resolution, tripling encoder load for no benefit.
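The spacing rule is easy to check mechanically before the encodings ever reach the browser. The function below is a small sketch under stated assumptions: checkBitrateSpacing is a hypothetical name, and the default ratio of 2 is the rule of thumb from this section, not a constant that Chrome exposes.

```javascript
// Hypothetical helper: flags adjacent layers whose maxBitrate ceilings are
// closer than `minRatio` apart. An empty result means the ladder is spaced cleanly.
function checkBitrateSpacing(encodings, minRatio = 2) {
  // Sort a copy by ceiling so the check does not depend on array order.
  const sorted = [...encodings].sort((a, b) => a.maxBitrate - b.maxBitrate);
  const problems = [];
  for (let i = 1; i < sorted.length; i++) {
    const lower = sorted[i - 1];
    const upper = sorted[i];
    const ratio = upper.maxBitrate / lower.maxBitrate;
    if (ratio < minRatio) {
      problems.push({ lower: lower.rid, upper: upper.rid, ratio });
    }
  }
  return problems;
}

// The two ladders discussed above.
const cleanLadder = checkBitrateSpacing([
  { rid: 'high', maxBitrate: 2_000_000 },
  { rid: 'medium', maxBitrate: 500_000 },
  { rid: 'low', maxBitrate: 150_000 },
]);
const crowdedLadder = checkBitrateSpacing([
  { rid: 'high', maxBitrate: 700_000 },
  { rid: 'medium', maxBitrate: 500_000 },
  { rid: 'low', maxBitrate: 400_000 },
]);
```

Here cleanLadder comes back empty, while crowdedLadder reports both adjacent pairs, since 500/400 and 700/500 each fall below the 2× threshold.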

Chrome also enforces a startup ramp. On a fresh connection it does not light up all three layers at once — it brings up the lowest layer first, then probes upward as the estimator gains confidence in the available bandwidth, typically over the first few seconds. This is by design and is why a freshly connected session may briefly show one or two active SSRCs before the third appears; do not mistake the ramp for a misconfiguration. The corollary is that aggressive application-layer rate limiting on top of Chrome’s pacer breaks the probe: if you cap the send rate below the combined layer ceiling, GCC reads the artificial ceiling as congestion, never finishes probing, and the top layer stays suspended. Let the browser’s pacer own rate control and only set per-layer maxBitrate ceilings.

Minimal Runnable Implementation

const pc = new RTCPeerConnection(config);

// 1) Add the track FIRST and capture the sender/transceiver
const transceiver = pc.addTransceiver(videoTrack, {
  direction: 'sendonly',
  // 'high' first = base resolution; low/mid are scaled down from it
  sendEncodings: [
    { rid: 'high',   active: true, maxBitrate: 2_000_000, scaleResolutionDownBy: 1.0, maxFramerate: 30 },
    { rid: 'medium', active: true, maxBitrate:   500_000, scaleResolutionDownBy: 2.0, maxFramerate: 30 },
    { rid: 'low',    active: true, maxBitrate:   150_000, scaleResolutionDownBy: 4.0, maxFramerate: 15 }
  ]
});

// 2) Force VP8 (or VP9) BEFORE the offer — H.264 collapses to 2-layer SVC in Chrome
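const caps = RTCRtpSender.getCapabilities('video');
const isVp8 = c => /^video\/vp8$/i.test(c.mimeType);
// VP8 first, then every other codec (rtx, red, ...) in its original order, with no duplicates
transceiver.setCodecPreferences([...caps.codecs.filter(isVp8), ...caps.codecs.filter(c => !isVp8(c))]);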

// 3) Now negotiate; Chrome adds one "a=rid:<rid> send" line per layer and "a=simulcast:send high;medium;low"
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);

Note the addTransceiver path here rather than addTrack. addTransceiver with an explicit sendEncodings array sets the simulcast configuration atomically at creation, before any frame can be encoded, which sidesteps the timing problem described above. With addTrack the sender starts out with a single encoding, and current Chrome builds reject a setParameters() call that changes the number of encodings or their rid values, so the capture, read, mutate, and write-back sequence is fragile at best even when it all runs before createOffer(). The transceiver form removes that window. Use it for new code and reserve the getParameters/setParameters round trip for adjusting an already-negotiated sender at runtime.

A runtime layer toggle never renegotiates — it flips active and writes the parameters back, freeing the disabled layer’s budget for the survivors:

// Disable the top layer under sustained congestion, no createOffer required
async function disableHighLayer(sender) {
  const params = sender.getParameters();
  const high = params.encodings.find(e => e.rid === 'high');
  if (high) high.active = false;       // rid is frozen; active is mutable
  await sender.setParameters(params);
}

Reproduction Steps & Debugging Log Patterns

  1. Add the video track with pc.addTransceiver() and keep the returned transceiver.
  2. Pass a three-entry sendEncodings array in that same call, with distinct rid values and powers-of-two scaleResolutionDownBy.
  3. Prefer VP8 or VP9 via transceiver.setCodecPreferences() before calling createOffer().
  4. Call createOffer() then setLocalDescription().
  5. Open chrome://webrtc-internals, expand the stats for the connection, and confirm three outbound-rtp entries with distinct SSRCs, each with its own growing bytesSent graph.

A healthy three-layer session reports three independent outbound-rtp streams, one per rid:

// outbound-rtp samples from getStats() once media flows
// rid=low    bytesSent rising  res=320x180   fps=15
// rid=medium bytesSent rising  res=640x360   fps=30
// rid=high   bytesSent rising  res=1280x720  fps=30

A single SSRC with no per-rid reports means either setParameters() ran after encoding began, or H.264 is the negotiated codec — check the codec line in the exported SDP. One layer with frozen bytesSent while the others climb usually means CPU pressure or ceilings packed too closely, not network loss; cross-reference availableOutgoingBitrate on the transport report to tell them apart.
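The same check can be scripted instead of read off a graph. The sketch below assumes a Map-like stats report, which is what pc.getStats() or sender.getStats() resolves to, and reads only standard outbound-rtp fields; the function names summarizeLayers and stalledLayers are hypothetical.

```javascript
// Collapse a stats report into one row per video outbound-rtp stream.
// `report` only needs a Map-like values() iterator.
function summarizeLayers(report) {
  const rows = [];
  for (const stat of report.values()) {
    if (stat.type !== 'outbound-rtp' || stat.kind !== 'video') continue;
    rows.push({
      rid: stat.rid ?? '(none)', // no rid usually means a single, non-simulcast stream
      ssrc: stat.ssrc,
      bytesSent: stat.bytesSent,
      resolution: `${stat.frameWidth ?? '?'}x${stat.frameHeight ?? '?'}`,
      fps: stat.framesPerSecond ?? 0,
    });
  }
  return rows;
}

// Hypothetical helper: rids whose bytesSent did not grow between two samples.
function stalledLayers(previousRows, currentRows) {
  const before = new Map(previousRows.map(r => [r.rid, r.bytesSent]));
  return currentRows
    .filter(r => before.has(r.rid) && r.bytesSent <= before.get(r.rid))
    .map(r => r.rid);
}
```

Sampling twice a few seconds apart and passing both summaries to stalledLayers gives the list of frozen layers; one row with rid "(none)" corresponds to the single-SSRC case described above.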

To distinguish CPU pressure from congestion definitively, read the encoder’s own counters. When qualityLimitationReason on the outbound-rtp report reads cpu, the encoder is shedding the top layer because it cannot keep up — reducing bitrate will not help, and the fix is fewer layers, a lower resolution, or hardware encoding. When it reads bandwidth, the estimator suspended the layer and it will return as the network recovers. A reason of none with a missing layer points back to configuration — almost always the codec or the call-timing error above.
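That decision table can be written down directly. The function below is an illustrative sketch: diagnoseMissingLayer is a hypothetical name, the returned strings are just summaries of the advice in this section, and the input is a single outbound-rtp stats object.

```javascript
// Map qualityLimitationReason on an outbound-rtp stat to the next step to try.
function diagnoseMissingLayer(stat) {
  switch (stat.qualityLimitationReason) {
    case 'cpu':
      return 'encoder overloaded: send fewer layers, lower the resolution, or use hardware encoding';
    case 'bandwidth':
      return 'suspended by the bandwidth estimator: should return as the network recovers';
    case 'none':
      return 'no limitation reported: recheck the negotiated codec and when the encodings were set';
    default:
      // Covers the "other" value and reports where the field is absent.
      return 'unclassified: inspect the raw stats';
  }
}
```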

Resolution, Frame Rate, and Layer Math

The scaleResolutionDownBy factors define the resolution ladder, but capture resolution sets the ceiling — Chrome cannot upscale, so a 640×360 capture with a 1.0 factor on the top layer produces 360p, not 720p, no matter what maxBitrate you set. Request a high-enough capture in your getUserMedia constraints (1280×720 or 1920×1080) and let the factors derive the lower tiers. The arithmetic is exact: factor 2.0 halves each dimension and quarters the pixel count, so the encode cost of the mid layer is roughly a quarter of high, and low at factor 4.0 is roughly a sixteenth. That is why three simulcast layers cost well under 3× a single encode — the two scaled layers are cheap relative to the base.
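The arithmetic can be worked through in a few lines. This is a sketch that treats pixel count as a rough proxy for encode cost, which is an approximation; resolutionLadder is a hypothetical helper, and the rounding down is for illustration only, since the encoder may align dimensions differently.

```javascript
// Derive each layer's dimensions and pixel share from the capture size.
function resolutionLadder(captureWidth, captureHeight, factors) {
  return factors.map(factor => ({
    factor,
    width: Math.floor(captureWidth / factor),
    height: Math.floor(captureHeight / factor),
    // Halving both dimensions quarters the pixel count, hence 1 / factor^2.
    relativePixels: 1 / (factor * factor),
  }));
}

const ladder = resolutionLadder(1280, 720, [1, 2, 4]);
// ladder: 1280x720 (1), 640x360 (0.25), 320x180 (0.0625)

// Sum of pixel shares: 1 + 0.25 + 0.0625 = 1.3125
const totalCost = ladder.reduce((sum, layer) => sum + layer.relativePixels, 0);
```

By pixel count alone, the three layers together come to about 1.31× the top layer, which is the sense in which the scaled layers are cheap relative to the base.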

Frame rate is an independent knob. Setting maxFramerate: 15 on the low layer while the others run 30 halves that layer’s temporal load and bitrate without touching resolution, which is the right move for a thumbnail tier that does not need smooth motion. Chrome honors per-layer maxFramerate, but the source capture frame rate is again the ceiling — a 15 fps camera cannot produce a 30 fps top layer. Confirm both resolution and frame rate per layer in the outbound-rtp frameWidth, frameHeight, and framesPerSecond fields rather than trusting the requested config, because a constraint the camera could not satisfy fails silently.
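A comparison of requested versus reported values can catch these silent shortfalls. The sketch below makes several assumptions: findUnmetLayers is a hypothetical name, intended describes the capture the application asked for, stats is an array of outbound-rtp objects, and the 2 fps tolerance is arbitrary. A mismatch is a prompt to investigate rather than proof of a bad camera, because Chrome can also reduce resolution or frame rate while adapting to CPU or bandwidth limits.

```javascript
// Compare the configured ladder against what outbound-rtp actually reports.
function findUnmetLayers(encodings, intended, stats, fpsTolerance = 2) {
  const byRid = new Map(stats.map(s => [s.rid, s]));
  const unmet = [];
  for (const enc of encodings) {
    const stat = byRid.get(enc.rid);
    if (!stat) {
      unmet.push({ rid: enc.rid, issue: 'no outbound-rtp report' });
      continue;
    }
    const scale = enc.scaleResolutionDownBy ?? 1;
    const wantWidth = Math.floor(intended.width / scale);
    const wantHeight = Math.floor(intended.height / scale);
    // The capture frame rate caps every layer, whatever maxFramerate says.
    const wantFps = Math.min(enc.maxFramerate ?? intended.frameRate, intended.frameRate);
    if (stat.frameWidth < wantWidth || stat.frameHeight < wantHeight) {
      unmet.push({
        rid: enc.rid,
        issue: `resolution ${stat.frameWidth}x${stat.frameHeight}, wanted ${wantWidth}x${wantHeight}`,
      });
    } else if (stat.framesPerSecond < wantFps - fpsTolerance) {
      unmet.push({ rid: enc.rid, issue: `fps ${stat.framesPerSecond}, wanted about ${wantFps}` });
    }
  }
  return unmet;
}
```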

Common Implementation Mistakes

FAQ

Why does Chrome show only one or two simulcast layers despite configuring three? Most often H.264 negotiated instead of VP8/VP9/AV1, or setParameters() ran after encoding started. Confirm the negotiated codec in chrome://webrtc-internals and that the call precedes createOffer().

Can I force simulcast over H.264 in Chrome? No. Chromium’s H.264 path uses SVC internally and will not emit independent simulcast SSRCs. Use VP8 or VP9 for reliable three-layer simulcast — the codec trade-offs are in VP8 vs H.264 vs AV1 Codec Selection.

How do I debug a layer going inactive in production? Watch outbound-rtp per rid for bytesSent flatlining, and compare against availableOutgoingBitrate. A layer dying while transport bandwidth stays healthy points to CPU or an encoder limit rather than congestion. Read qualityLimitationReason to confirm: cpu means send fewer layers or drop resolution, bandwidth means the layer returns as the network recovers, none points back to a codec or call-timing misconfiguration.

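Does the SFU need anything special to forward Chrome’s simulcast layers? Only a way to map each incoming SSRC to its rid. The negotiated SDP lists the layers as a=rid:high send, a=rid:medium send, and a=rid:low send, plus an a=simulcast:send line. Chrome’s rid-based offers typically do not carry per-layer a=ssrc lines, so the SFU usually learns each SSRC from the RTP stream ID (rid) header extension on the first packets. It then forwards whichever stream matches a subscriber’s downlink and drops the rest, with no codec parsing. Keep the rid strings identical on client and server so the forwarding table lines up, as covered in Simulcast-Aware Forwarding.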
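On the server side, pulling the rid list out of an offer might look like the sketch below. It is deliberately simplified: parseSimulcastRids is a hypothetical name, it assumes a single video section, and it handles only the plain semicolon-separated form of a=simulcast (a leading ~ for a paused stream is stripped, comma-separated alternatives are not handled).

```javascript
// Extract the declared rids and the send order from an SDP string.
function parseSimulcastRids(sdp) {
  const lines = sdp.split(/\r?\n/);

  // a=rid:<id> <direction> [restrictions]
  const declared = lines
    .filter(l => l.startsWith('a=rid:'))
    .map(l => l.slice('a=rid:'.length).split(' ')[0]);

  // a=simulcast:send <id>;<id>;<id>
  const simulcastLine = lines.find(l => l.startsWith('a=simulcast:'));
  let sendOrder = [];
  if (simulcastLine) {
    const match = simulcastLine.match(/send ([^ ]+)/);
    if (match) sendOrder = match[1].split(';').map(r => r.replace(/^~/, ''));
  }
  return { declared, sendOrder };
}
```

An SFU could use the result to pre-populate its forwarding table by rid and then fill in each SSRC as packets arrive.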

Related: return to Simulcast & SVC Implementation, then read Choosing Simulcast vs SVC for Large Conferences and Configuring AV1 SVC Layers in WebRTC to compare against the single-stream approach, and Simulcast-Aware Forwarding for the server side.