Best Practices for ICE Candidate Trickle vs Bulk Gathering
Trickle ICE transmits candidates incrementally via onicecandidate as the agent discovers them; bulk gathering waits for iceGatheringState to reach 'complete' and ships the full SDP in one message. This guide is part of the ICE Candidate Gathering & Filtering guide, and it resolves one decision: which mode to use, and how to fall back safely when your chosen mode stalls. The short answer is trickle in almost every case: it reduces Time-to-First-Frame by 200–800 ms in typical conditions and by up to 2–4 s versus bulk on high-latency or relay-heavy paths.
Context & Trade-offs
Bulk gathering is simpler to reason about: you have one complete local description, one signalling message, and no ordering concerns. That simplicity costs latency. Gathering must finish (including TURN allocation, which can take hundreds of milliseconds) before anything reaches the remote peer, and the remote peer cannot begin connectivity checks until it receives that full SDP. On mobile and carrier networks the cost compounds: NAT bindings can expire in under 30 seconds, so candidates that sat in a bulk SDP waiting for slow gathering may already be stale when the remote peer applies them.
Trickle inverts this. The first host candidate can reach the remote peer within a few milliseconds of setLocalDescription(), connectivity checks start immediately on the cheapest path, and srflx/relay candidates arrive later to upgrade or rescue the connection. The cost is signalling complexity: candidates arrive asynchronously and possibly out of order, and each one must carry its exact sdpMid and sdpMLineIndex.
| Dimension | Trickle | Bulk |
|---|---|---|
| Time-to-First-Frame | 200–800 ms faster | baseline (slowest) |
| Signalling messages | 3–10 per peer | 1 |
| Ordering required | yes (idempotent queue) | no |
| Stale-candidate risk on CGNAT | low | high (mappings under 30 s) |
| Best fit | web, mobile, real-time media | legacy SIP gateways, batch signalling |
The marginal extra signalling load is real but small: a typical peer generates 3–10 candidates, and a WebSocket Signaling Implementation delivers each in under 10 ms. Prefer trickle unless your signalling channel genuinely cannot stream.
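The "idempotent queue" in the table can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the CandidateQueue class and its method names are hypothetical, and it assumes the remote description may not be applied yet when the first trickled candidates arrive.

```javascript
// Hypothetical helper: buffers remote candidates until the remote description
// is in place, and drops exact duplicates so redelivery is harmless.
class CandidateQueue {
  constructor(apply) {
    this.apply = apply;     // e.g. (c) => pc.addIceCandidate(c).catch(console.error)
    this.ready = false;     // becomes true once the remote description is set
    this.pending = [];      // candidates received too early
    this.seen = new Set();  // keys of candidates already accepted
  }

  // Call for every 'trickle' message. Returns false for a duplicate.
  push(candidate) {
    const key = `${candidate.sdpMid}|${candidate.sdpMLineIndex}|${candidate.candidate}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    if (this.ready) this.apply(candidate);
    else this.pending.push(candidate);
    return true;
  }

  // Call once setRemoteDescription() has resolved.
  flush() {
    this.ready = true;
    for (const c of this.pending.splice(0)) this.apply(c);
  }
}

// Wiring sketch (assumes a `pc` and a `signaling` channel exist):
//   const queue = new CandidateQueue((c) => pc.addIceCandidate(c).catch(console.error));
//   signaling.on('trickle', (msg) => queue.push(msg.candidate));
//   await pc.setRemoteDescription(remoteOffer); queue.flush();
```

Because duplicates are keyed on sdpMid, sdpMLineIndex, and the candidate string, a signalling channel that redelivers a message does not cause the same candidate to be applied twice.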
There is also a hybrid worth knowing: half-trickle. The offerer waits until gathering is complete before sending the offer (so the offer carries every candidate inline), but the answerer trickles. This buys back some of bulk's simplicity on the offer side while still letting the answerer respond fast. It is mostly a transition tactic for interoperating with a peer that cannot trickle the offer; on a modern stack where both sides trickle, full trickle is strictly better. The one place bulk still earns its keep is a signalling path that batches or serialises messages (for example, a store-and-forward gateway that only processes one complete SDP per turn), where streaming candidates would arrive after the gateway has already moved on.
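On the offerer side, half-trickle amounts to holding the offer until gathering finishes. A minimal sketch, assuming a standard RTCPeerConnection; the whenGatheringComplete helper name is hypothetical, and the `signaling` object in the usage comment is assumed to exist:

```javascript
// Hypothetical helper: calls done() exactly once, as soon as the connection
// reports iceGatheringState === 'complete'.
function whenGatheringComplete(pc, done) {
  if (pc.iceGatheringState === 'complete') {
    done();
    return;
  }
  const onChange = () => {
    if (pc.iceGatheringState !== 'complete') return;
    pc.removeEventListener('icegatheringstatechange', onChange);
    done();
  };
  pc.addEventListener('icegatheringstatechange', onChange);
}

// Offerer usage sketch:
//   await pc.setLocalDescription(await pc.createOffer());
//   whenGatheringComplete(pc, () => {
//     // localDescription now carries every gathered candidate inline
//     signaling.send({ type: 'offer', sdp: pc.localDescription.sdp });
//   });
```

The answerer keeps forwarding candidates from onicecandidate as usual. In practice you would probably pair this helper with a timeout so that a slow TURN allocation cannot hold the offer indefinitely.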
Minimal Runnable Implementation
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});

// `signaling` is assumed to be your already-connected signalling channel wrapper.

// Trickle: forward each candidate the instant it is gathered
pc.onicecandidate = (e) => {
  if (e.candidate) {
    signaling.send({ type: 'trickle', candidate: e.candidate.toJSON() });
  }
  // e.candidate === null marks end-of-gathering; do NOT forward it as a candidate
};

// Bulk fallback: if trickle stalls (restrictive NAT, slow TURN), ship the full SDP
const trickleTimeout = setTimeout(() => {
  if (pc.iceGatheringState !== 'complete' && pc.localDescription) {
    console.warn('Trickle stalled; switching to bulk SDP exchange');
    // localDescription.sdp includes every candidate gathered so far
    signaling.send({ type: 'offer-complete', sdp: pc.localDescription.sdp });
  }
}, 4000); // 3–5 s window before assuming trickle won't finish

pc.onicegatheringstatechange = () => {
  if (pc.iceGatheringState === 'complete') {
    clearTimeout(trickleTimeout);
    signaling.send({ type: 'candidates-done' }); // explicit end-of-candidates
  }
};
The null candidate event (e.candidate === null) is equivalent to iceGatheringState === 'complete'; either can signal end-of-gathering, but Firefox is more reliable with the state change, so prefer it for the terminal signal.
Reproduction Steps & Debugging Log Patterns
- Initialise RTCPeerConnection with iceServers pointing at a deliberately high-latency TURN relay so gathering takes long enough to observe.
- Intercept onicecandidate, logging each candidate's type (host, srflx, or relay) and a timestamp; note how host candidates appear within a few ms while relay candidates lag.
- Watch iceGatheringState transition in the console: new → gathering → complete.
- Compare a trickle run against a forced-bulk run and record the delta to the first connected event.
Expected healthy trickle log:
// t+2ms candidate host 192.168.1.20
// t+140ms candidate srflx 203.0.113.7
// t+410ms candidate relay 198.51.100.4
// iceConnectionState: checking
// iceConnectionState: connected // long before gathering 'complete'
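Log lines in this shape can be produced by parsing the candidate string directly. The sketch below is illustrative only: parseCandidate and formatCandidateLog are hypothetical helper names, and the parsing assumes the usual candidate-attribute field order (foundation, component, transport, priority, address, port, then typ and the type).

```javascript
// Extract the address and type from a candidate attribute string such as
// "candidate:1 1 udp 2122260223 192.168.1.20 54321 typ host generation 0".
function parseCandidate(line) {
  const parts = line.trim().split(/\s+/);
  const typIndex = parts.indexOf('typ');
  return {
    address: parts[4] ?? null,
    type: typIndex >= 0 ? (parts[typIndex + 1] ?? null) : null,
  };
}

// Format one log line: elapsed time since gathering started, type, address.
function formatCandidateLog(line, elapsedMs) {
  const { type, address } = parseCandidate(line);
  return `t+${Math.round(elapsedMs)}ms candidate ${type} ${address}`;
}

// Wiring sketch (assumes a `pc` created as in the implementation above):
//   const t0 = performance.now();
//   pc.addEventListener('icecandidate', (e) => {
//     if (e.candidate) {
//       console.log(formatCandidateLog(e.candidate.candidate, performance.now() - t0));
//     }
//   });
```

Browsers that obfuscate host addresses will show an mDNS name ending in .local in place of the private IP; the type field is unaffected.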
A stalled session shows iceConnectionState going checking → disconnected instead of connected, and pc.getStats() reports state: 'failed' on every candidate-pair. Use chrome://webrtc-internals/ to trace nomination timing and confirm iceTransportPolicy is not silently suppressing the candidates you expected.
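The "every candidate-pair failed" check can be automated against the report returned by pc.getStats(). A minimal sketch, assuming the report is map-like (as RTCStatsReport is) and that candidate-pair entries expose a state field; summarisePairs is a hypothetical helper name:

```javascript
// Count candidate-pair states in an RTCStatsReport-like object (anything with
// a Map-style forEach) and flag the case where every pair has failed.
function summarisePairs(report) {
  const counts = {};
  report.forEach((stat) => {
    if (stat.type !== 'candidate-pair') return;
    counts[stat.state] = (counts[stat.state] || 0) + 1;
  });
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  return { counts, total, allFailed: total > 0 && counts.failed === total };
}

// Usage sketch:
//   const summary = summarisePairs(await pc.getStats());
//   if (summary.allFailed) console.warn('All candidate pairs failed', summary.counts);
```

An empty report yields allFailed: false, so a session that has not formed any pairs yet is not mistaken for a stalled one.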
Common Implementation Mistakes
- Assuming onicecandidate fires synchronously with setLocalDescription. There is an async gap; candidates arrive after gathering begins, so never block waiting for them inline.
- Forwarding the null candidate as a real candidate. The null event is end-of-gathering, not a peer candidate; passing it to addIceCandidate on the remote side throws or no-ops confusingly.
- Using bulk on mobile networks. Carrier-grade NAT binding timers can be under 30 s; candidates expire before the remote peer applies them. See WebRTC over CGNAT.
- Ignoring iceTransportPolicy: 'relay' timing. With host and srflx suppressed, gathering depends entirely on TURN allocation and can run long; your bulk-fallback timeout must account for it.
- No fallback at all. A pure-trickle client behind a signalling channel that drops or reorders messages can hang forever; always keep the 3–5 s bulk fallback.
FAQ
When should I force bulk ICE gathering over trickle?
Only when the signalling channel cannot handle asynchronous streams: legacy SIP gateways that require a complete SDP before responding, or systems that batch-process signalling. Modern web and mobile apps should default to trickle.
How do I detect that trickle has failed?
Monitor iceConnectionState for 'failed' or 'disconnected' while iceGatheringState stays 'gathering'. Add a heartbeat or timeout on the signalling channel and trigger the bulk fallback or pc.restartIce() (max 3 attempts) if nothing connects within 5–10 s.
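The capped-restart part of that answer could look roughly like this. It is a sketch under assumptions rather than a complete watchdog: IceRestartGuard is a hypothetical name, the 5–10 s timer is left out for brevity, and the sketch restarts only on 'failed' because 'disconnected' can be transient.

```javascript
// Hypothetical guard: calls pc.restartIce() on 'failed', at most maxRestarts
// times, then hands control to onGiveUp (e.g. to trigger the bulk fallback).
class IceRestartGuard {
  constructor(pc, { maxRestarts = 3, onGiveUp = () => {} } = {}) {
    this.pc = pc;
    this.maxRestarts = maxRestarts;
    this.onGiveUp = onGiveUp;
    this.restarts = 0;
  }

  // Feed every iceConnectionState change into this method.
  handleState(state) {
    if (state === 'connected' || state === 'completed') {
      this.restarts = 0; // a healthy connection resets the budget
      return 'ok';
    }
    if (state !== 'failed') return 'waiting';
    if (this.restarts >= this.maxRestarts) {
      this.onGiveUp();
      return 'gave-up';
    }
    this.restarts += 1;
    this.pc.restartIce();
    return 'restarted';
  }
}

// Wiring sketch:
//   const guard = new IceRestartGuard(pc, { onGiveUp: () => sendBulkOffer() }); // sendBulkOffer is hypothetical
//   pc.oniceconnectionstatechange = () => guard.handleState(pc.iceConnectionState);
```

Note that restartIce() only marks the connection for an ICE restart; the application still has to complete the renegotiation that follows (typically from its negotiationneeded handler) for new candidates to be exchanged.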
Does trickle meaningfully increase signalling server load?
Only marginally: 3–10 extra small messages per peer, each delivered in under 10 ms. The 200–800 ms latency win far outweighs it.
Related: return to ICE Candidate Gathering & Filtering, or read Traversing Symmetric NAT with TURN and IPv6 Dual-Stack ICE Handling.