How to Implement WebSocket Signaling with Node.js and Socket.IO for WebRTC
This guide builds a working signalling server with Node.js and Socket.IO that exchanges SDP offers, answers, and ICE candidates between peers without race conditions. It is part of the WebSocket Signaling Implementation guide and answers one precise question: when should you reach for Socket.IO's room and reconnection machinery instead of raw WebSockets, and how do you wire it so that ICE candidates are never applied before the remote description is set?
Context & Trade-offs
Socket.IO is a framing layer on top of WebSocket (with an HTTP long-polling fallback). For signalling it buys you three things you would otherwise hand-roll: named rooms (socket.join(roomId) and socket.to(roomId).emit(...)), automatic reconnection with backoff, and namespace isolation so signalling traffic never collides with chat or presence events on the same connection. The cost is roughly 30–60 KB of client bundle and a small per-message protocol overhead versus the raw ws library described in the parent guide.
Reach for Socket.IO when you want to ship quickly with built-in rooms, and when the long-polling fallback genuinely matters because some of your users sit behind proxies that block the WebSocket upgrade. Reach for raw WebSockets when you are running a dedicated media-server signalling path where every byte and millisecond counts and you handle reconnection yourself. For a typed contract at scale, Custom Signaling Protocols with gRPC-Web is the third option. Whichever transport you pick, the message-ordering discipline is identical: the remote description (the offer on the answering side, the answer on the offering side) must be applied before any of the ICE candidates that follow it. Socket.IO's per-connection in-order guarantee does not save you here, because setRemoteDescription is async: a candidate can be delivered in order yet still arrive before the await resolves.
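The race can be reproduced without a browser. The sketch below uses a hypothetical fake peer (createFakePeer) in place of RTCPeerConnection, delivers an answer and then a candidate back-to-back and in order, and shows that the candidate still fails when handlers run immediately. It also shows one alternative fix, a promise-chain serial queue (createSerialQueue, also hypothetical), which makes each handler wait for the previous one to settle; this is a swapped-in technique, not part of Socket.IO's API, and the client code later in this guide uses a candidate buffer instead.

```javascript
// Hypothetical stand-in for RTCPeerConnection (not a browser API): applying a
// remote description takes `delayMs`, and addIceCandidate rejects until that
// has finished, just as the real method rejects with InvalidStateError.
function createFakePeer(delayMs = 20) {
  return {
    remoteDescription: null,
    added: [],
    async setRemoteDescription(desc) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      this.remoteDescription = desc;
    },
    async addIceCandidate(candidate) {
      if (!this.remoteDescription) throw new Error('InvalidStateError: remote description is null');
      this.added.push(candidate);
    },
  };
}

// Runs each task behind the previous one, so a handler cannot start until
// the earlier handler's promise has settled.
function createSerialQueue() {
  let tail = Promise.resolve();
  return (task) => {
    const run = tail.then(task);
    tail = run.catch(() => {}); // a failed task must not block later ones
    return run;
  };
}

// Runs a task immediately, like a plain async Socket.IO event handler.
const immediately = (task) => task();

// Delivers an answer and then a candidate back-to-back, in order, and
// reports whether each handler succeeded.
async function deliverInOrder(pc, schedule) {
  const results = await Promise.allSettled([
    schedule(() => pc.setRemoteDescription({ type: 'answer', sdp: 'v=0' })),
    schedule(() => pc.addIceCandidate({ candidate: 'candidate:1' })),
  ]);
  return results.map((r) => r.status);
}

deliverInOrder(createFakePeer(), immediately).then(console.log); // [ 'fulfilled', 'rejected' ]
deliverInOrder(createFakePeer(), createSerialQueue()).then(console.log); // [ 'fulfilled', 'fulfilled' ]
```

The serial queue trades a little latency (every event waits for its predecessor) for a guarantee that holds across all event types, whereas a candidate buffer only reorders candidates.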
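Set pingTimeout and pingInterval to bracket realistic ICE gathering windows. A pingInterval of 25 s with a pingTimeout of 60 s (the Socket.IO v4 defaults are 25 s and 20 s) keeps traffic on the signalling connection frequent enough to refresh idle timers on NATs, proxies, and load balancers (a 60 s idle timeout is a common proxy default), while tolerating a slow pong during a sluggish TURN allocation instead of raising a false-positive disconnect.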
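A quick way to sanity-check the two values is to compute the windows they imply. This sketch assumes the Engine.IO v4 heartbeat behaviour that Socket.IO 4 relies on (the server sends a ping pingInterval ms after the previous pong and closes the connection if the next pong does not arrive within pingTimeout ms) and ignores network latency; heartbeatWindows is a hypothetical helper, not a Socket.IO API.

```javascript
// Rough worst-case timings implied by the heartbeat settings, assuming the
// Engine.IO v4 ping/pong cycle described above. Latency is ignored.
function heartbeatWindows({ pingInterval, pingTimeout }) {
  return {
    // Longest quiet gap on an otherwise idle connection; NAT, proxy, and
    // load-balancer idle timeouts must be longer than this.
    heartbeatGapMs: pingInterval,
    // A peer that vanishes right after answering a ping is noticed only
    // once the following ping goes unanswered.
    worstCaseDetectionMs: pingInterval + pingTimeout,
  };
}

const settings = { pingInterval: 25000, pingTimeout: 60000 };
console.log(heartbeatWindows(settings).worstCaseDetectionMs / 1000, 's'); // 85 s
```

With these values a client whose network silently disappears can take up to about 85 s to be declared disconnected; that delay is the price of tolerating slow pongs.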
One more design decision sits underneath the transport choice: rooms versus explicit peer lists. Socket.IO rooms are the natural fit because they map one-to-one onto a WebRTC call: a room is a call, its members are the peers, and socket.to(roomId) is the broadcast primitive you need for both two-party and small-group calls. The alternative, tracking an explicit peer list in your own Map and emitting to individual socket IDs, only earns its complexity when you need per-peer routing rules (selective forwarding, moderator-only messages) that a flat room broadcast cannot express. Start with rooms; reach for explicit lists only when a feature demands it.
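The difference between the two styles is easiest to see in a minimal in-memory model. The class and method names below (RoomRegistry, broadcastTargets, routedTargets) are hypothetical; the model only computes who would receive a message and does not send anything.

```javascript
// Hypothetical in-memory model of the two routing styles: a flat room
// broadcast (what socket.to(roomId).emit does) versus explicit per-peer
// routing driven by your own rules.
class RoomRegistry {
  constructor() {
    this.rooms = new Map(); // roomId -> Set of peer ids
  }

  join(roomId, peerId) {
    if (!this.rooms.has(roomId)) this.rooms.set(roomId, new Set());
    this.rooms.get(roomId).add(peerId);
  }

  leave(roomId, peerId) {
    const members = this.rooms.get(roomId);
    if (!members) return;
    members.delete(peerId);
    if (members.size === 0) this.rooms.delete(roomId); // drop empty rooms
  }

  // Room broadcast: every member except the sender.
  broadcastTargets(roomId, senderId) {
    return [...(this.rooms.get(roomId) ?? [])].filter((id) => id !== senderId);
  }

  // Explicit routing: only the peers a rule allows, e.g. moderator-only messages.
  routedTargets(roomId, senderId, allow) {
    return this.broadcastTargets(roomId, senderId).filter((id) => allow(senderId, id));
  }
}

const registry = new RoomRegistry();
['alice', 'bob', 'mod'].forEach((id) => registry.join('call-42', id));
console.log(registry.broadcastTargets('call-42', 'alice')); // [ 'bob', 'mod' ]
console.log(registry.routedTargets('call-42', 'alice', (_, to) => to === 'mod')); // [ 'mod' ]
```

In Socket.IO itself each socket is automatically a member of a room named after its own id, so the explicit variant amounts to emitting to individual ids (socket.to(targetId).emit(...)) once your own rule has picked the targets.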
Minimal Runnable Implementation
Isolate signalling in a dedicated /signaling namespace so WebRTC events never collide with the rest of the app. Each peer joins a room; offers, answers, and candidates are relayed only to the other members of that room via socket.to(roomId), never broadcast globally.
// server.js: npm install socket.io
const http = require('http');
const { Server } = require('socket.io');
const server = http.createServer();
const io = new Server(server, {
  cors: { origin: process.env.ALLOWED_ORIGIN ?? '*' },
  pingInterval: 25000, // send a ping every 25 s
  pingTimeout: 60000 // close if the pong has not arrived 60 s after a ping
});
const signaling = io.of('/signaling'); // dedicated namespace, no cross-talk
signaling.on('connection', (socket) => {
  socket.on('join', (roomId) => {
    if (typeof roomId !== 'string' || roomId.length === 0 || roomId.length > 128) return; // validate
    if (socket.data.roomId) socket.leave(socket.data.roomId); // one call per socket
    socket.join(roomId);
    socket.data.roomId = roomId;
    // tell the newcomer how many peers were already present
    const size = signaling.adapter.rooms.get(roomId)?.size ?? 1;
    socket.emit('joined', { roomId, peers: size - 1 });
  });
  // Relay only to the OTHER peers in the room this socket actually joined
  // (socket.to excludes the sender; never trust a roomId taken from the payload)
  const relay = (event) => (d) => {
    if (socket.data.roomId) socket.to(socket.data.roomId).emit(event, d);
  };
  socket.on('offer', relay('offer'));
  socket.on('answer', relay('answer'));
  socket.on('candidate', relay('candidate'));
});
server.listen(3000, () => console.log('Signaling on :3000'));
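The relay in server.js forwards payloads without inspecting them, so one buggy or hostile client can push malformed or oversized messages to its peers. A minimal shape check is sketched below; isValidSignal, isSessionDescription, and the size caps are hypothetical choices for illustration, not limits defined by Socket.IO or WebRTC.

```javascript
// Hypothetical shape checks for relayed payloads. Caps are assumptions.
const MAX_SDP_BYTES = 64 * 1024; // generous cap for a single SDP blob
const MAX_CANDIDATE_CHARS = 1024; // ICE candidate lines are far shorter in practice

function isSessionDescription(sdp, expectedType) {
  return (
    sdp !== null &&
    typeof sdp === 'object' &&
    sdp.type === expectedType && // an 'offer' event must carry an offer, etc.
    typeof sdp.sdp === 'string' &&
    Buffer.byteLength(sdp.sdp, 'utf8') <= MAX_SDP_BYTES
  );
}

function isValidSignal(event, d) {
  if (d === null || typeof d !== 'object') return false;
  switch (event) {
    case 'offer':
    case 'answer':
      return isSessionDescription(d.sdp, event);
    case 'candidate':
      return (
        d.candidate !== null &&
        typeof d.candidate === 'object' &&
        typeof d.candidate.candidate === 'string' && // '' marks end-of-candidates
        d.candidate.candidate.length <= MAX_CANDIDATE_CHARS
      );
    default:
      return false; // unknown events are never relayed
  }
}

console.log(isValidSignal('offer', { roomId: 'room-1', sdp: { type: 'offer', sdp: 'v=0\r\n' } })); // true
console.log(isValidSignal('answer', { sdp: 'v=0' })); // false: sdp must be an object
```

Inside each relay handler, drop the message unless isValidSignal(event, d) returns true; rejecting at the server keeps malformed input out of every peer's RTCPeerConnection.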
On the client, one subtlety causes most real bugs: buffer incoming ICE candidates until remoteDescription is set, then flush them. The caller emits the offer, applies the answer when it returns, and only then drains the queue; the callee applies the offer and drains its queue before it sends the answer.
// client.js: npm install socket.io-client
import { io } from 'socket.io-client';
const socket = io('/signaling');
const pc = new RTCPeerConnection({ iceServers: [{ urls: 'stun:stun.l.google.com:19302' }] });
const roomId = 'room-1';
const pending = []; // candidates that arrive before remoteDescription is set
socket.emit('join', roomId);
pc.onicecandidate = ({ candidate }) => {
  if (candidate) socket.emit('candidate', { roomId, candidate: candidate.toJSON() });
};
async function flushPending() {
  while (pending.length) await pc.addIceCandidate(pending.shift());
}
// Caller: create and send the offer
async function startCall() {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  socket.emit('offer', { roomId, sdp: offer });
}
// Callee: apply the offer, flush buffered candidates, then answer
socket.on('offer', async ({ sdp }) => {
  await pc.setRemoteDescription(sdp);
  await flushPending();
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  socket.emit('answer', { roomId, sdp: answer });
});
// Caller: apply the answer, then flush buffered candidates
socket.on('answer', async ({ sdp }) => {
  await pc.setRemoteDescription(sdp);
  await flushPending();
});
socket.on('candidate', async ({ candidate }) => {
  if (pc.remoteDescription) await pc.addIceCandidate(candidate);
  else pending.push(candidate); // not ready yet: buffer it
});
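The buffer-then-flush pattern can also be packaged as a small reusable helper and exercised outside a browser. CandidateBuffer and the fake peer below are hypothetical names for illustration; the helper accepts a real RTCPeerConnection or any object with remoteDescription, setRemoteDescription(), and addIceCandidate().

```javascript
// Hypothetical helper (not part of Socket.IO or WebRTC) wrapping the
// buffer-then-flush pattern from client.js.
class CandidateBuffer {
  constructor(pc) {
    this.pc = pc;
    this.pending = []; // candidates received before the remote description
  }

  // Use for every incoming 'candidate' event.
  async add(candidate) {
    if (this.pc.remoteDescription) await this.pc.addIceCandidate(candidate);
    else this.pending.push(candidate);
  }

  // Use instead of pc.setRemoteDescription for incoming offers and answers.
  // Returns how many buffered candidates were flushed.
  async setRemote(description) {
    await this.pc.setRemoteDescription(description);
    const queued = this.pending.splice(0); // take and clear the buffer
    for (const candidate of queued) await this.pc.addIceCandidate(candidate);
    return queued.length;
  }
}

// Fake peer for running the sketch outside a browser (an assumption, not a
// browser API): setRemoteDescription resolves after 10 ms and addIceCandidate
// rejects while remoteDescription is still null.
function createFakePeer() {
  return {
    remoteDescription: null,
    added: [],
    async setRemoteDescription(desc) {
      await new Promise((resolve) => setTimeout(resolve, 10));
      this.remoteDescription = desc;
    },
    async addIceCandidate(candidate) {
      if (!this.remoteDescription) throw new Error('InvalidStateError');
      this.added.push(candidate.candidate);
    },
  };
}

async function demo() {
  const pc = createFakePeer();
  const buffer = new CandidateBuffer(pc);
  const applying = buffer.setRemote({ type: 'answer', sdp: 'v=0' });
  await buffer.add({ candidate: 'candidate:1' }); // arrives mid-apply: queued
  await buffer.add({ candidate: 'candidate:2' }); // queued
  const flushed = await applying;
  await buffer.add({ candidate: 'candidate:3' }); // remote set: applied directly
  console.log(flushed, pc.added); // 2 [ 'candidate:1', 'candidate:2', 'candidate:3' ]
}

demo();
```

In client.js this would mean calling buffer.add(candidate) from the 'candidate' handler and buffer.setRemote(sdp) from the 'offer' and 'answer' handlers in place of pc.setRemoteDescription.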
Reproduction Steps & Debugging Log Patterns
- Open two browser tabs and have both run socket.emit('join', 'room-1'). The server's joined event should report peers: 0 for the first tab and peers: 1 for the second.
- Call startCall() in the first tab. Watch the second tab's console: it should log an incoming offer event before any candidate events.
- Throttle the network in Chrome DevTools (Network → Slow 3G) and trigger the offer, then immediately let candidates fly. With a console.log in each branch of the candidate handler, you should see the buffer engage:
[signaling] joined room-1 (peers: 1)
[signaling] queued candidate, remoteDescription not set (signalingState: have-local-offer)
[signaling] queued candidate, remoteDescription not set (signalingState: have-local-offer)
[signaling] answer applied, flushing 2 buffered candidates
iceConnectionState: checking → connected
- If instead you see InvalidStateError: Failed to execute 'addIceCandidate': The remote description was null, your buffer is not engaging: a candidate handler is calling addIceCandidate before the answer landed.
- Kill the network briefly and restore it. The Socket.IO manager (socket.io) should emit reconnect and the socket should fire connect again; confirm your handler rejoins the room and that iceConnectionState stays connected throughout, proving the media path survived the signalling drop.
Wire the rejoin handler explicitly. Socket.IO restores the transport but never re-runs your application-level join (unless connection state recovery is enabled on the server), so a reconnected socket lands in no room and silently stops receiving relays until you re-emit. Because connect also fires on the first connection, this handler can replace the one-off top-level join:
socket.on('connect', () => {
  socket.emit('join', roomId); // re-enter the room on every connect
  if (pc.signalingState === 'have-local-offer') // resend an unanswered offer
    socket.emit('offer', { roomId, sdp: pc.localDescription });
});
socket.io.on('reconnect_attempt', (n) =>
  console.log(`[signaling] reconnect attempt ${n}`)); // surfaces backoff cadence
Expected output across a forced disconnect is a short backoff ramp followed by a clean rejoin, with the media plane untouched throughout:
[signaling] reconnect attempt 1
[signaling] reconnect attempt 2
[signaling] joined room-1 (peers: 1)
iceConnectionState: connected (unchanged; media survived the drop)
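The short backoff ramp in that log comes from the client's exponential reconnection backoff. The sketch below approximates it: the option names and defaults follow the documented Socket.IO client options (reconnectionDelay 1000, reconnectionDelayMax 5000, randomizationFactor 0.5), but the doubling per attempt and the jitter formula are a simplified model of the client internals, and backoffDelay is a hypothetical helper.

```javascript
// Simplified model of the Socket.IO client's reconnection backoff; not a
// copy of the library code. `random` is injectable so results can be checked.
function backoffDelay(
  attempt, // 0 for the first reconnection attempt
  { reconnectionDelay = 1000, reconnectionDelayMax = 5000, randomizationFactor = 0.5 } = {},
  random = Math.random,
) {
  let ms = reconnectionDelay * 2 ** attempt; // doubles on every attempt
  if (randomizationFactor > 0) {
    const deviation = Math.floor(random() * randomizationFactor * ms);
    ms = random() < 0.5 ? ms - deviation : ms + deviation; // jitter either way
  }
  return Math.min(ms, reconnectionDelayMax); // never wait longer than the cap
}

// Without jitter the ramp is easy to read.
console.log([0, 1, 2, 3, 4].map((n) => backoffDelay(n, { randomizationFactor: 0 })));
// [ 1000, 2000, 4000, 5000, 5000 ]
```

Jitter spreads each delay by up to ±50% under the defaults, so clients that lost the same server do not all reconnect in the same instant.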
Common Implementation Mistakes
- Premature candidate application. Calling addIceCandidate before setRemoteDescription has completed rejects with InvalidStateError. Buffer until the remote description is set, then flush; this is the single most common Socket.IO signalling bug.
- Default namespace collision. Running signalling on the root namespace lets chat, presence, or analytics events collide with offer/answer. Always isolate in /signaling.
- Global broadcast. Using signaling.emit(...) or socket.broadcast.emit(...) instead of socket.to(roomId).emit(...) leaks SDP to every connected client and breaks multi-room deployments.
- No rejoin on reconnect. Socket.IO reconnects the transport automatically, but it does not re-run your join. On every connect event (which also fires after each reconnect), re-emit join and re-send the local description if signalingState is have-local-offer.
- Renegotiating media on every reconnect. The RTCPeerConnection survives a Socket.IO reconnect untouched. Only restart ICE (pc.restartIce()) if iceConnectionState reports failed; do not tear down the connection and rebuild the offer.
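The last rule can be written down as an explicit policy so it is not re-decided in every handler. iceAction is a hypothetical helper; it only maps an iceConnectionState value to an action and deliberately has no branch that rebuilds the connection because signalling dropped.

```javascript
// Hypothetical policy for "restart ICE only on failure".
function iceAction(iceConnectionState) {
  switch (iceConnectionState) {
    case 'failed':
      return 'restart-ice'; // call pc.restartIce(); renegotiation follows
    case 'disconnected':
      return 'wait'; // often transient; ICE may recover on its own
    default:
      return 'none'; // new, checking, connected, completed, closed
  }
}

['connected', 'disconnected', 'failed'].forEach((s) => console.log(s, iceAction(s)));
```

In the browser it would be wired as pc.oniceconnectionstatechange = () => { if (iceAction(pc.iceConnectionState) === 'restart-ice') pc.restartIce(); }, with your existing negotiationneeded handling sending the fresh offer.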
FAQ
Why does Socket.IO seem to drop ICE candidates under high latency?
It does not: Socket.IO delivers events in order per connection. The browser's RTCPeerConnection rejects candidates while the remote description is still null, and under latency the answer lands late, so early candidates fail. The client-side buffer above holds them until the remote description is set.
Should I use Socket.IO or raw WebSockets for production signalling?
Use Socket.IO when you want rooms, reconnection, and a long-polling fallback out of the box, which suits most application servers. Drop to raw ws for dedicated media-server signalling where overhead must be minimal; the routing and backpressure patterns in the WebSocket Signaling Implementation guide apply directly.
How does this scale past a single Node process?
Socket.IO rooms are per-process by default, so two peers on different nodes cannot see each other. Add the Redis adapter so room emits fan out across nodes; the full setup is in Scaling WebSocket Signaling with Redis Pub/Sub.
Related: return to WebSocket Signaling Implementation for transport-agnostic routing and backpressure, scale out with Scaling WebSocket Signaling with Redis Pub/Sub, and model offer/answer transitions with Signaling State Machine Patterns.