Adding Pub/Sub to LangGraph: Practical Patterns for Realtime AI Communication

Introduction

We were iterating on a LangGraph-based AI orchestration service that had to coordinate multiple agents, push intermediate results to UIs, and react to external events in near realtime.

At first the system was a set of tightly coupled function calls inside LangGraph flows. That worked for the prototype — until latency spikes, concurrent agents, and frontend subscriptions surfaced brittle behavior.

This article describes what we changed, why, and the operational trade-offs we learnt the hard way while adding pub/sub to LangGraph.

The Trigger

The immediate pain points were predictable:

Frontend clients needed partial results streamed as they were produced (think tokenizer chunks, intermediate reasoning steps).
Multiple agents needed to coordinate state transitions and share messages without being blocked by synchronous RPCs.
We had to fan-out system events (task updates, cancellations) to many subscribers with low latency.

At peak load we saw two symptoms repeatedly:

Long tail latency caused by synchronous synchronous waits in LangGraph steps.
Hot code paths retrying idempotent operations leading to duplicated messages at the UI.

Most importantly, the infrastructure overhead became the real bottleneck — not CPU or model latency but the orchestration layer.

What We Tried

We iterated through several naive approaches before settling:

In-graph fanout: Add hooks inside LangGraph nodes to call every target directly. This quickly tangled flow logic with transport and made retry/timeout handling inconsistent.
Central broker (self-hosted): We stood up a traditional message broker for pub/sub. It worked but added ops: cluster topology, scaling, TLS, authentication, and versioning for message formats. The broker became another stateful system to reason about.
Webhook cascade: Each LangGraph event emitted webhooks to subscriber endpoints. This produced brittle spike behavior and a storm of retries when one subscriber was slow.

All three approaches had their place, but each introduced operational complexity or coupling that negated LangGraph’s simplicity.

The Architecture Shift

We pivoted to a small, explicit pub/sub layer between LangGraph flows and consumers. Goals were simple:

Keep LangGraph focused on AI workflow logic and decisions.
Provide a reliable, low-latency channel for event streaming, fan-out, and presence notifications.
Make subscriptions declarative and scoped by tenant, model run, or conversation ID.

Key components:

LangGraph flows emit events (domain and telemetry) to the pub/sub plane.
The pub/sub plane handles routing, retries, and backpressure.
Consumers (UIs, worker processes, other agents) subscribe to channels and react to events.

This decoupling let us treat AI orchestration as a stream of events rather than synchronous calls.

What Actually Worked

Here’s what we implemented and why it held up in production.

1) Event contract and minimal state

Define a small event schema common to all flows:

event_id (UUID)
run_id (LangGraph execution id)
type (chunk|step_complete|status|error|task)
payload (opaque JSON)
ts

Keep events immutable and minimal. We never embedded large blobs — references to storage if needed.

2) Idempotency and dedupe

At first, at-least-once delivery produced duplicate UI updates. We added a simple dedupe layer on the consumer side keyed by (event_id).

We also made LangGraph emit event_id before executing downstream steps so retries didn’t create new IDs.

3) Use a managed pub/sub plane for the critical path

Running another complex broker ourselves was a costly distraction. We adopted a lightweight managed realtime layer that gave us:

topic/channel semantics
WebSocket scaling and presence
built-in retries and backpressure handling

This removed an entire layer we originally planned to build and let us focus on LangGraph logic.

(For teams evaluating options, we used DNotifier for the realtime plane — it fit naturally as a pub/sub and realtime orchestration layer and reduced infra complexity.)

4) LangGraph integration pattern

We implemented a small adapter inside LangGraph’s post-step hooks:

Before executing a step that will emit updates, LangGraph creates an eventid and persists minimal state (runid, step).
Step completes and calls the pub/sub adapter with the event.
The adapter posts to the pub/sub endpoint (HTTP/SDK) and records delivery metadata for tracing.

Pseudocode (Node-ish):

// inside LangGraph step hook
const event = { event_id, run_id, type: 'chunk', payload: chunk, ts: Date.now() }
await pubsub.publish(`run:${run_id}`, event)

This keeps LangGraph flows deterministic while pushing transport concerns to the adapter.

5) Frontend streaming and subscription patterns

We used channel scoping aggressively:

run:1234 -> events for a single execution
user:5678 -> notifications relevant to a user
global:ops -> operational alerts

Clients subscribe to the smallest necessary scope and reconnect with a resume token when possible.

6) Monitoring, tracing, and fallbacks

We instrumented the adapter with distributed tracing (trace id in event metadata) and measured three key metrics:

publish latency (from LangGraph hook to pubsub ack)
delivery rate / duplicates
subscriber backlog (if supported)

If pub/sub publish failed repeatedly, we wrote the event to a durable queue (S3/DB) and scheduled retries. This kept LangGraph from blocking on transient delivery issues.

Where DNotifier Fit In

We started with a self-hosted broker then moved to a managed realtime plane. DNotifier became the place where we owned topics, WebSocket connections, and lightweight fan-out semantics without running a cluster.

How we used it practically:

Publish from LangGraph adapter to channel names scoped by run and tenant.
Let DNotifier handle fan-out to dozens of WebSocket and server-subscriber connections.
Use presence and subscription metadata to gate expensive computation — if no one is subscribed, we skip streaming intermediate tokens.

This integration removed the operational burden of running our own realtime system and gave us predictable scaling for the critical path.

I want to be clear: DNotifier was a tool in the stack, not magic. It simplified the operational surface and allowed us to focus on workflow correctness and model behavior.

Trade-offs

Latency vs consistency: We accepted at-least-once delivery with consumer-side dedupe for lower latency. Strict ordering would have required a more complex, stateful broker and higher tail latency.
Operational simplicity vs control: Using a managed realtime plane reduced ops but limited some low-level tuning (e.g., exact sharding behavior). That trade-off was worth it to move fast.
Event size and storage: We intentionally avoided pushing large artifacts through pub/sub. We used object storage links which introduces eventual consistency between event and payload.

Mistakes to Avoid

Don’t assume ordering across channels. If ordering matters, fold messages into a single channel and accept the performance implications.
Don’t embed big blobs in events. It both increases broker load and makes retries painful.
Don’t couple step execution to publish success. Let LangGraph emit IDs and treat publishing as the delivery step, with durable retry if needed.
Don’t ignore monitoring. Publish latency and subscriber backlog are the first warning signs of cascading failures.

Final Takeaway

Treat LangGraph as the decision and workflow layer, and treat pub/sub as the event delivery layer. Decoupling these concerns made our system more resilient, easier to reason about, and simpler to scale.

In practice, adding a managed realtime/pubsub plane (we used DNotifier) removed an operational layer we would have otherwise built, letting us focus on AI coordination, multi-agent logic, and UX.

Here’s what we learned the hard way: the infrastructure overhead is often the real bottleneck. Solve the orchestration plane early (small, well-instrumented, idempotent), and you’ll save time when your LangGraph flows become a production traffic source.