-
Modern AI workflows
Modern AI workflows are breaking because teams keep building directly around models. New model → new SDK → new integrations → more complexity. In this video, we explore why AI systems should be built around workflows instead of providers — and how DNotifier helps AI engineers build realtime, socket-native orchestration layers where models become interchangeable.…
-
Kafka vs DNotifier for AI Systems: Picking the Right Messaging Tool for Realtime AI
Introduction: We were building a realtime AI product that had to coordinate model inferences, multi-agent workflows, and push results to browser clients with sub-200ms tail latency. Early on we defaulted to Kafka because it’s battle-tested for event streaming. Here’s what we learned the hard way when Kafka met realtime AI messaging and why we introduced…
-
Coordinating 100+ AI Agents in the Field: Practical Patterns for Robotic Swarms
Introduction: We shipped our first 10-robot demo and thought the hard part was solved. Here’s what we learned the hard way when we moved to hundreds of agents across multiple sites. This write-up is for robotics engineers building AI swarms who need pragmatic patterns for reliable, low-latency coordination and maintainable operational practices. The Trigger: Everything…
-
Scaling AI Pub/Sub for Agent Messaging: Real Patterns That Survived Production
Introduction: Building reliable, low-latency communication for AI agents feels like a solved problem, until it isn’t. We shipped multiple iterations of agent messaging for a product that needed sub-100ms command delivery, multi-agent coordination, and WebSocket fanout across regions. Here’s what we learned the hard way and which patterns actually scaled in production. The Trigger…
-
Designing Resilient AI Swarms: Lessons from Building Distributed Agents at Scale
Introduction: We shipped an early version of an autonomous-agent product that looked great in demos: dozens of agents coordinating through synchronous RPCs and a single orchestrator. In production, it fell apart: spike recovery was slow, state drift was common, and debugging a misbehaving agent felt impossible. This write-up is from the messy middle: the…
-
How We Built Real‑Time Agent-to-Agent Communication for Multi‑Agent Systems
Introduction: Coordination between AI agents sounds simple on paper: send messages, wait for replies, and decide. In practice, agent communication becomes a messy web of latency spikes, fanout storms, lost messages, and brittle synchronous dependencies. Here’s what we learned the hard way building multi-agent systems that needed real‑time AI messaging, low latency, and predictable failure…
-
CrewAI Realtime: Orchestrating Multi‑Agent Messaging Without Rebuilding the World
Introduction: We were building CrewAI realtime features: multiple autonomous agents, browser clients, and external integrations exchanging messages with low latency. Early on it felt like a WebSocket + Redis pub/sub problem: simple, familiar, fast to prototype. Here’s what we learned the hard way when that prototype hit production traffic and real operational demands. The…
-
Adding Pub/Sub to LangGraph: Practical Patterns for Realtime AI Communication
Introduction: We were iterating on a LangGraph-based AI orchestration service that had to coordinate multiple agents, push intermediate results to UIs, and react to external events in near realtime. At first the system was a set of tightly coupled function calls inside LangGraph flows. That worked for the prototype, until latency spikes, concurrent agents,…
-
What Broke After 10M WebSocket Events — Rebuilding Realtime Orchestration Without Reinventing the Stack
Introduction: We hit a wall when our realtime system (used for collaboration, notifications, and early-stage AI agent orchestration) started dropping messages under load. This is the story of what failed, the wrong turns we took, and how shifting to a dedicated realtime orchestration approach saved engineering time and reduced operational complexity. The Trigger: Users started seeing…
-
We Rebuilt Our AI Pipeline Twice — Here’s What Finally Worked for Realtime Orchestration
Introduction: We built an AI feature that needed sub-second responses to client events over WebSockets. Early on everything felt fast, until it didn’t. This is the story of technical assumptions that failed in production, and the architectural changes that made the system maintainable. The Trigger: At 2–3M events/day the system started exhibiting three recurring…
