The Block Pipeline Pattern: Composable Middleware Without the Mess

If you have worked with middleware in any web framework, you know how it goes. You start with two or three layers. Authentication, logging, maybe CORS. It is clean. You can follow the request through the chain in your head.

Then the product grows. You add rate limiting. Then a circuit breaker. Then request transforms. Then a retry wrapper. Then someone adds a timeout layer but puts it in the wrong order and now retries happen after the timeout fires, which means you retry requests that already succeeded on the server side but timed out on yours. Nobody notices for three weeks.

By the time you have ten or twelve middleware layers, the chain is unreadable. The order matters but it is not documented. Testing one layer means spinning up the whole stack. Debugging means adding print statements at every boundary and squinting at logs.

I hit this wall when building the gateway I run today. Fifteen middleware concerns, all needing to compose cleanly, all needing different configurations per route. Express-style chaining was not going to work.

What I did instead

I call it the Block Pipeline pattern, though the name is less important than the shape.

Each middleware concern is a "block." A block is a Go interface with three or four methods. Init, Execute, Cleanup, and optionally Validate. That is the entire contract. Every block, whether it handles JWT authentication or routes LLM calls across four providers, implements the same interface.

Blocks do not know about each other. The Auth block does not know the RateLimit block exists. The CircuitBreaker block does not care what comes after it. Each block gets the request context, does its work, and either passes it forward or short circuits with a response.

The pipeline itself is configured per route in YAML. Not in code. You look at the route config and you can see exactly which blocks run and in what order. No tracing through function calls. No guessing.

A route might look like this

One API route gets Auth, RateLimit, and AIProxy. Another route on the same gateway gets Auth, mTLS, and Transform. A health check route gets nothing at all. The blocks are the same. The composition changes.
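The composition above might look something like this in YAML. The key names and block names here are illustrative, not the gateway's actual schema:

```yaml
# Hypothetical route config: each route lists its blocks, in order,
# with a per-block config section.
routes:
  - path: /v1/chat
    blocks:
      - name: auth
        config: { jwks_url: https://idp.example.com/keys }
      - name: ratelimit
        config: { requests_per_minute: 600 }
      - name: aiproxy
        config: { providers: [openai, anthropic, google, mistral] }

  - path: /internal/sync
    blocks:
      - name: auth
      - name: mtls
      - name: transform

  - path: /healthz
    blocks: []   # a health check runs no middleware at all
```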

In a traditional middleware chain, every request goes through every layer. You end up with if-statements inside middleware to skip logic for certain paths. That is where bugs hide. In the block pipeline, each route declares exactly what it needs. Nothing more runs.

The interface is small on purpose

I tried a version early on where the block interface had eight methods. Separate hooks for pre-request, post-request, error handling, metrics, health checks. It was thorough, and nobody could implement a new block without reading a manual first.

I cut it down to four. Init loads configuration. Execute handles the request. Cleanup releases resources. Validate checks the block config at startup so you find bad configuration before the first request arrives, not after.

Four methods. That is the tax you pay to add a new concern to the gateway. I have had engineers write a new block in under an hour, including tests.

Short circuiting saves real money

Order in the pipeline is explicit and it matters for a practical reason. The RateLimit block runs before Auth in some of our configurations. That sounds wrong at first. Why would you rate limit before you know who the caller is?

Because if someone is flooding the endpoint with garbage, you want to reject those requests before you spend CPU on JWT validation, before you hit the database to look up their permissions, before you run any of the expensive downstream blocks. The RateLimit block returns a 429 and the request never reaches Auth. On a busy day, this saves real compute.

Short circuiting is built into the pattern. Any block can stop the pipeline and return a response. The CircuitBreaker block does this when a downstream service is failing. The SizeLimit block does it when someone sends a 50MB payload. The Security block does it when request headers look wrong. Each block decides for itself whether the request should continue.
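The executor's core loop is small. This is a simplified sketch with function-typed blocks standing in for the real interface: run blocks in route order, and the first one to return a response stops everything after it.

```go
package main

import "fmt"

type Response struct{ Status int }

// A blockFn returns a non-nil Response to short circuit the pipeline.
type blockFn func(req map[string]any) *Response

// run executes blocks in order; later blocks never see a rejected request.
func run(blocks []blockFn, req map[string]any) *Response {
	for _, b := range blocks {
		if resp := b(req); resp != nil {
			return resp // short circuit
		}
	}
	return &Response{Status: 200}
}

func main() {
	rateLimit := func(req map[string]any) *Response {
		if req["flood"] == true {
			return &Response{Status: 429} // reject before Auth spends any CPU
		}
		return nil
	}
	auth := func(req map[string]any) *Response {
		fmt.Println("auth ran") // never printed for rate-limited requests
		return nil
	}
	resp := run([]blockFn{rateLimit, auth}, map[string]any{"flood": true})
	fmt.Println(resp.Status) // 429; the request never reached auth
}
```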

The AIProxy block

Most of the blocks are things you would find in any API gateway. Auth, CORS, rate limiting, retries. Standard stuff. The one that is specific to our platform is AIProxy.

AIProxy routes LLM calls across OpenAI, Anthropic, Google, and Mistral. It picks the provider based on the model requested, current availability, and cost. It counts tokens on both the request and response. It enforces a cost ceiling per caller per billing period. If a team has burned through 80% of their monthly budget, it can downgrade them to a cheaper model automatically. If they hit 100%, it blocks.
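The budget policy reduces to a small decision function. This is a sketch of the 80%/100% thresholds described above; the function name, the fallback table, and the model pairings are all illustrative, not the gateway's actual routing logic.

```go
package main

import "fmt"

// pickModel applies the budget policy: at 80% of the monthly budget,
// downgrade to a cheaper model; at 100%, block the request entirely.
func pickModel(requested string, spent, budget float64) (model string, allowed bool) {
	ratio := spent / budget
	switch {
	case ratio >= 1.0:
		return "", false // budget exhausted: short circuit the request
	case ratio >= 0.8:
		return cheaperModel(requested), true // automatic downgrade
	default:
		return requested, true
	}
}

// cheaperModel maps a model to a cheaper fallback. The table is made up
// for illustration; a real one would come from config, not code.
func cheaperModel(m string) string {
	fallback := map[string]string{
		"gpt-4o":        "gpt-4o-mini",
		"claude-3-opus": "claude-3-haiku",
	}
	if f, ok := fallback[m]; ok {
		return f
	}
	return m
}

func main() {
	m, ok := pickModel("gpt-4o", 850, 1000) // team is at 85% of budget
	fmt.Println(m, ok)                      // gpt-4o-mini true
}
```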

Because it is a block, it composes with everything else. A route with AIProxy also has Auth (so you know who is calling), RateLimit (so nobody can flood the LLM providers), and OpenTelemetry Export (so you get traces for every model call). Each concern is separate. Each is testable on its own. Together they handle the full lifecycle of an LLM request.

From the block's point of view, an LLM provider is just another backend. The AIProxy block does not care about CORS or authentication. It cares about token counting, model selection, and cost tracking. It does that one thing.

Configuration as data

The pipeline is defined in YAML. Each route lists its blocks and each block has its own config section. This is not a design choice I made for elegance. I made it because I needed to answer a question that kept coming up: "What middleware runs on this route?"

In a code based middleware chain, answering that question means reading the code. Sometimes it means reading three files. Sometimes the middleware is registered conditionally based on environment variables and you need to check those too.

With YAML config, you open one file, find the route, and read the block list. Done. When something goes wrong at 2am, this matters more than you think.

It also means non-engineers can audit the pipeline. Your security team can review which routes have Auth enabled. Your compliance team can check that every external route has the Security block. They read YAML, not Go.

Testing gets simple

Each block implements the same interface. So you test each block the same way. Create a request context, call Execute, check what came out. No need to set up the whole pipeline. No need to mock six other middleware layers.

For the full pipeline, you write integration tests that compose a few blocks and send a request through. But most bugs live in individual blocks, and those are unit tested in isolation. When something breaks, you know which block broke. That alone changed my debugging experience more than any other decision in this system.

Mistakes I made

I did not add the Validate method until six months in. Before that, a typo in the YAML config would only show up at runtime, usually at the worst possible time. Validate runs at startup and catches bad configuration before the first request arrives. Should have been there from the start.

I also let blocks share state in ways that were not obvious. The Transform block would set a header that the Auth block later read. Nobody documented this. It broke twice before I noticed the coupling. Now blocks communicate only through the request context's metadata map, and that map is logged. You can see exactly what each block wrote and when.
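The metadata map might look like this in practice. A sketch, with assumed types and a hypothetical namespacing convention: each block prefixes its keys, and dumping the map after the request shows who wrote what.

```go
package main

import (
	"fmt"
	"sort"
)

type Request struct{ Metadata map[string]any }

// dumpMetadata returns the map's entries in a stable order, the way a
// gateway might log them at the end of a request.
func dumpMetadata(req *Request) []string {
	keys := make([]string, 0, len(req.Metadata))
	for k := range req.Metadata {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]string, 0, len(keys))
	for _, k := range keys {
		out = append(out, fmt.Sprintf("%s=%v", k, req.Metadata[k]))
	}
	return out
}

func main() {
	req := &Request{Metadata: map[string]any{}}
	// Each block namespaces its writes, so the log shows who set what.
	req.Metadata["transform.renamed_header"] = "X-Legacy-Id"
	req.Metadata["auth.subject"] = "team-42"
	for _, line := range dumpMetadata(req) {
		fmt.Println(line)
	}
	// prints:
	// auth.subject=team-42
	// transform.renamed_header=X-Legacy-Id
}
```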

When to use this

If you have fewer than five middleware concerns, a simple chain is fine. This pattern adds overhead that is not worth it for small systems.

If you have ten or more concerns, and especially if different routes need different combinations, the block pipeline saves you from a mess that only gets worse over time. The overhead of the interface and the YAML config pays for itself the first time someone asks "why is this route behaving differently" and you can answer by reading a config file instead of stepping through code.

Fifteen blocks in, I still add new ones in under an hour. The pipeline is still readable. I will take that.