Rebuilding a 130-Country Platform While the Lights Stayed On

Rebuilding a 130-Country Platform While the Lights Stayed On

In early 2021 I inherited a monolith. 220 projects stitched together, 30,000 sequential tests, builds that took seven hours when they worked and failed roughly half the time. This was the codebase running Fulfilled-by-Maersk, warehouse management, order routing, last-mile delivery, across 130 countries. Revenue-critical. Always on. 99.9% availability SLA, and the business meant it.

The post-pandemic unwinding of global trade was still playing out. Supply chains were disrupted in ways nobody had modelled for. Freight rates were swinging wildly. Customers were shifting volumes between carriers on shorter notice than ever. This was the backdrop against which someone had to look at a seven-hour build pipeline and say: we are going to rebuild this, and nothing goes down while we do it.

The decision nobody wanted to make

We had two options. A big-bang migration, freeze features, rewrite everything, flip a switch on launch day and hope. Or the harder thing: rebuild the same codebase, in place, while the platform kept running in production across every country it served.

We chose the second option. I say "we" because the engineering leadership across Denmark, India, the US and the UK made this call together, and they deserve the credit for it. More than 200 engineers were going to have to live inside this decision for years. It had to be theirs.

Big-bang rewrites look clean on a slide. You get to start clean. You get to pretend the old system's problems were someone else's fault. But we had a fulfilment platform processing real freight for real customers in 130 countries, and a freeze was not an option the commercial side of the business could absorb. Supply chain disruption was already costing everyone money. Adding a technology freeze on top of that was not a conversation I was willing to have.

So we committed to the messier path: decompose the monolith into event-driven microservices while the lights stayed on. Every release had to leave the system no worse than we found it. Every migration step had to be reversible. We called it "rebuilding the ship at sea," which is a cliché, but it was also literally true, Maersk is a shipping company.

What 7 hours to 30 minutes actually looks like

The numbers are easy to write down. Build times went from seven hours to thirty minutes. Build success rate moved from 50% to 95% across more than 200 developers. Release velocity improved by roughly 70%.

What the numbers don't capture is the daily grind of getting there. When your build takes seven hours, developers stop trusting it. They batch changes. They skip the pipeline and push directly. They develop workarounds that become their own technical debt. The build isn't just slow. It has rotted the habits of everyone who touches it.

Fixing the pipeline meant fixing the culture at the same time. We broke the 30,000 sequential tests into parallelisable suites. We decomposed the 220-project monolith into bounded services with independent build and deploy cycles. We put contract testing at the service boundaries so teams could release without coordinating with every other team on the platform.

None of this is novel computer science. The hard part was doing it without a single day of downtime on a platform that 130 countries depended on. Every decomposition step had a rollback plan. Every new service ran in shadow mode before it took live traffic. We maintained dual-write paths during migration windows that sometimes lasted weeks. It was tedious, careful, unglamorous work, and it is the thing I am most proud of from those four years.

The 95% build success rate matters more than the speed improvement, by the way. When half your builds fail, you have 200 engineers spending a meaningful chunk of their week fighting the toolchain instead of building product. When 95% pass, that time comes back. What you do with that recovered time matters more than the pipeline itself.

What the rebuild funded

The developer productivity we got back by fixing the platform didn't disappear. We pointed it somewhere specific: an enterprise data and AI platform.

We reduced data access latency by roughly 65% across 12 governed self-service domains. That sounds abstract until you consider what it meant in practice: business teams in different countries could get answers from their own data without filing a ticket and waiting three weeks for an analyst to pull a report.

On top of that data layer, we shipped production GenAI capability. RAG pipelines. Vector retrieval. Real workloads, not a demo. Support ticket volume dropped by about 40% across enterprise knowledge search. We built it with human-in-the-loop governance from day one because I had sat on Maersk's AI Ethics Committee and Technology Risk Committee, and I knew what the scrutiny would look like if we didn't.

This is the part of the story I keep coming back to. The AI work was only possible because we had done the boring, painful infrastructure work first. You cannot run retrieval-augmented generation on a platform where the builds fail half the time and data access takes weeks. The monolith rebuild and the AI platform are not two separate achievements. The first funded the second.

The portfolio underneath

I ran a $200M transformation portfolio with $50M in annual investment across those four years. Warehouse management, customs brokerage, last-mile delivery. The engineering organisation spanned four countries and more than 200 engineers at various points.

I am careful about the word "led" because at that scale you are not leading in any hands-on sense. You are setting direction, building the teams, hiring the right senior engineers and managers, creating the conditions for good decisions to happen without you in the room, and then getting out of the way. The people who actually did the work, the principal engineers who designed the service boundaries, the SREs who kept the platform up during migration, the QA leads who rebuilt the test architecture, they did the leading. I held the budget and tried not to make things worse.

What I measure everything else against

I have been in technology for 25 years across consulting, advisory, and operating roles. I have run a $5M P&L. I have done due diligence on billion-pound M&A transactions. None of it taught me as much as those four years at Maersk.

Consulting teaches you to diagnose. Operating teaches you to live with your diagnosis. When you are rebuilding a platform that processes real commerce for real businesses in 130 countries, you cannot hand over a slide deck and leave. You own the outcome. If the migration breaks something at 2am in a warehouse in Jakarta, that is your phone ringing.

The thing I took away from Maersk is that the most valuable technology work is usually the least interesting to talk about. Nobody writes conference abstracts about fixing build pipelines. Nobody gets on stage to describe a dual-write migration path that lasted six weeks. But those are the things that actually mattered: the engineers could ship with confidence, the business teams could access their own data, and the customers never noticed a thing because the lights stayed on the entire time.

That last part matters most. If you did it right, nobody outside engineering ever knew it was happening. The 130 countries kept running. The freight kept moving. Underneath, everything changed.