AI Security in Production: What Guardrails Actually Look Like

AI security in production: what guardrails actually look like

"Guardrails" might be the most abused word in enterprise AI right now. Every vendor pitch, every conference talk, every LinkedIn thought leader drops it. We need guardrails. We've implemented guardrails. Our platform has guardrails built in. Ask what those guardrails actually are and the room goes quiet.

I've spent the past two years building an AI security framework across 20+ models in production. I sit on an AI Ethics Committee and a Technology Risk Committee. I hold ISO 27001 Lead Auditor certification and ISACA CISM, CGEIT, CRISC exams passed, which is a polite way of saying I've spent a lot of time arguing about risk in systems that don't behave deterministically. Most organisations saying "guardrails" mean "we told the developers to be careful."

What production AI security actually requires

If your security posture for production AI is "we review prompts manually before release," you're already behind. The threat surface is different from traditional software, and most AppSec teams haven't caught up yet.

Prompt injection testing belongs in CI/CD

Prompt injection is the SQL injection of the LLM era, except the attack surface is natural language, which makes it harder to scan for and harder to block.

We built prompt injection tests into our CI/CD pipeline. Every model endpoint gets a battery of adversarial inputs on every deploy -- direct injections, indirect injections via retrieved context, encoding tricks, multilingual payloads. If a model can be coerced into ignoring its system prompt, that deploy doesn't ship. It runs alongside unit tests. It's not a special occasion.

The tooling is still immature. We wrote about half of it ourselves. But waiting for the tooling to mature while running models in production is a choice, and choices have consequences.

Bias monitoring per demographic, not in aggregate

Bias detection that reports a single fairness score across your whole user base is worse than useless -- it's actively misleading. A model can look fair in aggregate and be systematically wrong for specific groups.

We monitor bias per demographic segment, per model, continuously. Not a one-off assessment before launch. The metrics feed into dashboards the Ethics Committee reviews monthly, and automated alerts fire when drift exceeds agreed thresholds.

This means defining your demographic segments explicitly, which most organisations would rather avoid. It means awkward conversations when the numbers look bad. Skip it and you're just hoping the model is fair. Hope is not a control.

Model cards are not optional documentation

Every model we deploy carries a model card. Not the academic kind with aspirational statements about intended use -- a working document that states what the model does, what data it was trained on, known failure modes, demographic performance splits, who owns it, when it was last evaluated, and what happens when it goes wrong.

No model card, no deployment. The card template is standardised. The review is human. The card travels with the model through every environment.

Sounds bureaucratic. It takes about two hours. And it has prevented at least three incidents where someone tried to repurpose a model for a use case it was never validated for.

Supply chain scanning for model weights

You scan your npm packages. You scan your container images. Are you scanning your model weights?

Model files can carry serialised code. Pickle deserialization attacks are well documented. We scan every model artifact before it enters our registry, same as containers. We track provenance. We verify checksums. If a model weight file comes from an unverifiable source, it doesn't get deployed.

Basic supply chain hygiene applied to a new artifact type. The fact that most organisations haven't done it says more about the field's maturity than the task's difficulty.

The AI Ethics Committee: what it actually does

I've seen plenty of cynicism about AI ethics boards, and some of it is earned. A committee that meets quarterly and produces position papers is theatre.

Ours meets monthly. Engineering leads, a data protection officer, someone from legal, a product manager, and an external adviser with a background in algorithmic accountability. Nobody on the committee has "AI Ethics" in their job title. Everyone has operational responsibilities elsewhere.

The committee makes binding decisions. It can block a deployment. It has. Last year it required a six-week delay on a recommendation model because bias testing showed unacceptable variance across age groups. That cost real money. It was the right call, and the fact that the committee had the authority to make it -- without escalation, without a three-week approval chain -- is the whole point.

Our decision framework runs on two axes: potential for harm and reversibility. High harm, low reversibility gets the most scrutiny. Low harm, high reversibility gets a lighter touch. Everything gets something.

The regulatory picture

The EU AI Act is now in force. If you're deploying high-risk AI systems and haven't started your conformity assessment, you're late. The Act requires risk management systems, data governance, transparency obligations, and human oversight. These are not suggestions.

The UK is taking a different path. The AI Cyber Security Code of Practice, published by DSIT in late 2024, is principles-based rather than prescriptive, but the expectations are clear: secure design, secure development, secure deployment, secure maintenance. It covers the full lifecycle and explicitly addresses supply chain risks, which most frameworks still treat as someone else's problem.

Neither framework tells you exactly what to build. Read them carefully, though, and the shape is obvious: continuous monitoring, documented risk assessment, bias testing, incident response procedures written specifically for AI failures, and accountability that traces to named individuals. Not to "the team."

Starting from zero

If I were advising a CTO with no AI security posture today, I wouldn't say "build a framework." I'd say do these things, roughly in this order, and do them soon.

Put prompt injection tests in your CI/CD pipeline. Garak is a reasonable open source starting point. Run it on every deploy. Then pick your three highest-risk models and write model cards for them. Make the cards a deployment gate -- no card, no deploy. Set up bias monitoring with demographic splits for anything that touches end users, reviewed monthly at minimum. Scan your model artifacts the way you already scan your dependencies -- check provenance, verify integrity. And get four or five people with operational authority into a room once a month. Give them the power to block a deployment. Let them use it.

None of this requires a massive budget. It requires deciding that AI security is an engineering discipline, not a compliance exercise. That distinction matters more than any framework you could buy.