70% of Your Engineers Are Using AI Tools You Do Not Know About
I found out on a Tuesday. One of my engineers had pasted a production database schema into ChatGPT. Table names, column names, foreign key relationships, comments describing business logic. He wanted help writing a migration script. He got help. He also sent our data model to a third-party server in California.
He wasn't being careless. He was being productive. And that's the problem with shadow AI. It doesn't look like a security incident. It looks like someone doing their job faster.
The numbers are bad
About 70% of employees now use AI tools their employer hasn't approved (Microsoft/LinkedIn 2024 Work Trend Index). It's not fringe behaviour. It's the majority.
Cyberhaven's research found that 34.8% of the data employees paste into ChatGPT is confidential. Not "could be sensitive." Confidential. Source code, internal documents, customer data, financial figures.
IBM's 2024 Cost of a Data Breach report says shadow AI adds $200,000 to $670,000 to breach costs. That's the extra cost, on top of the breach itself.
None of this surprised me after the schema incident. My organisation was no different.
What I actually found when I looked
After the schema incident, I ran an audit. Not a formal one. I just started asking questions and looking at network traffic.
Engineering were the obvious users. ChatGPT, Claude, GitHub Copilot, assorted code generation tools. Some had personal accounts on company devices. One team had built a Slack bot that called the OpenAI API using a developer's personal key with no spend limits.
But engineering wasn't the worst of it. Marketing had fed competitor analysis into Claude. Finance had summarised board meeting notes in ChatGPT. HR, and this one really got to me, had pasted employee performance reviews into an AI tool to "help with tone."
Final count: 14 AI tools across 6 departments, none approved, none with data processing agreements, none logged.
Why I didn't ban anything
My first instinct was to block it all. Firewall rules. Update the acceptable use policy. Get the CISO to send a stern email. Standard playbook.
I've been in technology long enough to know that banning tools people find useful doesn't work. It pushes the behaviour somewhere you can't see it. People use their phones, bring home laptops, tether to personal hotspots. You end up playing whack-a-mole, and the moles are your best engineers.
So instead of banning, I decided to channel.
The AI gateway approach
The idea is simple. Instead of letting everyone talk to AI services directly, you put a gateway in the middle. All AI traffic routes through it. Think of it like a web proxy, but for LLM calls.
The main thing it does is data loss prevention. Every prompt gets scanned before it leaves the network. We defined patterns for things that should never reach an external AI: database schemas, API keys, customer PII, financial data, internal code with certain path prefixes. A match triggers a block, and the user sees a message explaining why.
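The scan-then-block step can be sketched in a few lines. This is a minimal illustration, not our production ruleset: the pattern names and regexes below are placeholder assumptions.

```python
import re

# Illustrative DLP patterns only -- a real deployment maintains many more,
# tuned to the organisation's own naming and key formats.
BLOCK_PATTERNS = {
    "api_key": re.compile(r"\b(?:sk|pk)[-_][A-Za-z0-9]{20,}\b"),
    "db_schema": re.compile(r"\bCREATE\s+TABLE\b", re.IGNORECASE),
    "internal_code": re.compile(r"src/internal/"),  # hypothetical path prefix
}

def scan_prompt(prompt: str) -> list[str]:
    """Return the names of every blocked pattern the prompt matches."""
    return [name for name, rx in BLOCK_PATTERNS.items() if rx.search(prompt)]

def gateway_check(prompt: str) -> dict:
    """Allow or block a prompt before it leaves the network."""
    hits = scan_prompt(prompt)
    if hits:
        # Blocking with an explanation is what drives the learning effect.
        return {"allowed": False, "reason": "blocked patterns: " + ", ".join(hits)}
    return {"allowed": True, "reason": None}
```

The explanation in the block message matters as much as the block itself; it is what teaches people the classification rules over time.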
It tracks token usage. Every API call is logged: who, which model, how many tokens, what time. Not the content of every prompt (that creates its own privacy problem), but enough metadata to spot anomalies. Someone sends 50,000 tokens to GPT-4 at 2am, we know.
Model routing too. Not every question needs the most capable model. Simple queries go to cheaper, faster models. This matters for cost when 200 engineers are all hitting API endpoints daily.
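A routing rule can be as crude as this sketch. The model names are placeholders and the length heuristic is an assumption; production routers typically use a small classifier rather than prompt length.

```python
# Naive routing: short prompts with no code blocks go to the cheaper model.
def route_model(prompt: str,
                cheap: str = "fast-small-model",
                capable: str = "large-model") -> str:
    looks_complex = len(prompt) > 500 or "```" in prompt
    return capable if looks_complex else cheap
```

Even a heuristic this blunt cuts costs noticeably when most daily queries are one-liners.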
And audit logs. When compliance asks "who has been using AI and for what," the answer is a dashboard, not a shrug.
The governance framework in practice
The gateway handles technical controls. You also need rules people can follow.
Approved tools list
We maintain a short list of approved AI tools. Getting onto the list requires a security review, a data processing agreement, and confirmation the tool doesn't train on our inputs. Currently 6 tools. Engineers use those. Nothing else for work.
Reviewed quarterly. Last quarter we dropped one because the vendor changed their data retention policy without telling us.
Data classification rules
We use four tiers. Public data can go to any approved tool. Internal data goes to approved tools only, not free-tier services. Confidential data needs enterprise agreements and must go through the gateway. Restricted data never goes to an external AI, no exceptions.
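The four-tier rule set reduces to a small decision function. The tool registry below is hypothetical; the flags mirror the rules in the paragraph above.

```python
from enum import IntEnum

class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# Hypothetical tool registry -- names and flags are illustrative.
TOOLS = {
    "free-chat":      {"approved": False, "free_tier": True,  "enterprise": False},
    "approved-saas":  {"approved": True,  "free_tier": False, "enterprise": False},
    "enterprise-llm": {"approved": True,  "free_tier": False, "enterprise": True},
}

def allowed(tier: Tier, tool: str) -> bool:
    t = TOOLS[tool]
    if tier == Tier.RESTRICTED:
        return False                                  # never leaves, no exceptions
    if tier == Tier.CONFIDENTIAL:
        return t["enterprise"]                        # and routed via the gateway
    if tier == Tier.INTERNAL:
        return t["approved"] and not t["free_tier"]   # no free-tier services
    return t["approved"]                              # PUBLIC: any approved tool
```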
Every engineer went through a 30-minute training session with examples from our own codebase. Not a slide deck they clicked through. "This is fine to ask an AI about. This is not. Here's why."
Automated scanning
The DLP rules catch most things, but we also scan repository commits weekly. AI-generated code has visible patterns: certain comment styles, naming conventions, sometimes model watermarks. We flag these for review, not punishment, to make sure the code meets our standards and doesn't include hallucinated dependencies.
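The weekly scan works roughly like this sketch. The style heuristics here are illustrative assumptions, and the dependency allow-list would come from the project's actual lockfile rather than a hard-coded set.

```python
import re

# Illustrative heuristics only -- real signals get tuned over time.
AI_STYLE_HINTS = [
    re.compile(r"^# Step \d+:", re.MULTILINE),        # numbered walkthrough comments
    re.compile(r"as an AI language model", re.IGNORECASE),
]
KNOWN_DEPS = {"os", "sys", "re", "json", "requests"}  # assumed allow-list

def flag_commit(diff_text: str) -> list[str]:
    """Return review flags for AI-style comments and unknown imports."""
    flags = [f"style:{rx.pattern}" for rx in AI_STYLE_HINTS if rx.search(diff_text)]
    # An import missing from the allow-list may be a hallucinated dependency.
    for m in re.finditer(r"^\s*import\s+(\w+)", diff_text, re.MULTILINE):
        if m.group(1) not in KNOWN_DEPS:
            flags.append(f"unknown-dependency:{m.group(1)}")
    return flags
```

Flagged commits go to a human reviewer; nothing is auto-rejected.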
None of this is new. I've spent years in information security governance (ISO 27001 Lead Auditor, ISACA CISM/CGEIT/CRISC exams passed). Data classification is ISO 27001. Risk based tool approval is CRISC. Audit logging is CISM incident response. We pointed old frameworks at new tools.
What changed after six months
AI tool usage went up, not down. When people have a sanctioned path, they use it freely. Prompt volume through the gateway tripled in the first three months.
DLP blocks averaged about 40 per week in month one. By month six, down to 8. People learned what they could and couldn't send. The training helped. The feedback loop from blocked prompts helped more.
We caught two genuine incidents through the audit logs. Both accidental. Both caught within hours, both contained before any data left the organisation. Without the gateway, we'd have found out months later, or never.
The developer who pasted the database schema? He's now one of the gateway's biggest advocates. His words: "I'd rather get a block message than find out three months later that I caused a breach."
What I'd tell another engineering leader
Your engineers are already using AI tools. You can choose to know about it or not. Choosing not to know means accepting risk you haven't measured, for tools you haven't evaluated, handling data you haven't classified.
Don't ban. Channel. Make the secure path the easy path. The gateway approach isn't perfect. But it moves you from "we have no idea" to "we have reasonable visibility."
The alternative is pretending shadow AI isn't happening. I've tried that too, briefly. It cost more.