Why CVSS Scores Miss Real AI Security Risk
AI
financial services
August 03, 2026 · 6 min read


CVSS severity ratings ignore context. A moderate flaw in AI decision systems poses greater risk than critical vulnerabilities in unused systems—materiality, not size, determines actual exposure.

Your Security Dashboard Is Lying to You (And Your Auditors Already Know Why)

Last Tuesday I was in a vulnerability review with a client. The dashboard was a wall of red. Eleven "criticals," and everyone locked onto the scores like they were watching a countdown timer. The CISO wanted to talk remediation prioritization. The compliance team wanted to document their response. Everyone was staring at the same numbers.

The bug that could actually hurt them was sitting in yellow.

Not theoretically vulnerable — actually exposed, in a customer-facing AI model that was approving wire transfer limits. A moderate-severity prompt injection flaw that CVSS scored a 6.8. It sat on page three of the report while we burned thirty minutes discussing a critical SQL injection vulnerability in a test environment that hadn't seen production traffic in two years.

This is the expensive gap between "critical" and "material" — and if you run an audit or risk function, you've seen this movie before.

We've Been Here Before: When Size Stopped Meaning Impact

Twenty years ago, accounting had the same problem. Junior auditors would surface large-dollar discrepancies and expect escalation. Partners would ask a different question: "Which account? What's the impact on the financials?"

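A $500 error in revenue recognition matters more than a $5 million one that nets to zero across offsetting accounts: the $500 lands in a number financial statement users rely on, while the $5 million changes nothing they see. The profession learned to distinguish between size and consequence. We codified it. We called it materiality.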

Security spent those same twenty years sorting vulnerabilities by size.

CVSS, the Common Vulnerability Scoring System that nearly every security team triages by, is good at exactly one thing: its base score tells you how bad a flaw is in a vacuum. Maximum theoretical impact, assuming perfect conditions, with no context about where the flaw lives or whether that location matters. It's a damage assessment with no crime scene.

That made sense when most software sat behind a firewall and "critical" meant "attacker gets root." The environment was simpler. The blast radius was clearer. A 9.8 really was worse than a 6.2.
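To make "in a vacuum" concrete, here is a minimal Python sketch of the CVSS v3.1 base-score formula, using the metric weights and Roundup function published in FIRST's v3.1 specification. It is an illustration, not a replacement for FIRST's official calculator. Every input describes the flaw itself: attack vector, attack complexity, privileges required, user interaction, scope, and confidentiality/integrity/availability impact. None of them asks what business process depends on the affected system. (The specification does define optional Environmental metrics for local context; they are left out here because the base score is what severity-sorted dashboards typically show.)

```python
import math

# Metric weights from the CVSS v3.1 specification (FIRST), base metric group only.
ATTACK_VECTOR = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}
ATTACK_COMPLEXITY = {"L": 0.77, "H": 0.44}
PRIVILEGES_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PRIVILEGES_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.50}  # used when Scope is Changed
USER_INTERACTION = {"N": 0.85, "R": 0.62}
CIA_IMPACT = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, avoiding float drift."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Score a vector such as 'CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H'."""
    metrics = dict(part.split(":", 1) for part in vector.split("/"))
    scope_changed = metrics["S"] == "C"

    # Impact sub-score: how much confidentiality, integrity, and availability are lost.
    iss = 1 - ((1 - CIA_IMPACT[metrics["C"]])
               * (1 - CIA_IMPACT[metrics["I"]])
               * (1 - CIA_IMPACT[metrics["A"]]))
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score: how easy the flaw is to reach and trigger.
    privileges = PRIVILEGES_CHANGED if scope_changed else PRIVILEGES_UNCHANGED
    exploitability = (8.22 * ATTACK_VECTOR[metrics["AV"]]
                      * ATTACK_COMPLEXITY[metrics["AC"]]
                      * privileges[metrics["PR"]]
                      * USER_INTERACTION[metrics["UI"]])

    if impact <= 0:
        return 0.0
    if scope_changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


# Note what is missing: no parameter for environment, owner, or business process.
print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8
```

The same vector yields the same 9.8 whether the host is a retired test box or a production payment system. That is by design, and it is exactly the gap this piece is about.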

Then we put AI inside the decision loop.

The AIVEX Wake-Up Call: Context Is the New Severity

This week SecurityWeek ran a piece on AIVEX, a proposed AI vulnerability triage model from independent researcher Devashri Datta. A couple of vendors are already building it into their platforms. The model itself will evolve — that's not the story.

The idea underneath it is the part worth your attention: a moderate flaw in an AI model that's making decisions can carry more real risk than a critical in a system nobody depends on.

Read that sentence twice if you're signing off on risk registers.

Because the math didn't change. The environment did. When an LLM is triaging insurance claims, approving refunds, or routing support tickets, a prompt injection vulnerability isn't just a data leak — it's a business logic bypass. The severity score says "moderate." The financial impact says "we just paid 10,000 fraudulent claims."

Your risk framework wasn't built for this.

What Your Dashboard Measures vs. What Your Board Cares About

Here's the uncomfortable question: How many "critical" findings on your current dashboard actually affect a system that touches revenue, compliance obligations, or customer trust?

I've run this exercise with a dozen clients in the last six months. The pattern is consistent:

  • 60–70% of "critical" and "high" findings live in development environments, legacy systems with no external access, or applications that haven't seen active use in quarters

  • The findings that map to actual business risk — the customer portal with the auth bypass, the AI model with the jailbreak vector, the API that handles PCI data — are scattered across severity bands

One client had eighteen criticals. Three were in production. One of those three was material.

Your dashboard is sorted by severity. Your actual risk isn't.

The tooling isn't wrong. It's answering a question from 2015: "How bad could this be?" What the CFO and audit committee want to know today is different: "If this breaks, what business process fails?"

The Accountant's Playbook: Three Questions Security Should Steal

Every auditor learns to ask three questions when they see a discrepancy:

  1. What account is it in? (Context matters more than size)

  2. Does it affect the financials users rely on? (Materiality is about decisions, not dollars)

  3. Can it cascade? (One $500 error can break reconciliation across twelve downstream accounts)

The security equivalent for vulnerabilities in AI systems:

  1. Where does this flaw live in the decision chain? A critical in a sandbox is theater. A moderate in production AI is material.

  2. What business process depends on the output being correct? If the answer is "pricing," "approvals," or "routing," severity scores are the wrong lens.

  3. What's the blast radius if the model is compromised? If user inputs feed back into training, one poisoned prompt can corrupt training data. One jailbreak can bypass eighteen months of safety tuning.

Accountants settled this decades ago. Materiality was never about size. It's about consequence.
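The three security questions can be encoded as a sort order. The Python sketch below is one hypothetical way to do it: each finding carries an environment (question 1), the business process that consumes the system's output (question 2), and a rough count of downstream consumers as a stand-in for blast radius (question 3). The field names and sample findings are illustrative assumptions, not any scanner's export format; in practice that context would have to come from an asset inventory. Only the 6.8 prompt injection score comes from the review described earlier.

```python
from dataclasses import dataclass
from typing import List, Optional

# Decision types named in the questions above; extend for your own environment.
DECISION_PROCESSES = {"pricing", "approvals", "routing", "access grants"}


@dataclass
class Finding:
    finding_id: str
    cvss: float                      # context-free base score
    environment: str                 # Q1: where does the flaw live? e.g. "production", "test"
    decision_process: Optional[str]  # Q2: which business process consumes this system's output?
    downstream_consumers: int        # Q3: rough blast radius, systems fed by this output


def materiality_key(f: Finding) -> tuple:
    in_production = f.environment == "production"
    drives_decision = f.decision_process in DECISION_PROCESSES
    # Tuples compare left to right: context first, blast radius next,
    # and CVSS only as the final tiebreaker.
    return (in_production and drives_decision, in_production,
            f.downstream_consumers, f.cvss)


def triage(findings: List[Finding]) -> List[Finding]:
    """Order findings by materiality (highest first) instead of by severity."""
    return sorted(findings, key=materiality_key, reverse=True)


# Hypothetical findings loosely modeled on the review described earlier.
findings = [
    Finding("SQLI-TEST-ENV", 9.8, "test", None, 0),
    Finding("PROMPT-INJ-WIRE", 6.8, "production", "approvals", 3),
    Finding("XSS-PORTAL", 6.1, "production", None, 1),
]

by_severity = sorted(findings, key=lambda f: f.cvss, reverse=True)
by_materiality = triage(findings)

print([f.finding_id for f in by_severity])
print([f.finding_id for f in by_materiality])
```

A lexicographic tuple, rather than a weighted sum, means no CVSS value can lift a test-environment finding above a production decision system. A weighted score would let a big enough number win, which is the size-over-consequence mistake in another form.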

What to Do Monday Morning

If you're a CISO, CFO, or audit lead, here's the specific ask:

Pull your current vulnerability report. Identify one "moderate" or "low" finding that lives inside a system making automated decisions — approvals, routing, pricing, access grants. Ask your team: if an attacker exploited this tomorrow, what business process breaks?

If the answer is uncomfortable, you've found your actual material risk.

Then ask the harder question: How many of your "criticals" would anyone outside the security team actually notice if they were exploited?

That gap — between what your tools flag and what your business depends on — is where the next breach is hiding. CVSS won't find it. Your auditors will.
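The Monday-morning exercise can also be run as a quick script over a report export. The Python sketch below assumes a hypothetical CSV with `finding_id`, `cvss`, `system`, `environment`, and `automated_decision` columns (real scanner exports will differ), buckets scores with the CVSS v3.x qualitative rating scale, and produces two lists: Low or Medium (the scale's name for "moderate") findings inside automated-decision systems, and Critical findings in systems with no automated decision attached. The second list is only a crude proxy for "would anyone outside security notice?"; a system can matter without making automated decisions.

```python
import csv
import io

# Hypothetical report export; column names and rows are illustrative only.
REPORT_CSV = """finding_id,cvss,system,environment,automated_decision
VULN-101,9.8,legacy-reporting,test,
VULN-102,6.8,wire-limit-assistant,production,approvals
VULN-103,3.1,support-ticket-router,production,routing
VULN-104,9.1,internal-wiki,production,
VULN-105,8.2,pricing-engine,production,pricing
"""

DECISION_TYPES = {"approvals", "routing", "pricing", "access grants"}


def severity_band(score: float) -> str:
    """CVSS v3.x qualitative severity rating scale."""
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def monday_morning(report_text: str):
    hidden_material = []   # Low/Medium findings inside automated-decision systems
    quiet_criticals = []   # Critical findings with no automated decision attached
    for row in csv.DictReader(io.StringIO(report_text)):
        band = severity_band(float(row["cvss"]))
        decision = row["automated_decision"].strip()
        if band in {"Low", "Medium"} and decision in DECISION_TYPES:
            hidden_material.append((row["finding_id"], row["system"], decision))
        elif band == "Critical" and not decision:
            quiet_criticals.append((row["finding_id"], row["system"], row["environment"]))
    return hidden_material, quiet_criticals


hidden, quiet = monday_morning(REPORT_CSV)
for finding_id, system, decision in hidden:
    print(f"{finding_id} in {system}: if exploited tomorrow, does '{decision}' break?")
for finding_id, system, environment in quiet:
    print(f"{finding_id} in {system} ({environment}): who outside security would notice?")
```

An empty first list doesn't prove you're safe; more likely the `automated_decision` column simply hasn't been filled in yet.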

The Bottom Line: Critical and Material Used to Be the Same Word

For twenty years, "critical severity" and "material risk" were close enough that we could treat them as synonyms. The most dangerous vulnerabilities usually sat in the most important systems. Triage by score worked.

Once AI is inside the decision, they're not the same anymore.

A critical RCE in a test environment is a cleanup ticket. A moderate prompt injection in a production AI model approving transactions is a board-level risk event. The CVSS score doesn't know the difference. Your risk framework needs to.

We've watched this pattern before. When trading moved from the floor to electronic platforms, "fast" and "reliable" stopped being the same thing. When cloud replaced on-prem, "secure" and "compliant" diverged. Every time the infrastructure changes, the old proxies for risk stop working.

The tooling will catch up — models like AIVEX are the leading edge. But you don't have to wait for vendors to ship context-aware scoring.

You already know how to do this. You learned it in Accounting 101.

Start triaging vulnerabilities the way you triage journal entries: not by size, but by which ledger they're in.

Your dashboard will still be red. But you'll finally be fixing the right things.


What's the moderate-severity finding in your environment that's actually material? If you can't name it, your triage process isn't wrong. It's just answering a different question than the one you're paid to answer.

