1. The AI Attack Surface

Introduction

In Chapter 1, you explored the remarkable capabilities of modern AI systems – from how LLMs process language to how agentic AI systems autonomously execute complex tasks. Every one of those capabilities – prompt processing, RAG retrieval, tool use, code execution, model fine-tuning – represents a surface that attackers can target.

This section maps the complete AI attack surface. You’ll learn the frameworks security professionals use to categorize and communicate about AI threats, understand who the threat actors are, and build a mental model that connects every vulnerability covered in the rest of this chapter back to a specific stage of the AI lifecycle.

Think of this section as your attack vocabulary. By the end, you’ll be able to walk into any conversation about AI security and speak the language.

What will I get out of this?

By the end of this section, you will be able to:

  1. Map the AI lifecycle as an attack surface, identifying where vulnerabilities exist at each stage from training through inference.
  2. Categorize threat actors targeting AI systems and explain their motivations, capabilities, and typical attack patterns.
  3. Describe each of the OWASP LLM Top 10 (2025) categories with real-world context and explain why each matters for AI deployments.
  4. Preview the OWASP Agentic AI Top 10 (2026) categories and explain why agentic systems require their own threat framework.
  5. Reference MITRE ATLAS as a supplementary framework for AI threat modeling.
  6. Trace attack vectors across the AI lifecycle using visual diagrams.
  7. Cite the Microsoft LLMjacking case as an example of how AI infrastructure attacks translate to real financial damage.

The AI Lifecycle as an Attack Surface

Every stage of an AI system’s lifecycle – from initial training to production inference and ongoing monitoring – presents opportunities for attackers. The diagram below maps these stages and highlights where the most significant attack vectors exist.

graph LR
    subgraph "Training Phase"
        A["Data Collection<br/>& Curation"]
        B["Model Training<br/>& Pre-training"]
    end

    subgraph "Customization Phase"
        C["Fine-tuning<br/>& Alignment"]
        D["RAG Pipeline<br/>Setup"]
    end

    subgraph "Deployment Phase"
        E["Model Packaging<br/>& Distribution"]
        F["Infrastructure<br/>& API Setup"]
    end

    subgraph "Inference Phase"
        G["Prompt Processing<br/>& Generation"]
        H["Tool Use &<br/>Agentic Actions"]
    end

    subgraph "Monitoring Phase"
        I["Logging &<br/>Feedback Loops"]
    end

    A --> B --> C --> D --> E --> F --> G --> H --> I
    I -.->|"feedback"| A

    A ---|"Data Poisoning<br/>LLM04"| PA["Attack Vector"]
    B ---|"Backdoor Insertion<br/>LLM04"| PB["Attack Vector"]
    C ---|"LoRA Adapter Attacks<br/>LLM03"| PC["Attack Vector"]
    D ---|"RAG Poisoning<br/>LLM08"| PD["Attack Vector"]
    E ---|"Supply Chain<br/>LLM03"| PE["Attack Vector"]
    F ---|"Infrastructure Exploits<br/>LLM10"| PF["Attack Vector"]
    G ---|"Prompt Injection<br/>LLM01"| PG["Attack Vector"]
    H ---|"Excessive Agency<br/>LLM06"| PH["Attack Vector"]
    I ---|"Memory Poisoning<br/>LLM01"| PI["Attack Vector"]

    style PA fill:#8b0000,color:#fff
    style PB fill:#8b0000,color:#fff
    style PC fill:#8b0000,color:#fff
    style PD fill:#8b0000,color:#fff
    style PE fill:#8b0000,color:#fff
    style PF fill:#8b0000,color:#fff
    style PG fill:#8b0000,color:#fff
    style PH fill:#8b0000,color:#fff
    style PI fill:#8b0000,color:#fff

The key insight: there is no safe stage. Attackers target training data months before a model ships. They poison RAG corpora without touching the model itself. They craft prompts that override system instructions in real time. They steal models through inference APIs. Each section of this chapter maps to one or more of these lifecycle stages.


Who Are the Threat Actors?

Not all attackers are the same. Understanding who targets AI systems helps you assess risk and prioritize defenses.

| Threat Actor | Motivation | Typical Targets | Capability Level |
|---|---|---|---|
| Script Kiddies | Curiosity, bragging rights | Public chatbots, open APIs | Low – use published jailbreaks and prompt injection templates |
| Competitors | IP theft, market advantage | Proprietary models, training data | Medium – targeted API extraction, employee social engineering |
| Cybercriminals | Financial gain | API credits, compute resources, customer data | Medium to High – LLMjacking, credential theft, ransomware |
| Nation-States | Intelligence, disruption | Critical infrastructure AI, defense models | Very High – supply chain attacks, insider recruitment, zero-days |
| Malicious Insiders | Revenge, financial gain, ideology | Training pipelines, model weights, internal tools | High – already have access, can bypass perimeter controls |
| Security Researchers | Responsible disclosure, reputation | Any system with a bug bounty or academic interest | Varies – often discover novel techniques first |

The Democratization Problem

AI attack tools are becoming increasingly accessible. Jailbreak prompts spread on social media within hours. Prompt injection templates are shared on GitHub. The barrier to entry for attacking AI systems is far lower than for traditional cybersecurity exploits – you don’t need to write code to manipulate a chatbot.


OWASP LLM Top 10 (2025)

The OWASP Top 10 for LLM Applications is the industry-standard framework for categorizing LLM vulnerabilities. The 2025 edition reflects the current threat landscape, including risks specific to RAG systems, agentic AI, and model supply chains.

This is your reference framework for the rest of the chapter. Every attack technique covered in Sections 2-7 maps back to one or more of these categories.

graph TB
    subgraph "Input Attacks"
        LLM01["LLM01<br/>Prompt Injection"]
        LLM07["LLM07<br/>System Prompt Leakage"]
    end

    subgraph "Data & Training"
        LLM04["LLM04<br/>Data and Model Poisoning"]
        LLM03["LLM03<br/>Supply Chain"]
        LLM08["LLM08<br/>Vector and Embedding<br/>Weaknesses"]
    end

    subgraph "Output & Behavior"
        LLM02["LLM02<br/>Sensitive Information<br/>Disclosure"]
        LLM05["LLM05<br/>Improper Output<br/>Handling"]
        LLM09["LLM09<br/>Misinformation"]
    end

    subgraph "System & Operations"
        LLM06["LLM06<br/>Excessive Agency"]
        LLM10["LLM10<br/>Unbounded Consumption"]
    end

    style LLM01 fill:#8b0000,color:#fff
    style LLM02 fill:#8b0000,color:#fff
    style LLM03 fill:#8b0000,color:#fff
    style LLM04 fill:#8b0000,color:#fff
    style LLM05 fill:#8b0000,color:#fff
    style LLM06 fill:#8b0000,color:#fff
    style LLM07 fill:#8b0000,color:#fff
    style LLM08 fill:#8b0000,color:#fff
    style LLM09 fill:#8b0000,color:#fff
    style LLM10 fill:#8b0000,color:#fff

LLM01: Prompt Injection

Prompt injection occurs when an attacker crafts input that overrides or manipulates the LLM’s intended instructions. It is the best-known and most widely exploited LLM vulnerability.

  • Direct injection: Attacker types malicious instructions directly into the chat interface
  • Indirect injection: Malicious instructions are hidden in data the LLM processes (documents, web pages, emails)
  • Why it matters: Can bypass safety guardrails, extract sensitive data, or make the LLM take unauthorized actions
  • Covered in depth: Section 2
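
To make the direct case concrete, here is a minimal heuristic sketch of where an input filter sits in the pipeline. The phrase list is illustrative and trivially evaded by real attackers – keyword matching is not a defense, only a demonstration that direct injection arrives through the same channel as every legitimate prompt.

```python
import re

# Illustrative phrase list -- real attackers rephrase, encode, or
# translate these, so treat this as a teaching aid, not a guardrail.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard (the )?system prompt",
    r"you are now",
]

def looks_like_direct_injection(user_input: str) -> bool:
    """Return True if the input matches a known override phrase."""
    lowered = user_input.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)
```

Indirect injection is harder precisely because the malicious text arrives inside retrieved documents or emails, where no input-side phrase check ever runs.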

LLM07: System Prompt Leakage

System prompt leakage occurs when an attacker extracts the hidden system instructions that define an LLM application’s behavior, personality, or access controls.

  • Technique: Crafted prompts that trick the model into revealing its system-level configuration
  • Why it matters: Leaked system prompts reveal business logic, API keys embedded in instructions, and guardrail configurations – giving attackers a roadmap for further exploitation
  • Covered in depth: Section 2
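
One common mitigation pattern is an output-side check: before a response leaves the application, compare it against the system prompt and withhold it if a long verbatim fragment appears. The sliding-window function below is a simplified sketch of that idea (the `min_len` threshold and withholding message are assumptions, not a standard API).

```python
def redact_if_leaking(response: str, system_prompt: str, min_len: int = 20) -> str:
    """Withhold the response if any verbatim fragment of the system
    prompt (at least min_len characters) appears in it. Paraphrased
    leaks will slip past this check -- it only catches verbatim copies."""
    for start in range(max(1, len(system_prompt) - min_len + 1)):
        fragment = system_prompt[start:start + min_len]
        if fragment in response:
            return "[response withheld: possible system prompt leakage]"
    return response
```

A deeper lesson: secrets such as API keys should never be placed in the system prompt at all, so there is nothing of value to leak.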

LLM04: Data and Model Poisoning

Data poisoning targets the training pipeline – corrupting the data used to train, fine-tune, or align an LLM to introduce biases, backdoors, or vulnerabilities.

  • Scope: Includes training data manipulation, fine-tuning attacks, and RLHF poisoning
  • Why it matters: Poisoned models produce subtly wrong or dangerous outputs that are extremely difficult to detect
  • Covered in depth: Section 3

LLM03: Supply Chain

Supply chain vulnerabilities arise from compromised components in the AI development and deployment pipeline – model weights, training datasets, pre-built adapters, and software dependencies.

  • Scope: Malicious models on hubs (Hugging Face, PyPI), poisoned fine-tuning adapters, compromised dependencies
  • Why it matters: One poisoned model or package can compromise every deployment that uses it
  • Covered in depth: Sections 3 and 4
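
Basic supply-chain hygiene starts with pinning artifacts to known digests. The sketch below verifies a downloaded model file against a SHA-256 value published out of band – a necessary (though not sufficient) control, since a poisoned model with a matching published hash would still pass.

```python
import hashlib

def verify_artifact(path: str, expected_sha256: str) -> bool:
    """Compare a downloaded file's SHA-256 digest against a pinned value.
    Streams in chunks so large model files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

Hash pinning catches tampered downloads; it does not vet what the original author published, which is why model provenance and signing efforts exist alongside it.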

LLM08: Vector and Embedding Weaknesses

Vector and embedding weaknesses exploit vulnerabilities in how RAG systems store and retrieve information – including lack of access controls on vector stores, manipulation of embedding spaces, and adversarial documents that game retrieval algorithms.

  • Scope: RAG poisoning, embedding manipulation, unauthorized vector store access
  • Why it matters: RAG is one of the most widely deployed patterns in enterprise AI – weaknesses here have outsized reach across production deployments
  • Covered in depth: Section 3

LLM02: Sensitive Information Disclosure

Sensitive information disclosure occurs when an LLM reveals confidential data in its responses – training data, PII, API keys, internal system details, or proprietary information.

  • Scope: Training data extraction, PII leakage through memorization, system information disclosure
  • Why it matters: LLMs can memorize and reproduce sensitive training data, and improper filtering allows this data to reach end users
  • Covered in depth: Section 6
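
A typical last-line defense is an output filter that redacts recognizable PII patterns before a response reaches the user. The regexes below (email addresses and US SSN-style numbers) are a minimal sketch – production systems use dedicated PII-detection services with far broader coverage.

```python
import re

# Two illustrative patterns; real PII detection covers many more
# formats (phone numbers, credit cards, addresses, names, ...).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace matched PII with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text
```

Note that output filtering treats the symptom; the root cause – sensitive data reaching the model via training or context – still needs to be addressed upstream.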

LLM05: Improper Output Handling

Improper output handling occurs when LLM outputs are passed to downstream systems without adequate validation or sanitization – enabling traditional attack vectors like XSS, SQL injection, and command injection through AI-generated content.

  • Scope: Code injection via generated output, unvalidated function calls, privilege escalation through crafted responses
  • Why it matters: The LLM becomes an attack vector against your own backend systems
  • Covered in depth: Sections 4 and 6
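
The defense is the same one used for any untrusted input: escape before rendering, parameterize before querying. The sketch below shows both moves applied to model output (table and function names are illustrative).

```python
import html
import sqlite3

def render_llm_comment(llm_output: str) -> str:
    """Escape model output before embedding it in HTML, so an injected
    <script> tag renders as inert text instead of executing (XSS)."""
    return f"<p>{html.escape(llm_output)}</p>"

def store_llm_summary(conn: sqlite3.Connection, doc_id: int, summary: str) -> None:
    """Use a parameterized query so model output can never alter the
    SQL statement itself (SQL injection)."""
    conn.execute(
        "INSERT INTO summaries (doc_id, body) VALUES (?, ?)",
        (doc_id, summary),
    )
```

The rule of thumb: treat every LLM response as attacker-controlled data, because via indirect prompt injection it can be.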

LLM09: Misinformation

Misinformation occurs when LLMs generate false or misleading content with high confidence – hallucinations presented as fact, fabricated citations, or confident responses to questions outside the model’s knowledge.

  • Scope: Hallucinations, fabricated references, confident but wrong answers, overreliance on model outputs
  • Why it matters: In high-stakes domains (medical, legal, financial), misinformation can cause real harm
  • Covered in depth: Section 6

LLM06: Excessive Agency

Excessive agency occurs when an LLM-based system is granted more capabilities, permissions, or autonomy than necessary – and an attacker exploits that excess to take unauthorized actions.

  • Scope: Over-permissioned tool access, unrestricted function calling, autonomous actions without human oversight
  • Why it matters: In agentic AI systems, excessive agency can mean the difference between a failed prompt injection and a full system compromise
  • Covered in depth: Section 5
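
The core mitigation is least privilege: an agent may only call tools it was explicitly granted, and destructive tools require a human-in-the-loop flag. The registry below is a hypothetical sketch of that pattern, not the API of any real agent framework.

```python
class ToolRegistry:
    """Least-privilege tool registry: ungrated tools are uncallable,
    and tools marked requires_approval need an explicit human sign-off."""

    def __init__(self):
        self._tools = {}

    def register(self, name, fn, requires_approval=False):
        self._tools[name] = (fn, requires_approval)

    def call(self, name, *args, approved=False, **kwargs):
        if name not in self._tools:
            raise PermissionError(f"tool '{name}' not granted to this agent")
        fn, needs_approval = self._tools[name]
        if needs_approval and not approved:
            raise PermissionError(f"tool '{name}' requires human approval")
        return fn(*args, **kwargs)
```

A prompt-injected agent confined to read-only tools can still say something wrong, but it cannot delete files or send money – which is exactly the point of bounding agency.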

LLM10: Unbounded Consumption

Unbounded consumption occurs when attackers exploit LLM resource usage to cause denial of service, financial damage through excessive API costs, or resource exhaustion.

  • Scope: Prompt flooding, context window stuffing, recursive generation attacks, API cost exploitation
  • Why it matters: A single attacker can run up thousands of dollars in API costs or take down a production service
  • Covered in depth: Section 4

MITRE ATLAS: A Supplementary Framework

While OWASP provides the vulnerability categories, MITRE ATLAS (Adversarial Threat Landscape for AI Systems) offers a complementary framework that maps adversarial tactics and techniques specifically for AI and machine learning systems.

ATLAS currently catalogs 15 tactics and 66 techniques organized in a matrix similar to the MITRE ATT&CK framework familiar to cybersecurity professionals. It covers the full adversarial lifecycle from reconnaissance to impact.

When to Use Which Framework
  • OWASP LLM Top 10: For communicating about LLM-specific vulnerabilities with stakeholders, compliance teams, and developers. It answers: “What can go wrong?”
  • MITRE ATLAS: For detailed threat modeling and red team planning. It answers: “How does an attacker get from initial access to impact, step by step?”

You don’t need to memorize all 66 ATLAS techniques. Know that the framework exists, understand it maps the attacker’s journey, and reference it when doing formal threat modeling. OWASP categories are your everyday vocabulary.


OWASP Agentic AI Top 10 (2026)

The rise of agentic AI systems – AI that uses tools, makes decisions, and acts autonomously – created attack surfaces so different from traditional LLM usage that OWASP published a dedicated framework in 2026. Where the LLM Top 10 covers what can go wrong with model inputs, outputs, and data, the Agentic AI Top 10 covers what can go wrong when AI systems take actions in the real world.

In Chapter 1 Section 7, you learned about the agent loop: plan, select tool, execute, observe, decide. Each stage of that loop is a target, and the Agentic AI Top 10 maps those targets.

These categories will be covered in depth in Section 5: Agentic Attack Vectors. Here’s the preview:

ASI01: Agent Goal Hijacking

Manipulating the goals or objectives an agent is pursuing – redirecting its decision-making toward attacker-chosen outcomes rather than its intended purpose.


ASI02: Tool Misuse and Exploitation

Exploiting the tools and functions available to an agent – tricking it into calling tools in unintended ways, passing malicious parameters, or using legitimate tools for unauthorized purposes.
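
Strict parameter validation at the tool boundary blunts this attack: even if an injected prompt convinces the agent to call a mail tool with an attacker's address, the tool itself refuses. The function and domain allowlist below are hypothetical, for illustration only.

```python
# Assumption for the sketch: an internal-only mail tool that should
# never send outside the organization's domain.
ALLOWED_DOMAINS = {"example.com"}

def validated_send_email(to: str, subject: str, body: str) -> str:
    """Validate tool parameters independently of the agent's reasoning.
    The agent decides *whether* to send; the tool decides *what* is legal."""
    domain = to.rsplit("@", 1)[-1].lower()
    if "@" not in to or domain not in ALLOWED_DOMAINS:
        raise ValueError(f"recipient domain '{domain}' not allowed")
    if len(body) > 10_000:
        raise ValueError("body exceeds size limit")
    return f"queued mail to {to}"
```

The design principle: never trust the model to police its own tool calls – enforce constraints in deterministic code the model cannot talk its way around.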


ASI03: Identity and Privilege Abuse

Leveraging the agent’s identity and access permissions to perform actions the human user shouldn’t be able to do – using the agent as a privilege escalation vector.


ASI04: Agentic Supply Chain Vulnerabilities

Compromising the plugins, tools, APIs, or external services that agents depend on – poisoning the agent’s operational environment rather than the agent itself.


ASI05: Unexpected Code Execution

Exploiting an agent’s ability to generate and execute code – injecting malicious code through prompt manipulation or data poisoning that the agent then runs with its own permissions.
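
One layer of defense is to inspect generated code before running it. The AST allowlist below sketches the "inspect before execute" principle; to be clear, AST filtering is not a sandbox – production systems run generated code in isolated containers with no network or filesystem access – but it shows where a pre-execution gate sits.

```python
import ast

# Illustrative allowlist of callable names; everything else is rejected.
ALLOWED_CALLS = {"print", "len", "sum", "min", "max", "range"}

def is_safe_snippet(code: str) -> bool:
    """Reject generated code containing imports, attribute access, or
    calls to names outside a small allowlist. A gate, not a sandbox."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            return False
        if isinstance(node, ast.Attribute):
            return False
        if isinstance(node, ast.Call):
            fn = node.func
            if not (isinstance(fn, ast.Name) and fn.id in ALLOWED_CALLS):
                return False
    return True
```

Even with a gate like this, the safe default is to run agent-generated code with the fewest permissions the task allows.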

ASI06: Memory and Context Poisoning

Manipulating an agent’s persistent memory or context to influence future decisions – planting false information that the agent will trust and act upon in subsequent interactions.


ASI07: Insecure Inter-Agent Communication

Exploiting the communication channels between agents in multi-agent systems – intercepting, modifying, or injecting messages to manipulate collaborative agent workflows.


ASI08: Cascading Failures

Triggering failures that propagate through interconnected agent systems – exploiting the interdependence of multi-agent architectures to amplify the impact of a single point of compromise.


ASI09: Human-Agent Trust Exploitation

Exploiting the trust relationship between humans and AI agents – using the agent’s perceived authority or helpfulness to manipulate human decisions or bypass human oversight.


ASI10: Rogue Agents

Agents that deviate from their intended behavior – whether through goal drift, compromised instructions, or adversarial manipulation – acting against the interests of their operators.

Why a Separate Framework?

Traditional LLM vulnerabilities (prompt injection, data poisoning) still apply to agentic systems – but agentic systems add entirely new risk categories around tool use, autonomous action, multi-agent coordination, and persistent memory. A chatbot that gets prompt-injected can say something wrong. An agent that gets prompt-injected can do something wrong – run code, send emails, modify files, or purchase services.


Case Study: Microsoft LLMjacking Lawsuit (2024)

Real-World Impact: LLMjacking at Scale

Who: Microsoft, targeting a cybercriminal ring

When: 2024

What happened: Attackers stole Azure OpenAI API credentials and resold access to generate harmful content at scale. The operation hijacked legitimate accounts’ compute resources, racking up costs exceeding $100,000 per day for the victims. Microsoft filed a lawsuit under the Computer Fraud and Abuse Act.

How it worked:

  1. Attackers harvested API keys from public code repositories, leaked credentials, and social engineering
  2. They built proxy services that resold access to GPT-4 and DALL-E at discounted rates
  3. Buyers used the stolen access to generate content that violated model usage policies
  4. Victims discovered the attack only when unexpected charges appeared on their bills

OWASP mapping: LLM10: Unbounded Consumption – stolen credentials enabled unlimited resource consumption at the victim’s expense.

Lesson: API credential security isn’t just about protecting your data – it’s about protecting your budget. A single leaked API key can become a liability of thousands of dollars per day.
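
Step 1 of the attack – harvesting keys from public repositories – is preventable with secret scanning before code is pushed. The pattern below is an illustrative sketch of that idea; real scanners (for example, secret scanning built into CI pipelines) use vendor-maintained rule sets far more precise than a single regex.

```python
import re

# Illustrative pattern: a variable named like a key or secret assigned
# a long alphanumeric literal. Real rules match vendor-specific key
# formats to cut false positives.
KEY_PATTERN = re.compile(
    r"(api[_-]?key|secret)\s*[:=]\s*['\"][A-Za-z0-9]{20,}['\"]",
    re.IGNORECASE,
)

def find_leaked_keys(source: str) -> list[str]:
    """Return suspicious key-assignment snippets found in source text."""
    return [m.group(0) for m in KEY_PATTERN.finditer(source)]
```

Scanning is the cheap half of the control; the other half is loading credentials from a secrets manager at runtime so there is nothing in the repository to leak.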

Key Takeaways
  • The OWASP LLM Top 10 (2025) is the industry-standard framework for categorizing LLM vulnerabilities across input, data, output, and system risks.
  • Every stage of the AI lifecycle – from training data collection through production inference and monitoring – presents distinct attack surfaces.
  • Threat actors range from script kiddies using published jailbreaks to nation-states conducting supply chain attacks, each with different capabilities and motivations.
  • The OWASP Agentic AI Top 10 (2026) extends the threat model to cover autonomous AI systems that can take real-world actions.
  • MITRE ATLAS complements OWASP by mapping the attacker’s step-by-step journey from reconnaissance to impact.

Test Your Knowledge

Ready to test your understanding of the AI attack surface and threat frameworks? Head to the quiz to see how well you can map attack vectors to the AI lifecycle.


Up next

Now that you have the full attack surface map and understand the OWASP framework, it’s time to dive into the most common and well-understood attack category: prompt injection. In the next section, you’ll see exactly how attackers manipulate LLM inputs – with sanitized examples you can study and discuss.