4. Model and Infrastructure Attacks

Introduction

A healthcare startup deploys a self-hosted diagnostic AI model on their internal servers – a deliberate choice for data sovereignty and compliance. They download a popular open-source model from a well-known hub, load it into their inference server, and begin processing patient data. Three weeks later, their security team detects unusual outbound network traffic from the inference server. Investigation reveals that the model file contained a serialization exploit: when the model was loaded, it silently established a reverse shell to an attacker-controlled server. The attacker has had access to the model’s runtime environment – and potentially patient data – for three weeks.

Self-hosted deployments give organizations control over their data and models. They also give organizations responsibility for every layer of security in the stack – from model files to GPU memory to API endpoints. In this section, you’ll learn how attackers target the infrastructure that runs AI systems.

What will I get out of this?

By the end of this section, you will be able to:

Explain serialization exploits and why loading a model file can lead to remote code execution.
Describe adversarial input attacks at a conceptual level, including perturbation techniques.
Identify model theft techniques including API-based extraction attacks.
Explain denial of service attacks on LLMs through unbounded consumption.
Catalog infrastructure attack vectors including container escapes, GPU memory leaks, and API key theft.
Reference the Meta Llama CVE-2024-50050 as a serialization exploit example.
Reference the Ultralytics supply chain attack (Dec 2024) as a model ecosystem supply chain compromise.

Serialization Exploits LLM03: Supply Chain

Serialization exploits are among the most critical infrastructure risks in AI deployments. They target the process of loading model files – turning what appears to be a routine operation into a full system compromise.

The Pickle Problem (In Depth)

Python’s pickle module is the default serialization format for many ML frameworks. When you load a pickled file, Python executes arbitrary code embedded in the file. This isn’t a bug – it’s how pickle works by design. It was never intended for loading untrusted data.

The problem: much of the AI ecosystem was built around pickle before security was a primary concern. PyTorch model files (.pt, .bin), training checkpoints, and many custom model formats all use pickle serialization under the hood.

How Serialization RCE Works

graph TB
    A["Attacker crafts<br/>malicious model file"] -->|"Embeds Python code<br/>in pickle payload"| B["Model file on<br/>hub / download site"]
    B -->|"User downloads<br/>'torch.load(model.pt)'"| C["Python deserializes<br/>pickle objects"]
    C -->|"Pickle executes<br/>embedded code"| D["Remote Code Execution"]
    D --> E["Reverse shell,<br/>credential theft,<br/>cryptominer, etc."]

    style A fill:#8b0000,color:#fff
    style D fill:#8b0000,color:#fff
    style E fill:#8b0000,color:#fff

The Safetensors Alternative

The safetensors format was developed specifically to solve this problem. It stores tensor data in a simple, flat format that cannot execute code during loading. Major hubs like Hugging Face now default to safetensors format and flag models that use pickle serialization.

However, safetensors is not universal. Many older models, checkpoints, and custom frameworks still rely on pickle. Organizations must treat any non-safetensors model file as potentially executable code.

Adversarial Inputs LLM05: Improper Output Handling

Adversarial inputs are carefully crafted perturbations to model inputs that cause the model to produce incorrect outputs. While most prominent in computer vision (a few pixel changes can make an image classifier misidentify a stop sign), adversarial inputs also affect LLMs and multimodal models.

Conceptual Overview

The core idea: machine learning models learn decision boundaries in high-dimensional spaces. Adversarial inputs are designed to push an input across a decision boundary with minimal perceptible change. The change is imperceptible to humans but significant to the model.

For LLMs, adversarial inputs include:

Token-level perturbations: Replacing words with synonyms or near-synonyms that cause different model behavior
Typographical attacks: Strategic misspellings that bypass content filters while preserving semantic meaning
Multimodal attacks: Embedding adversarial signals in images or audio that are invisible/inaudible to humans but influence the model’s text generation

Practical Impact

In production systems, adversarial inputs are most dangerous when combined with other attack vectors. For example, adversarial perturbations can be used to craft prompt injection payloads that bypass input filters – the filter doesn’t detect the malicious intent, but the model still processes it.

Model Theft LLM03: Supply Chain

Model theft – also known as model extraction – is the process of stealing a proprietary model’s behavior, weights, or architecture through its API. The attacker doesn’t need access to the model files. They reconstruct the model by observing its inputs and outputs.

Extraction Through API Queries

The basic approach:

Query the target model with a large, strategically chosen set of inputs
Record the outputs (including probability distributions if the API exposes them)
Train a local model to reproduce the target model’s behavior using the collected input-output pairs
The resulting “distilled” model approximates the target’s capabilities without paying for the original model’s training costs

sequenceDiagram
    participant Attacker
    participant Target API as Target Model API
    participant Local as Local Training Env
    participant Clone as Cloned Model

    Attacker->>Target API: 1. Send strategic query inputs
    Target API-->>Attacker: 2. Return outputs + probabilities
    Attacker->>Attacker: 3. Repeat with thousands of queries
    Attacker->>Local: 4. Feed input-output pairs as training data
    Local->>Clone: 5. Train distilled model to replicate behavior
    Attacker->>Clone: 6. Query local clone (no rate limits, no costs)

Why It Matters

IP theft: Organizations spend millions training proprietary models. Extraction steals that investment
Bypass rate limiting: The extracted model runs locally with no API costs or rate limits
Adversarial research: Once an attacker has a local copy, they can study it to find vulnerabilities that transfer back to the original model
Compliance violations: Extracted models may retain training data characteristics, creating data privacy issues

Denial of Service LLM10: Unbounded Consumption

LLMs are computationally expensive to run. A single inference request can consume significant GPU time, memory, and compute resources. Attackers exploit this asymmetry – it’s cheap to send a request but expensive to process one.

Attack Techniques

Prompt Flooding: Sending massive volumes of API requests to exhaust compute resources or rack up billing charges. Simple but effective against systems without rate limiting.

Context Window Stuffing: Crafting inputs that maximize the model’s context window usage. Processing a 200,000-token input costs dramatically more than processing a 100-token input. Repeated maximum-length requests can exhaust GPU memory and block other users.

Recursive Generation Attacks: Crafting prompts that cause the model to generate extremely long outputs – for example, asking it to enumerate all possible combinations of something, or creating self-referential prompts that cause extended generation loops.

Multi-Turn Resource Drain: In agentic systems, triggering long chains of tool calls, each of which consumes additional compute. A single prompt could trigger dozens of API calls, file operations, and code executions.

Financial Impact

For organizations using pay-per-token API models, unbounded consumption translates directly to financial damage. An attacker who can send unrestricted requests to a GPT-4 endpoint can run up thousands of dollars in charges per hour. This was the core of the LLMjacking attacks covered in Section 1.

Infrastructure Attacks

Beyond model-specific vulnerabilities, the infrastructure running AI systems is subject to traditional and AI-specific attack vectors.

Container and Runtime Exploits

Most model inference servers run in containers (Docker, Kubernetes). Standard container security risks apply:

Container escape: Breaking out of the container to access the host system
Privilege escalation: Exploiting misconfigured container permissions to access other services
Sidecar attacks: Compromising auxiliary containers that share the same pod

GPU Memory Vulnerabilities

GPUs process sensitive data during inference – prompts, context, model weights, and generated outputs all reside in GPU memory. Risks include:

GPU memory not cleared between requests: Previous users’ prompts or generated content may be accessible to subsequent requests
Cross-tenant data leakage: In shared GPU environments, one tenant’s data may be accessible to another
Memory scraping: Exploiting GPU driver vulnerabilities to read memory from other processes

API Security

AI API endpoints face standard API security challenges amplified by the expense of the underlying compute:

API key theft: Stolen keys provide unrestricted access to expensive compute resources (the LLMjacking scenario)
Insufficient rate limiting: Without per-user and per-endpoint rate limits, a single attacker can consume all available resources
Missing authentication: Internal model endpoints exposed without authentication

Teaching Moment: n8n CVE-2025-68613

n8n, the popular workflow automation platform used extensively in AI agent deployments (and covered in Chapter 1’s labs), disclosed CVE-2025-68613 – a Server-Side Request Forgery (SSRF) vulnerability. This allowed authenticated users to make arbitrary HTTP requests from the n8n server, potentially accessing internal services, cloud metadata endpoints, and other resources behind the firewall.

Why it matters for AI deployments: n8n is commonly used as the orchestration layer for AI agents, connecting LLMs to databases, APIs, and internal tools. An SSRF vulnerability in the orchestration layer means an attacker could leverage the AI workflow infrastructure to pivot into the internal network – using the same trusted connections the AI agents rely on.

Key takeaway: The security of your AI system includes the security of every component in the stack – not just the model itself. Workflow orchestrators, API gateways, vector databases, and monitoring tools all need to be patched and hardened.

Model Supply Chain at the Infrastructure Level LLM03: Supply Chain

Beyond the model-level supply chain risks covered in Section 3, infrastructure-level supply chain attacks target the deployment pipeline:

Compromised model registries: Poisoning the internal model registry or artifact store that deployment pipelines pull from
CI/CD pipeline attacks: Injecting malicious steps into the model deployment pipeline (model validation bypass, weight modification during packaging)
Typosquatting on model hubs: Publishing models with names designed to be confused with popular models, targeting automated deployment scripts that pull by name

Case Study: Meta Llama CVE-2024-50050

Real-World Impact: Pickle Deserialization RCE in Llama Stack

Who: Meta’s Llama framework users

When: 2024 (CVE-2024-50050)

What happened: A critical vulnerability was discovered in the Llama Stack’s use of Python’s pickle module for deserializing data received over ZeroMQ sockets. This allowed remote attackers to execute arbitrary code on any machine running the affected Llama Stack components.

How it worked:

The Llama Stack distribution server used ZeroMQ for inter-process communication
Messages exchanged between components were serialized/deserialized using pickle
An attacker who could send messages to the ZeroMQ socket could craft a malicious pickle payload
The server would deserialize the payload, executing the embedded code with the server’s permissions

OWASP mapping: LLM03: Supply Chain (vulnerability in the model serving framework itself).

Lesson: Serialization vulnerabilities aren’t limited to model files. Any component in the AI stack that uses pickle deserialization on untrusted input – including communication between internal services – is a potential RCE vector. The fix: use safer serialization formats (JSON, protobuf, safetensors) for all data exchange.

Case Study: Ultralytics Supply Chain Attack (December 2024)

Real-World Impact: Cryptominer in a Popular AI Package

Who: Ultralytics (YOLO computer vision library) users

When: December 2024

What happened: Ultralytics v8.3.41 and v8.3.42, published to PyPI, were infected with a cryptocurrency miner. The compromise affected one of the most popular computer vision packages in the Python ecosystem, downloaded millions of times.

How it worked:

Attackers compromised the Ultralytics build/publish pipeline
Malicious code was injected into the package before it was published to PyPI
The infected versions (v8.3.41 and v8.3.42) were downloaded by thousands of users and CI/CD systems
The malicious payload installed and ran a cryptocurrency miner on infected machines
Ultralytics released v8.3.43 as a clean version after discovery

OWASP mapping: LLM03: Supply Chain (compromised software dependency in the AI development ecosystem).

Lesson: AI supply chain attacks don’t always target the model – they target the tools you use to build with models. Pin your dependency versions, verify package checksums, use tools like pip audit or safety check, and monitor your CI/CD pipeline for unexpected changes. A compromised build pipeline can inject malicious code into any release.

Key Takeaways

Model serialization exploits (especially pickle deserialization) turn model loading into a remote code execution vector – loading an untrusted model is equivalent to running untrusted code.
Adversarial inputs exploit decision boundaries in high-dimensional spaces, causing models to misclassify inputs with imperceptible perturbations.
Model theft through API-based extraction allows attackers to reconstruct proprietary models by observing input-output pairs, bypassing rate limits and licensing.
Denial of service against LLMs exploits the cost asymmetry between cheap requests and expensive inference (prompt flooding, context window stuffing, recursive generation).
AI infrastructure (containers, GPU memory, API endpoints, orchestration platforms) inherits traditional security risks amplified by expensive compute and sensitive data processing.

Test Your Knowledge

Ready to test your understanding of model and infrastructure attacks? Head to the quiz to see how well you can identify serialization risks, extraction techniques, and infrastructure vulnerabilities.

Up next

Sections 1-4 covered the foundational attack categories: the attack surface overview, prompt injection, data poisoning, and infrastructure attacks. These are the most well-understood and commonly exploited vulnerabilities. In the next section, we’ll move into advanced territory: agentic attack vectors – where AI systems that can act autonomously create entirely new categories of risk.

Previous Section Back to Top Next Section