<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Section 7 Quiz :: Introduction to AI Security</title>
    <link>https://example.org/chapter2/s7/activity/index.html</link>
<description>Test Your Knowledge: Small Language Model (SLM) Threats

Let’s see how much you’ve learned! This quiz tests your understanding of SLM-specific vulnerabilities, the “smaller = safer” misconception, edge deployment risks, reduced guardrails, amplified model theft, and how OWASP LLM Top 10 categories manifest differently in SLMs.

---
shuffle_answers: true
shuffle_questions: false
---

## A healthcare company deploys a 3B parameter model on tablets for patient intake. IT security approves it, reasoning that the small, locally running model with no internet connectivity poses minimal risk. Six months later, simple jailbreak prompts bypass all of its safety filters. What misconception led to this security failure?

&gt; Hint: Think about the assumption the security team made about the relationship between model size and safety.

- [ ] They assumed edge devices cannot be compromised
  &gt; While edge device security was relevant, the core misconception was about the model&#39;s inherent safety, not the device&#39;s security posture.
- [ ] They assumed the model was too old to have known vulnerabilities
  &gt; The model&#39;s age wasn&#39;t the issue. The security team&#39;s false assumption was about size and capability.
- [x] The &#34;smaller = safer&#34; misconception -- they assumed a small model posed minimal risk, when research shows 47.6% of SLMs are highly susceptible to jailbreak attacks
  &gt; Correct! The &#34;smaller = safer&#34; misconception is one of the most dangerous blind spots in AI security. The 2025 research across 63 SLMs found that 47.6% exhibited high susceptibility to jailbreak attacks. SLMs have fewer parameters to enforce safety constraints, they receive less safety training investment, and simple jailbreak techniques that fail against frontier models succeed against them. The model&#39;s small size made it more vulnerable, not less.
- [ ] They assumed that patient intake is a low-risk use case
  &gt; Patient intake involving clinical data is actually a high-risk use case, but the core misconception here was about model size and safety, not use-case risk assessment.

## Which of the following correctly explains why SLMs have weaker safety guardrails than frontier models?

&gt; Hint: Think about the three factors that contribute to SLM safety limitations.

- [ ] SLM developers intentionally remove safety features to save disk space
  &gt; Safety features are encoded in model weights, not separate components. The issue is training investment, not intentional removal.
- [x] Safety alignment (RLHF, constitutional AI, red-teaming) requires significant compute resources, SLMs have fewer parameters with which to simultaneously perform tasks and enforce safety, and community fine-tuned variants often receive minimal safety training
  &gt; Correct! Three factors combine to weaken SLM safety: (1) training budget constraints mean safety is a secondary priority after capability, (2) having fewer parameters means less capacity to handle task performance and safety enforcement simultaneously, and (3) community fine-tuned variants may undergo no safety training beyond what the base model received. Research shows safety guardrails can be removed from SLMs with as few as 100 examples of harmful content.
- [ ] SLMs use a fundamentally different architecture that doesn&#39;t support safety features
  &gt; SLMs use the same Transformer architecture as large models. They support safety features but have less capacity and training to implement them effectively.
- [ ] Safety training is only effective on models with 100B+ parameters
  &gt; There is no parameter threshold for safety effectiveness. Smaller models can have safety training, but it requires proportionally more careful design and is easier to circumvent.
## An attacker steals a proprietary 3B parameter SLM from an edge device, removes its safety guardrails on a consumer GPU in hours, and distributes the uncensored version through file sharing. Why is model theft particularly amplified for SLMs?

&gt; Hint: Compare the theft-to-weaponization pipeline for a 3B model versus a 400B model.

- [ ] SLMs are always stored unencrypted on edge devices
  &gt; While some edge devices lack encryption, the amplification is about the entire pipeline from theft to distribution, not just the storage format.
- [ ] SLMs have more valuable proprietary data than large models
  &gt; The value of proprietary data isn&#39;t size-dependent. The amplification is about how much easier each step of the theft pipeline is for small models.
- [x] Small file size (1-15 GB) enables rapid exfiltration in minutes, consumer hardware can run the stolen model, safety removal requires only ~100 examples, and anonymous distribution through file sharing is trivial
  &gt; Correct! Model theft maps to LLM10 in the OWASP LLM Top 10 and is dramatically amplified for SLMs. Every step is easier: exfiltration takes minutes (not the hours or days a 400 GB+ model would require), any consumer laptop can run the stolen model (no GPU cluster needed), removing safety guardrails costs tens of dollars (not thousands), and the small file can be distributed via any file hosting platform. The proliferation of &#34;uncensored&#34; SLM variants on Hugging Face demonstrates this pipeline in action.
- [ ] SLMs are the only models deployed on edge devices
  &gt; While SLMs are commonly deployed on edge devices, the amplification comes from their technical characteristics (small size, low hardware requirements), not their exclusive use on edge devices.

## An edge-deployed SLM is jailbroken by a user, and the company wants to investigate how many interactions were affected. They discover there are no logs of the on-device interactions. Which edge deployment vulnerability does this illustrate?
&gt; Hint: Think about what security infrastructure edge devices typically lack compared to cloud deployments.

- [ ] Physical access to model weights -- the user extracted and modified the model
  &gt; Physical access to weights is about modifying or stealing the model. This scenario is about the inability to detect and investigate misuse after it occurs.
- [x] Limited monitoring and observability -- edge-deployed SLMs often have no interaction logging, no real-time alerts, and no way to detect or audit misuse after the fact
  &gt; Correct! Cloud-deployed models benefit from centralized logging, real-time monitoring, usage analytics, and anomaly detection. Edge-deployed SLMs often have none of these. Without interaction logs, it&#39;s impossible to determine how many conversations involved jailbroken outputs, whether any decisions were made based on those outputs, or how long the misuse has been occurring. This is a fundamental gap in edge AI security posture.
- [ ] Resource-constrained security -- the device couldn&#39;t run safety filters
  &gt; Resource constraints are about what the device can run in real time. This scenario is about the absence of logging and audit trails, which is a different edge deployment vulnerability.
- [ ] Quantization artifacts -- the safety training was degraded during compression
  &gt; Quantization artifacts affect model behavior. This scenario is about missing logging infrastructure, not about changes in model behavior.

## How does LLM01: Prompt Injection manifest differently in SLMs compared to large frontier models?

&gt; Hint: Think about what SLMs have less capacity to do that frontier models can do better.

- [ ] SLMs are immune to prompt injection because they have limited capabilities
  &gt; SLMs are not immune -- they are actually more susceptible. Limited capabilities make them worse at defending, not immune to attacks.
- [ ] Prompt injection affects SLMs and large models identically
  &gt; The OWASP categories apply to both, but SLMs have specific characteristics that change how the vulnerability manifests and its severity.
- [x] SLMs have fewer parameters to distinguish data from instructions, and their simpler safety training is more easily bypassed -- making prompt injection higher severity than in large models
  &gt; Correct! For SLMs, LLM01: Prompt Injection is rated higher severity because SLMs have weaker instruction-data separation. The fundamental prompt injection problem (LLMs can&#39;t reliably distinguish developer instructions from user input) is worse in smaller models because they have fewer parameters dedicated to this distinction. Simple injection techniques that frontier models resist succeed against SLMs, as demonstrated by the 47.6% jailbreak susceptibility rate.
- [ ] SLMs can only be attacked through direct injection, not indirect injection
  &gt; SLMs are vulnerable to both direct and indirect injection. As SLMs gain tool-use capabilities (through MCP, function calling), they become targets for indirect injection through tool outputs and data sources.

## A developer downloads a quantized (4-bit) version of a safety-trained SLM for deployment on a mobile device. The full-precision model passes all safety benchmarks, but the quantized version fails several of them. What happened?

&gt; Hint: Think about what quantization does to the model&#39;s learned patterns and which patterns might be most fragile.

- [ ] The quantization process introduced new malicious behavior
  &gt; Quantization doesn&#39;t introduce new behaviors -- it reduces numerical precision, which can break existing behaviors.
- [x] Aggressive quantization disproportionately degrades safety behaviors -- the safety training was more fragile than task performance and broke when precision was reduced
  &gt; Correct! This is an SLM-specific supply chain risk under LLM03: Supply Chain. Quantization reduces numerical precision to save memory and compute, but safety behaviors can be more fragile than general task performance. A model that passes safety tests at full precision may fail them at 4-bit quantization because the safety training patterns are less robust to precision reduction. This means organizations must re-test safety after quantization, not just task performance.
- [ ] The developer downloaded a malicious version of the quantized model
  &gt; The scenario describes a legitimate quantization process degrading safety, not a supply chain compromise with a malicious download.
- [ ] 4-bit quantization is always unsafe and should never be used
  &gt; 4-bit quantization is widely used and can be safe. The key is that safety must be re-evaluated after quantization because it can degrade disproportionately.

## The first malicious MCP server discovered on npm (September 2025) accumulated 800+ downloads before detection. How does this incident relate to SLM security specifically?

&gt; Hint: Think about what happens as small models gain agentic capabilities.

- [ ] Only SLMs can use MCP servers -- large models use different protocols
  &gt; Both large and small models can use MCP servers. The protocol is model-size agnostic.
- [ ] The malicious MCP server only targeted SLMs
  &gt; The MCP server targeted any AI agent (or human developer) that connected to it. It was not specific to SLMs.
- [x] As SLMs gain tool-use capabilities through MCP and function calling, they inherit the full spectrum of agentic supply chain risks (ASI04) -- but with weaker defenses against the prompt injection component embedded in tool responses
  &gt; Correct! This maps to both LLM03: Supply Chain and ASI04: Agentic Supply Chain Vulnerabilities. The malicious MCP server injected hidden instructions into tool responses (prompt injection through the tool layer). SLMs are particularly vulnerable because they have weaker prompt injection defenses than frontier models. As the SLM ecosystem grows to include tool use, small models inherit all agentic risks while having less capacity to resist the embedded attacks.
- [ ] The incident only affected JavaScript projects, not SLM deployments
  &gt; While the package was on npm, MCP servers connect to AI agents regardless of the project&#39;s programming language. The agentic supply chain risk applies to any SLM with tool-use capability.

## Which of the following statements about the OWASP LLM Top 10 and SLMs is correct?

&gt; Hint: Think about whether the same categories apply but manifest differently or whether SLMs face entirely different threats.

- [ ] Only 5 of the 10 OWASP LLM categories apply to SLMs -- the rest are exclusive to large models
  &gt; All 10 categories apply to SLMs. The OWASP framework is model-size agnostic.
- [ ] SLMs face lower severity across all 10 categories because they are less capable
  &gt; This is the &#34;smaller = safer&#34; misconception. Most categories are actually higher severity for SLMs due to weaker defenses and edge deployment characteristics.
- [x] All 10 OWASP LLM Top 10 categories apply to SLMs, but most manifest at higher severity -- including higher prompt injection susceptibility, easier model theft, lower poisoning thresholds, and degraded safety from quantization
  &gt; Correct! Every OWASP category applies to SLMs with SLM-specific characteristics. LLM01 is higher severity (weaker instruction-data separation), LLM03 is higher (more unvetted variants), LLM04 is higher (lower poisoning threshold), LLM07 is higher (weaker prompt boundary enforcement), and LLM10 is much higher (dramatically easier to steal and weaponize). SLMs are not a safer subset of AI -- they are a more vulnerable one in most risk categories.
- [ ] SLMs need their own separate framework because the OWASP LLM Top 10 doesn&#39;t apply to them
  &gt; The OWASP LLM Top 10 applies to SLMs. While SLM-specific manifestations differ, the framework covers all LLM-based systems regardless of size.</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <atom:link href="https://example.org/chapter2/s7/activity/index.xml" rel="self" type="application/rss+xml" />
  </channel>
</rss>