<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>10. The LEARN Architecture :: Introduction to AI Security</title>
    <link>https://example.org/chapter3/s10/index.html</link>
    <description>Introduction The Blueprint tells security teams what infrastructure to deploy. Layer by layer, it maps the controls that protect data, models, infrastructure, and users, secure access services, and counter zero-day threats. But the Blueprint is infrastructure-centric – it answers “what should the platform do?” It doesn’t directly answer “how should I write my AI application to be secure?”&#xA;Developers building AI applications need their own framework. They need to know how to validate inputs, how to constrain what agents can do, and how to prevent the code they write from leaking data. The LEARN mnemonic organizes five key application-level defense practices that complement the infrastructure-focused Blueprint. Where the Blueprint protects the stack from the outside, LEARN hardens the application from the inside.</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <atom:link href="https://example.org/chapter3/s10/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Section 10 Quiz</title>
      <link>https://example.org/chapter3/s10/activity/index.html</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://example.org/chapter3/s10/activity/index.html</guid>
      <description>Test Your Knowledge: The LEARN Architecture Let’s see how much you’ve learned! This quiz tests your understanding of all five LEARN components – Linguistic Shielding, Execution Supervision, Access Control, Robust Prompt Hardening, and Nondisclosure – their Blueprint layer mappings, and how LEARN complements the infrastructure-focused Blueprint.&#xA;--- shuffle_answers: true shuffle_questions: false --- ## A developer is building an AI-powered customer support chatbot. Users can ask questions, but the chatbot should never execute code, access databases, or send emails. Which LEARN component is most relevant for constraining the chatbot&#39;s capabilities? &gt; Hint: Think about which component addresses what AI agents can do. - [ ] Linguistic Shielding -- validate inputs to prevent injection &gt; Linguistic Shielding focuses on input validation and injection defense. While important for the chatbot, the specific requirement here is constraining what the system can DO, not what inputs it accepts. - [x] Execution Supervision -- maintain tool allowlists, enforce sandboxing, and ensure the chatbot has no tool access to code execution, databases, or email sending &gt; Correct! Execution Supervision (LEARN-E) is about monitoring and constraining what AI agents can do. For this chatbot, Execution Supervision means: an empty tool allowlist (the chatbot should have NO tools), sandbox enforcement preventing any code execution, and hard failures if any component attempts to invoke tools. Even though this chatbot shouldn&#39;t have tools at all, Execution Supervision ensures that constraint is enforced in code, not just assumed. - [ ] Access Control -- restrict who can use the chatbot &gt; Access Control (LEARN-A) manages permissions and identity. While user access to the chatbot matters, the requirement is about constraining the chatbot&#39;s capabilities, not user permissions. 
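As a minimal illustration of the correct answer above (a hypothetical sketch; the names are not from the section), Execution Supervision for a tool-less chatbot can be enforced with an empty allowlist and a hard failure on any attempted tool call:

```python
# Hypothetical sketch (names are not from the section): LEARN-E enforcement
# for a support chatbot that must never execute tools of any kind.
ALLOWED_TOOLS: frozenset = frozenset()  # empty by design: no tools at all

class ToolInvocationError(Exception):
    """Raised when anything tries to invoke a tool outside the allowlist."""

def invoke_tool(tool_name: str, **kwargs):
    # Hard failure, not a silent no-op: the constraint is enforced in code.
    if tool_name not in ALLOWED_TOOLS:
        raise ToolInvocationError(
            f"Tool {tool_name!r} is not on the allowlist; refusing to execute."
        )
```

The point is that the "no tools" constraint is executable code, not an assumption.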
- [ ] Nondisclosure -- prevent the chatbot from revealing sensitive information &gt; Nondisclosure (LEARN-N) prevents data leakage. While important, the specific requirement is about preventing the chatbot from taking actions (code execution, database access, email), which is Execution Supervision. ## The section distinguishes between Blueprint (infrastructure) and LEARN (application) frameworks. A company has deployed an AI Gateway with prompt filtering (Blueprint Layer 5). A developer also implements input validation in their application code (LEARN-L). Why are both necessary? &gt; Hint: Think about the defense-in-depth principle applied across infrastructure and application layers. - [ ] The AI Gateway is for external threats while application-level validation is for internal threats &gt; Both layers defend against the same threats (prompt injection). The distinction isn&#39;t about threat source but about creating multiple independent defense layers. - [ ] Application-level validation is faster than the AI Gateway, improving response time &gt; Performance isn&#39;t the primary reason. Both defenses add processing overhead. The value is in security redundancy, not performance. - [x] They create defense in depth -- if a novel injection technique bypasses the AI Gateway&#39;s infrastructure-level filter, the application-level input validation provides an independent second check that may catch what the Gateway missed &gt; Correct! This is the core principle behind having both Blueprint and LEARN defenses. The AI Gateway (Blueprint Layer 5) and application-level Linguistic Shielding (LEARN-L) use different detection mechanisms, are maintained by different teams, and are updated on different schedules. An injection that bypasses one may be caught by the other. Three independent defense layers (LEARN-L application code, Blueprint Layer 5 AI Gateway, AI Guard runtime protection) make successful injection exponentially harder. 
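A sketch of what an independent application-level check can look like (the patterns below are illustrative placeholders, not a complete injection denylist):

```python
import re

# Hypothetical LEARN-L check, applied in application code regardless of
# what the AI Gateway filters upstream. The patterns below are
# illustrative placeholders, not a complete real-world denylist.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
    re.compile(r"reveal the system prompt", re.IGNORECASE),
]

def validate_user_input(text: str) -> bool:
    """Return True only if the input passes the application-level check."""
    return not any(p.search(text) for p in INJECTION_PATTERNS)
```

Because this check lives in the application and the Gateway filter lives in the platform, an attacker has to evade both, maintained by different teams on different schedules.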
- [ ] LEARN validation replaces the AI Gateway for organizations that can&#39;t afford infrastructure-level controls &gt; LEARN is complementary to the Blueprint, not a replacement. Both should be implemented. The section explicitly positions LEARN as practices that &#34;complement the infrastructure-focused Blueprint.&#34; ## A system prompt for an AI financial advisor includes pricing formulas and risk thresholds. The developer wants to prevent these from being leaked through adversarial probing. Which TWO LEARN components work together to address this? &gt; Hint: Think about which components address system prompt defense and which address preventing sensitive content from appearing in outputs. - [ ] Linguistic Shielding (L) and Execution Supervision (E) &gt; Linguistic Shielding addresses input validation and Execution Supervision addresses tool constraints. Neither specifically targets preventing business logic leakage from system prompts. - [x] Robust Prompt Hardening (R) and Nondisclosure (N) -- R hardens the system prompt against extraction attempts, and N ensures that even if extraction is attempted, output validation prevents pricing formulas and risk thresholds from appearing in responses &gt; Correct! Robust Prompt Hardening (R) defends the system prompt itself through instruction anchoring and role boundary enforcement, making extraction attempts less likely to succeed. Nondisclosure (N) provides the backup -- output validation that detects numeric formulas, conditional logic, and policy rules in model responses and blocks them before delivery. Together, R makes extraction harder and N catches leakage that gets past R. - [ ] Access Control (A) and Nondisclosure (N) &gt; Access Control manages permissions and credentials. While relevant for who can access the system, the specific challenge of preventing system prompt content leakage through adversarial probing is addressed by prompt hardening (R) and output filtering (N). 
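A minimal sketch of the Nondisclosure half of that pairing (the pattern list and refusal string are hypothetical, not from the section):

```python
import re

# Hypothetical LEARN-N output filter: block responses that look like they
# contain pricing formulas or numeric risk thresholds. The patterns and
# refusal message are illustrative, not from the section.
SENSITIVE_PATTERNS = [
    re.compile(r"price\s*=\s*[\w\s*+.()-]+", re.IGNORECASE),    # formula-like
    re.compile(r"threshold\s*(of|is|:)\s*\d+", re.IGNORECASE),  # numeric rule
]

REFUSAL = "That information is restricted."

def filter_response(text: str) -> str:
    """Replace a response that leaks sensitive content with a refusal."""
    if any(p.search(text) for p in SENSITIVE_PATTERNS):
        return REFUSAL
    return text
```

This runs after the model responds, so it catches leakage even when an extraction attempt gets past the hardened prompt (R).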
- [ ] Linguistic Shielding (L) and Nondisclosure (N) &gt; Linguistic Shielding validates inputs. While input validation might catch some extraction attempts, the specific defense for system prompt protection is Robust Prompt Hardening (R), which hardens the prompt itself against adversarial manipulation. ## The LEARN vs Blueprint comparison table shows that Blueprint&#39;s primary audience is &#34;security teams, infrastructure engineers, platform teams&#34; while LEARN&#39;s primary audience is &#34;AI developers, ML engineers, application builders.&#34; Why does this audience distinction matter? &gt; Hint: Think about who implements each type of control and where in the process. - [ ] Security teams and developers have different budgets for security tools &gt; Budget allocation isn&#39;t the point. The distinction matters because different roles implement controls at different layers of the stack. - [x] Different roles implement controls at different levels -- security teams configure platform-level controls (AI Gateway, AI-SPM), while developers implement application-level practices (input validation, tool allowlists, output validation) in their code. Both levels must be secured for complete protection &gt; Correct! The audience distinction reflects the implementation reality. A security team configuring the AI Gateway can&#39;t control how a developer validates inputs in their application code. A developer writing application-level LEARN practices can&#39;t configure the organization&#39;s ZTSA policies. Both levels need their own security framework because they&#39;re implemented by different teams using different tools at different stages of the deployment process. - [ ] Developers don&#39;t need to understand infrastructure security &gt; All practitioners benefit from understanding the full security stack. The distinction is about who implements which controls, not about limiting knowledge. 
- [ ] Security teams should implement LEARN practices on behalf of developers &gt; LEARN practices are implemented in application code by developers. Security teams can guide and audit LEARN implementation but can&#39;t implement it without access to the application codebase. ## The ChatGPT Memory Exploitation breach narrative shows how LEARN components would have mitigated the attack at the application level. The section identifies Linguistic Shielding, Access Control, and Nondisclosure as the relevant components. Which LEARN component was NOT listed, and why is it irrelevant to this attack? &gt; Hint: Think about what the memory poisoning attack involves and which LEARN components don&#39;t apply. - [ ] Robust Prompt Hardening (R) -- because memory poisoning doesn&#39;t involve system prompts &gt; While the memory poisoning attack primarily targets the data layer rather than the system prompt, prompt hardening could still be tangentially relevant for how the model processes memory entries. This isn&#39;t the component the section excludes. - [x] Execution Supervision (E) -- because the memory poisoning attack doesn&#39;t involve tool execution or agent actions; it targets the data layer (memory store) through the model&#39;s input processing, not through tool calls &gt; Correct! The ChatGPT Memory Exploitation planted false memories through document processing -- it didn&#39;t involve the model executing tools, calling APIs, or taking real-world actions. Execution Supervision constrains agent tool use, which isn&#39;t relevant when the attack vector is input processing that writes to a memory store. The relevant defenses are input validation (L), memory store permissions (A), and output filtering to catch poisoned memory influence (N). - [ ] Access Control (A) -- because memory stores don&#39;t have permissions &gt; Access Control IS listed as a relevant component. 
The section specifically identifies that &#34;only explicit, user-confirmed actions can create memory entries&#34; -- an access control requirement for the memory store. - [ ] Nondisclosure (N) -- because the attack isn&#39;t about data leakage &gt; Nondisclosure IS listed as a relevant component. The section identifies that memory entries should be validated for instruction-like content before they influence responses -- a nondisclosure concern about the model acting on hidden instructions. ## A developer adds a LEARN-R checklist item: &#34;Version control system prompts -- system prompts are versioned, reviewed, and tested like application code.&#34; Why does the section recommend treating system prompts with the same rigor as code? &gt; Hint: Think about what a system prompt is functionally -- what role does it play in determining application behavior? - [ ] System prompts contain API keys and credentials that need version control &gt; System prompts should never contain credentials (that&#39;s a LEARN-A practice). The recommendation is about behavioral control, not credential management. - [ ] Version control generates an audit trail for compliance reporting &gt; While audit trails are valuable, the primary reason is about the functional impact of system prompt changes on application security behavior. - [x] System prompts determine the AI application&#39;s behavior, security boundaries, and safety constraints -- changes to system prompts can introduce security vulnerabilities just like code changes, so they need the same review, testing, and version control discipline &gt; Correct! A system prompt change that removes an instruction anchoring statement, weakens a role boundary, or alters a behavioral constraint can open the application to jailbreaking, prompt leakage, or safety bypasses -- security vulnerabilities as real as code bugs. 
Treating system prompts like code means they go through pull request review, are tested with the adversarial test suite before deployment, and have rollback capability if a change introduces a regression. - [ ] Version control prevents unauthorized modification of system prompts in production &gt; Version control tracks changes but doesn&#39;t prevent unauthorized modification in a running system. Deployment controls prevent unauthorized changes; version control provides history and review. ## An organization implements all five LEARN components for their AI application. They also have the full Blueprint deployed. A novel attack technique is published that exploits a gap between the AI Gateway&#39;s filtering and the application&#39;s input validation. Which organizational practice from the broader security program would catch this gap? &gt; Hint: Think about what process systematically tests whether defenses work against real attack techniques. - [ ] Acceptable use policies -- employees would be prohibited from attempting the attack &gt; Acceptable use policies govern internal behavior. They don&#39;t test whether defenses work against external attackers using novel techniques. - [ ] Security champion review -- the champion would identify the gap during code review &gt; Security champions review code and configurations, but a gap between infrastructure-level and application-level defenses may not be visible in code review alone. It requires active testing. - [x] AI red-teaming -- a dedicated adversarial testing program that tests both infrastructure and application defenses using the latest attack techniques would discover the gap between Gateway filtering and application validation &gt; Correct! Red-teaming (covered in Section 11) systematically tests whether deployed defenses actually work under realistic attack conditions. 
A red team testing the latest published technique would discover that it bypasses the Gateway&#39;s infrastructure filtering AND the application&#39;s LEARN-L validation -- identifying the gap that neither team (infrastructure or application) caught individually. This is why red-teaming complements both Blueprint and LEARN. - [ ] NIST AI RMF compliance -- the framework&#39;s Measure function would identify the gap &gt; NIST AI RMF provides governance structure but doesn&#39;t conduct technical testing. The Measure function assesses risks quantitatively, which could identify the gap area, but the actual discovery comes from adversarial testing (red-teaming).</description>
    </item>
  </channel>
</rss>