11. Building an AI Security Culture

Introduction

Technology alone cannot secure AI systems. The most sophisticated Blueprint layer is useless if the organization’s culture doesn’t support security practices. The most carefully hardened system prompt fails if developers bypass it under deadline pressure. The most comprehensive AI Gateway is irrelevant if the security team doesn’t know how to respond when it detects an attack.

This section is the capstone of Chapter 3 – and of the entire course. It ties together the technical controls from the Blueprint, the application practices from LEARN, and the operational capabilities from AI Scanner and Guard into the organizational context that makes them work. You’ll learn how to red-team AI systems using the attack techniques from Chapter 2, how to adapt incident response for AI-specific threats, how regulatory frameworks like NIST AI RMF and the EU AI Act shape compliance requirements, and how to build the organizational practices that sustain AI security over time.

What will I get out of this?

By the end of this section, you will be able to:

  1. Design an AI red-teaming program that uses structured adversarial testing to identify vulnerabilities before attackers do.
  2. Adapt incident response procedures for AI-specific threats, including model compromise, data poisoning, and agent hijacking scenarios.
  3. Explain the NIST AI Risk Management Framework and how its Govern, Map, Measure, Manage functions relate to the Blueprint.
  4. Describe EU AI Act compliance requirements including risk-based classification and obligations for AI system providers and deployers.
  5. Establish organizational AI security practices including security champions, training programs, and acceptable use policies.

AI Red-Teaming

Red-teaming is structured adversarial testing – a dedicated team attempts to break the AI system using real attack techniques, then reports findings so the organization can fix vulnerabilities before real attackers exploit them. For AI systems, red-teaming goes beyond traditional penetration testing to include the unique attack vectors you studied in Chapter 2.

Red Team Objectives

An AI red team aims to:

  • Discover unknown vulnerabilities – finding what automated scanners miss through creative, human-driven adversarial testing
  • Validate existing defenses – testing whether Blueprint controls, LEARN practices, and Scanner/Guard protections actually work under realistic attack conditions
  • Assess blast radius – determining what an attacker can achieve if they succeed, not just whether they can succeed
  • Generate actionable findings – producing specific, reproducible vulnerability reports that development and security teams can act on

Red Team Methodology

graph LR
    PL["<b>Plan</b><br/><small>Define scope, rules<br/>of engagement, target<br/>systems, and success<br/>criteria</small>"]
    TE["<b>Test</b><br/><small>Execute attacks using<br/>Chapter 2 techniques:<br/>injection, poisoning,<br/>jailbreaking, tool abuse</small>"]
    DO["<b>Document</b><br/><small>Record all findings,<br/>reproduction steps,<br/>severity assessment,<br/>evidence</small>"]
    RE["<b>Remediate</b><br/><small>Develop and deploy<br/>fixes: rule updates,<br/>prompt hardening,<br/>configuration changes</small>"]
    RT["<b>Retest</b><br/><small>Verify fixes work.<br/>Test for regressions.<br/>Update baselines.</small>"]

    PL --> TE --> DO --> RE --> RT
    RT -.->|"Next<br/>cycle"| PL

    style PL fill:#2d5016,color:#fff
    style TE fill:#8b0000,color:#fff
    style DO fill:#2d5016,color:#fff
    style RE fill:#1C90F3,color:#fff
    style RT fill:#2d5016,color:#fff

AI-Specific Red Team Techniques

The red team’s testing toolkit maps directly to Chapter 2’s attack categories:

Attack Category Red Team Technique What to Test
Prompt injection Direct and indirect injection attempts Does the AI Gateway catch injection? Does the system prompt hold?
Jailbreaking Role-play, encoding bypass, multi-turn escalation Can safety alignment be broken under sustained adversarial pressure?
Data extraction System prompt probing, training data extraction Does the model leak its instructions or memorized training data?
Tool misuse Tool output injection, permission escalation Can an agent be manipulated into unauthorized tool calls?
Agent hijacking Goal redirection through crafted context Can agent goals be redirected through injected instructions?

Scoping and Frequency

Red-teaming should occur:

  • Before major releases: Any new AI capability, model update, or system prompt change triggers a red team assessment
  • On a regular schedule: Quarterly or semi-annual red team exercises maintain continuous security validation
  • After incidents: Post-incident red-teaming verifies that remediations are effective and identifies related vulnerabilities
  • When threat landscape changes: New attack techniques published by researchers should be tested against deployed systems
Defense Connection

AI red-teaming directly applies the techniques from Chapter 2’s attack surface mapping. The red team uses the OWASP LLM Top 10 and Agentic AI Top 10 as their testing checklist, systematically probing every category against the deployed defenses. Red-teaming transforms the attack knowledge from Chapter 2 into defensive action.


AI-Specific Incident Response

Traditional incident response procedures – detect, contain, eradicate, recover – still apply to AI systems. But the specific indicators, containment strategies, and recovery procedures must be adapted for AI-specific threats.

AI-Specific Indicators of Compromise

AI incidents often manifest differently from traditional security incidents:

Indicator What It Suggests Detection Method
Sudden change in model output quality or tone Model may be under manipulation or prompt injection attack Output quality monitoring, behavioral anomaly detection (Layer 6)
Unusual tool call patterns from agents Agent may be hijacked or under adversarial control Tool call logging, execution pattern monitoring (LEARN-E)
Spike in Guard blocking events Active attack campaign targeting the AI system Guard dashboard monitoring, alert correlation
Unexpected data access patterns by AI service accounts Credential compromise or privilege escalation IAM audit logs, access pattern baselines (Layer 3)
User reports of incorrect or harmful AI responses Data poisoning, model compromise, or guardrail failure User feedback channels, response quality metrics
Cost anomalies in AI service billing Unauthorized usage, resource abuse, or denial of service Cost monitoring, consumption baselines (Layer 5)

AI Incident Containment Strategies

Containment for AI incidents requires AI-specific actions:

Model Compromise Containment:

  • Immediately route traffic to a known-good model version while investigating the compromised version
  • Revoke and rotate all credentials associated with the compromised model endpoint
  • Preserve model artifacts, logs, and configuration as forensic evidence
  • Notify affected users if the compromised model may have generated harmful or incorrect outputs

Data Poisoning Containment:

  • Quarantine the affected data source (training dataset, RAG corpus, or memory store)
  • Switch RAG retrieval to a verified backup corpus
  • Flag all outputs generated during the poisoning window for review
  • Initiate data lineage investigation to determine the scope of contamination

Agent Hijacking Containment:

  • Immediately suspend the compromised agent’s tool access
  • Revoke all credentials associated with the agent
  • Audit all actions taken by the agent during the compromise window
  • Review and potentially roll back any changes made by the hijacked agent (database modifications, file changes, communications sent)

Recovery Procedures

AI incident recovery goes beyond patching and restarting:

  1. Root cause analysis: Determine how the attack succeeded – which control failed, which layer was bypassed, which practice was not followed
  2. Model restoration: Deploy a verified clean model version, re-assess with AI Scanner before putting it back in production
  3. Data integrity verification: Verify that training data, RAG corpora, and memory stores are free from contamination
  4. Control updates: Update AI Guard rules, Scanner test libraries, and ZTSA policies based on the attack technique used
  5. Lessons learned: Document the incident and update the organization’s AI threat model, red team test cases, and training materials

Regulatory Frameworks

AI security doesn’t operate in a regulatory vacuum. Two frameworks in particular shape how organizations approach AI risk management and compliance.

NIST AI Risk Management Framework (AI RMF)

The NIST AI RMF provides a structured approach to managing AI risks. It organizes AI risk management into four core functions:

Function Purpose Blueprint Connection
Govern Establish AI governance structures, policies, and accountability Organizational AI security policies, acceptable use, security champion programs
Map Identify and understand AI risks in context Threat modeling (Section 1), OWASP mapping, attack surface analysis
Measure Assess and monitor AI risks quantitatively AI-SPM risk scoring (Layer 3), Scanner assessments, Guard metrics
Manage Prioritize and act on AI risks Blueprint layer controls, LEARN practices, Scanner/Guard continuous loop

The NIST AI RMF is not prescriptive about specific controls – it provides the governance structure within which organizations select and implement controls like those in the Blueprint. Think of NIST AI RMF as the “why and how to govern” and the Blueprint as the “what to deploy.”

EU AI Act

The EU AI Act establishes a risk-based classification system for AI systems, with compliance requirements that scale with risk level:

Risk Classifications:

Risk Level Definition Examples Requirements
Unacceptable AI practices that pose a clear threat to fundamental rights Social scoring by governments, real-time biometric surveillance in public spaces Prohibited
High-risk AI systems in critical domains that affect health, safety, or fundamental rights Medical diagnostics, employment screening, credit scoring, critical infrastructure Conformity assessment, risk management, data governance, transparency, human oversight
Limited risk AI systems with specific transparency obligations Chatbots (must disclose AI nature), emotion recognition, deepfake generators Transparency requirements
Minimal risk AI systems with no additional requirements AI-powered spam filters, AI in video games No specific requirements beyond existing law

Compliance Implications for AI Security:

Organizations deploying AI systems that interact with EU residents or operate within EU markets should:

  • Classify their AI systems by risk level and understand which requirements apply
  • Implement required risk management – high-risk systems need documented risk assessment, which the NIST AI RMF structure supports
  • Ensure data governance – training data quality, representativeness, and bias management align with Layer 1 (Data) controls
  • Provide transparency – users must know when they’re interacting with AI, and high-risk systems require explainability
  • Enable human oversight – high-risk systems must allow meaningful human intervention, which maps to approval workflows (LEARN-E) and human-in-the-loop enforcement (Layer 4)

Organizational AI Security Practices

Frameworks and regulations set the context. Day-to-day organizational practices make security real.

Security Champions for AI

The security champions model – designating security-aware individuals within development teams – extends naturally to AI:

  • AI security champions are developers or ML engineers within each team who receive additional training on AI-specific threats and defenses
  • They review system prompts, tool configurations, and access permissions as part of the development process
  • They participate in red team exercises and bring findings back to their teams
  • They serve as the bridge between the security team (which understands threats) and the development team (which understands the application)

AI Security Training Programs

Training should cover:

  • Threat awareness: Chapter 2’s attack categories – every developer should understand prompt injection, data poisoning, and agentic attack vectors at a practical level
  • Secure development: LEARN practices – how to implement each component in the technologies the team uses
  • Incident recognition: How to identify when an AI system may be under attack or behaving anomalously
  • Regulatory awareness: Applicable regulatory requirements (NIST AI RMF, EU AI Act) and how they affect development decisions

Acceptable Use Policies for AI

Organizations need clear, enforceable policies covering:

  • What data can be shared with AI tools – both approved internal tools and the boundary conditions for external services
  • How AI outputs should be validated – when human review is required, when automated validation is sufficient
  • What AI capabilities require approval – new model deployments, tool integrations, and agent capabilities that require security review
  • How AI incidents should be reported – clear escalation paths when users or developers notice anomalous AI behavior

AI Ethics Integration

Security and ethics intersect in AI deployments:

  • Bias monitoring: Technical controls to detect and measure bias in AI outputs complement ethical commitments to fairness
  • Transparency practices: Security controls like audit logging serve both security (detecting attacks) and ethics (enabling accountability)
  • Human oversight: The same approval workflows that prevent agent hijacking (LEARN-E) also ensure human accountability for high-stakes AI decisions
Defense Connection

Organizational practices address the human element that agentic attack vectors exploit. The social engineering dimension of AI attacks – tricking developers into installing malicious MCP servers, convincing users to ignore AI safety warnings, exploiting organizational trust in AI outputs – is countered by training, awareness, and governance, not by technology alone.


Defense Perspective: When Technology Isn’t Enough

Throughout Chapter 2, you studied attacks that succeeded despite the existence of technical defenses:

Samsung’s data leak (Chapter 2 context): Samsung had data protection policies, but employees pasted proprietary semiconductor data into ChatGPT anyway – not maliciously, but because productivity pressure overrode security awareness. No firewall stopped the data from leaving. Layer 4’s shadow AI governance and DLP would have helped, but the root cause was cultural: engineers didn’t understand the risk.

The Cursor MCP exploitation (Chapter 2 Section 5): Developers installed unverified MCP servers because the ecosystem made it easy and there was no governance process requiring security review. Technology could have flagged the installation, but the organizational practice of “install tools freely to move fast” created the vulnerability window.

Microsoft LLMjacking (Chapter 2 Section 1): API credentials were exposed in public repositories – a known risk that every security team warns about. The credentials weren’t exposed because the organization lacked secrets management tools. They were exposed because development practices (committing .env files, hardcoding keys during testing) weren’t enforced with automated pre-commit checks and cultural reinforcement.

The pattern: In every case, the technical controls existed or were available. What failed was the organizational context – the training, the policies, the culture, the governance – that ensures technical controls are actually used. Building an AI security culture isn’t a nice-to-have supplement to the Blueprint. It’s the foundation that makes the Blueprint effective.


Bringing It All Together

You’ve completed a three-chapter journey through AI security:

  • Chapter 1 built your foundation – you learned how AI and LLM systems work, from transformer architectures through agentic capabilities, establishing the vocabulary and technical understanding needed for security analysis.

  • Chapter 2 mapped the threat landscape – you studied the OWASP LLM Top 10, the Agentic AI Top 10, and real-world attack case studies, learning how attackers exploit every component of the AI stack from prompts through agents to infrastructure.

  • Chapter 3 built your defenses – you learned the 6-layer Blueprint for infrastructure protection, the LEARN Architecture for application-level security, the AI Scanner/Guard continuous loop for operational defense, and the organizational practices that make technical controls effective.

The three chapters form a complete arc: understand the technology, recognize the threats, build the defenses. AI security is not a destination – it’s an ongoing discipline that requires continuous learning, testing, and adaptation. The frameworks, tools, and practices in this course give you the foundation to build and maintain secure AI systems in a landscape where both capabilities and threats evolve rapidly.

Key Takeaways
  • AI red-teaming uses structured adversarial testing based on OWASP attack categories to discover vulnerabilities before attackers do and validate that Blueprint and LEARN defenses hold under realistic conditions
  • AI-specific incident response requires adapted containment strategies for model compromise, data poisoning, and agent hijacking – including model rollback, corpus quarantine, and credential revocation
  • Regulatory frameworks (NIST AI RMF and EU AI Act) provide governance structures and risk-based classification that shape how organizations implement and document their AI security controls
  • Organizational practices – security champions, training programs, acceptable use policies, and ethics integration – are the foundation that makes technical defenses effective
  • The three-chapter arc provides a complete AI security education: understand the technology (Chapter 1), recognize the threats (Chapter 2), build the defenses (Chapter 3)

Test Your Knowledge

Ready to test your understanding of AI security culture and governance? Head to the quiz to check your knowledge.


Course Complete!

Congratulations on completing all three chapters of the AI security course. You now have the knowledge to understand AI systems, recognize how they’re attacked, and build the defenses that protect them. The frameworks and practices you’ve learned – from OWASP categories to Blueprint layers to LEARN components – provide a comprehensive foundation for securing AI in your organization.