11. Building an AI Security Culture

Introduction

Technology alone cannot secure AI systems. The most sophisticated Blueprint layer is useless if the organization’s culture doesn’t support security practices. The most carefully hardened system prompt fails if developers bypass it under deadline pressure. The most comprehensive AI Gateway is irrelevant if the security team doesn’t know how to respond when it detects an attack.

This section is the capstone of Chapter 3 – and of the entire course. It ties together the technical controls from the Blueprint, the application practices from LEARN, and the operational capabilities from AI Scanner and Guard into the organizational context that makes them work. You’ll learn how to red-team AI systems using the attack techniques from Chapter 2, how to adapt incident response for AI-specific threats, how regulatory frameworks like NIST AI RMF and the EU AI Act shape compliance requirements, and how to build the organizational practices that sustain AI security over time.

What will I get out of this?

By the end of this section, you will be able to:

Design an AI red-teaming program that uses structured adversarial testing to identify vulnerabilities before attackers do.
Adapt incident response procedures for AI-specific threats, including model compromise, data poisoning, and agent hijacking scenarios.
Explain the NIST AI Risk Management Framework and how its Govern, Map, Measure, Manage functions relate to the Blueprint.
Describe EU AI Act compliance requirements including risk-based classification and obligations for AI system providers and deployers.
Establish organizational AI security practices including security champions, training programs, and acceptable use policies.

AI Red-Teaming

Red-teaming is structured adversarial testing – a dedicated team attempts to break the AI system using real attack techniques, then reports findings so the organization can fix vulnerabilities before real attackers exploit them. For AI systems, red-teaming goes beyond traditional penetration testing to include the unique attack vectors you studied in Chapter 2.

Red Team Objectives

An AI red team aims to:

Discover unknown vulnerabilities – finding what automated scanners miss through creative, human-driven adversarial testing
Validate existing defenses – testing whether Blueprint controls, LEARN practices, and Scanner/Guard protections actually work under realistic attack conditions
Assess blast radius – determining what an attacker can achieve if they succeed, not just whether they can succeed
Generate actionable findings – producing specific, reproducible vulnerability reports that development and security teams can act on

Red Team Methodology

graph LR
    PL["<b>Plan</b><br/><small>Define scope, rules<br/>of engagement, target<br/>systems, and success<br/>criteria</small>"]
    TE["<b>Test</b><br/><small>Execute attacks using<br/>Chapter 2 techniques:<br/>injection, poisoning,<br/>jailbreaking, tool abuse</small>"]
    DO["<b>Document</b><br/><small>Record all findings,<br/>reproduction steps,<br/>severity assessment,<br/>evidence</small>"]
    RE["<b>Remediate</b><br/><small>Develop and deploy<br/>fixes: rule updates,<br/>prompt hardening,<br/>configuration changes</small>"]
    RT["<b>Retest</b><br/><small>Verify fixes work.<br/>Test for regressions.<br/>Update baselines.</small>"]

    PL --> TE --> DO --> RE --> RT
    RT -.->|"Next<br/>cycle"| PL

    style PL fill:#2d5016,color:#fff
    style TE fill:#8b0000,color:#fff
    style DO fill:#2d5016,color:#fff
    style RE fill:#1C90F3,color:#fff
    style RT fill:#2d5016,color:#fff

AI-Specific Red Team Techniques

The red team’s testing toolkit maps directly to Chapter 2’s attack categories:

Attack Category	Red Team Technique	What to Test
Prompt injection	Direct and indirect injection attempts	Does the AI Gateway catch injection? Does the system prompt hold?
Jailbreaking	Role-play, encoding bypass, multi-turn escalation	Can safety alignment be broken under sustained adversarial pressure?
Data extraction	System prompt probing, training data extraction	Does the model leak its instructions or memorized training data?
Tool misuse	Tool output injection, permission escalation	Can an agent be manipulated into unauthorized tool calls?
Agent hijacking	Goal redirection through crafted context	Can agent goals be redirected through injected instructions?

Scoping and Frequency

Red-teaming should occur:

Before major releases: Any new AI capability, model update, or system prompt change triggers a red team assessment
On a regular schedule: Quarterly or semi-annual red team exercises maintain continuous security validation
After incidents: Post-incident red-teaming verifies that remediations are effective and identifies related vulnerabilities
When threat landscape changes: New attack techniques published by researchers should be tested against deployed systems

Defense Connection

AI red-teaming directly applies the techniques from Chapter 2’s attack surface mapping. The red team uses the OWASP LLM Top 10 and Agentic AI Top 10 as their testing checklist, systematically probing every category against the deployed defenses. Red-teaming transforms the attack knowledge from Chapter 2 into defensive action.

AI-Specific Incident Response

Traditional incident response procedures – detect, contain, eradicate, recover – still apply to AI systems. But the specific indicators, containment strategies, and recovery procedures must be adapted for AI-specific threats.

AI-Specific Indicators of Compromise

AI incidents often manifest differently from traditional security incidents:

Indicator	What It Suggests	Detection Method
Sudden change in model output quality or tone	Model may be under manipulation or prompt injection attack	Output quality monitoring, behavioral anomaly detection (Layer 6)
Unusual tool call patterns from agents	Agent may be hijacked or under adversarial control	Tool call logging, execution pattern monitoring (LEARN-E)
Spike in Guard blocking events	Active attack campaign targeting the AI system	Guard dashboard monitoring, alert correlation
Unexpected data access patterns by AI service accounts	Credential compromise or privilege escalation	IAM audit logs, access pattern baselines (Layer 3)
User reports of incorrect or harmful AI responses	Data poisoning, model compromise, or guardrail failure	User feedback channels, response quality metrics
Cost anomalies in AI service billing	Unauthorized usage, resource abuse, or denial of service	Cost monitoring, consumption baselines (Layer 5)

AI Incident Containment Strategies

Containment for AI incidents requires AI-specific actions:

Model Compromise Containment:

Immediately route traffic to a known-good model version while investigating the compromised version
Revoke and rotate all credentials associated with the compromised model endpoint
Preserve model artifacts, logs, and configuration as forensic evidence
Notify affected users if the compromised model may have generated harmful or incorrect outputs

Data Poisoning Containment:

Quarantine the affected data source (training dataset, RAG corpus, or memory store)
Switch RAG retrieval to a verified backup corpus
Flag all outputs generated during the poisoning window for review
Initiate data lineage investigation to determine the scope of contamination

Agent Hijacking Containment:

Immediately suspend the compromised agent’s tool access
Revoke all credentials associated with the agent
Audit all actions taken by the agent during the compromise window
Review and potentially roll back any changes made by the hijacked agent (database modifications, file changes, communications sent)

Recovery Procedures

AI incident recovery goes beyond patching and restarting:

Root cause analysis: Determine how the attack succeeded – which control failed, which layer was bypassed, which practice was not followed
Model restoration: Deploy a verified clean model version, re-assess with AI Scanner before putting it back in production
Data integrity verification: Verify that training data, RAG corpora, and memory stores are free from contamination
Control updates: Update AI Guard rules, Scanner test libraries, and ZTSA policies based on the attack technique used
Lessons learned: Document the incident and update the organization’s AI threat model, red team test cases, and training materials

Regulatory Frameworks

AI security doesn’t operate in a regulatory vacuum. Two frameworks in particular shape how organizations approach AI risk management and compliance.

NIST AI Risk Management Framework (AI RMF)

The NIST AI RMF provides a structured approach to managing AI risks. It organizes AI risk management into four core functions:

Function	Purpose	Blueprint Connection
Govern	Establish AI governance structures, policies, and accountability	Organizational AI security policies, acceptable use, security champion programs
Map	Identify and understand AI risks in context	Threat modeling (Section 1), OWASP mapping, attack surface analysis
Measure	Assess and monitor AI risks quantitatively	AI-SPM risk scoring (Layer 3), Scanner assessments, Guard metrics
Manage	Prioritize and act on AI risks	Blueprint layer controls, LEARN practices, Scanner/Guard continuous loop

The NIST AI RMF is not prescriptive about specific controls – it provides the governance structure within which organizations select and implement controls like those in the Blueprint. Think of NIST AI RMF as the “why and how to govern” and the Blueprint as the “what to deploy.”

EU AI Act

The EU AI Act establishes a risk-based classification system for AI systems, with compliance requirements that scale with risk level:

Risk Classifications:

Risk Level	Definition	Examples	Requirements
Unacceptable	AI practices that pose a clear threat to fundamental rights	Social scoring by governments, real-time biometric surveillance in public spaces	Prohibited
High-risk	AI systems in critical domains that affect health, safety, or fundamental rights	Medical diagnostics, employment screening, credit scoring, critical infrastructure	Conformity assessment, risk management, data governance, transparency, human oversight
Limited risk	AI systems with specific transparency obligations	Chatbots (must disclose AI nature), emotion recognition, deepfake generators	Transparency requirements
Minimal risk	AI systems with no additional requirements	AI-powered spam filters, AI in video games	No specific requirements beyond existing law

Compliance Implications for AI Security:

Organizations deploying AI systems that interact with EU residents or operate within EU markets should:

Classify their AI systems by risk level and understand which requirements apply
Implement required risk management – high-risk systems need documented risk assessment, which the NIST AI RMF structure supports
Ensure data governance – training data quality, representativeness, and bias management align with Layer 1 (Data) controls
Provide transparency – users must know when they’re interacting with AI, and high-risk systems require explainability
Enable human oversight – high-risk systems must allow meaningful human intervention, which maps to approval workflows (LEARN-E) and human-in-the-loop enforcement (Layer 4)

Organizational AI Security Practices

Frameworks and regulations set the context. Day-to-day organizational practices make security real.

Security Champions for AI

The security champions model – designating security-aware individuals within development teams – extends naturally to AI:

AI security champions are developers or ML engineers within each team who receive additional training on AI-specific threats and defenses
They review system prompts, tool configurations, and access permissions as part of the development process
They participate in red team exercises and bring findings back to their teams
They serve as the bridge between the security team (which understands threats) and the development team (which understands the application)

AI Security Training Programs

Training should cover:

Threat awareness: Chapter 2’s attack categories – every developer should understand prompt injection, data poisoning, and agentic attack vectors at a practical level
Secure development: LEARN practices – how to implement each component in the technologies the team uses
Incident recognition: How to identify when an AI system may be under attack or behaving anomalously
Regulatory awareness: Applicable regulatory requirements (NIST AI RMF, EU AI Act) and how they affect development decisions

Acceptable Use Policies for AI

Organizations need clear, enforceable policies covering:

What data can be shared with AI tools – both approved internal tools and the boundary conditions for external services
How AI outputs should be validated – when human review is required, when automated validation is sufficient
What AI capabilities require approval – new model deployments, tool integrations, and agent capabilities that require security review
How AI incidents should be reported – clear escalation paths when users or developers notice anomalous AI behavior

AI Ethics Integration

Security and ethics intersect in AI deployments:

Bias monitoring: Technical controls to detect and measure bias in AI outputs complement ethical commitments to fairness
Transparency practices: Security controls like audit logging serve both security (detecting attacks) and ethics (enabling accountability)
Human oversight: The same approval workflows that prevent agent hijacking (LEARN-E) also ensure human accountability for high-stakes AI decisions

Defense Connection

Organizational practices address the human element that agentic attack vectors exploit. The social engineering dimension of AI attacks – tricking developers into installing malicious MCP servers, convincing users to ignore AI safety warnings, exploiting organizational trust in AI outputs – is countered by training, awareness, and governance, not by technology alone.

Defense Perspective: When Technology Isn’t Enough

Throughout Chapter 2, you studied attacks that succeeded despite the existence of technical defenses:

Samsung’s data leak (Chapter 2 context): Samsung had data protection policies, but employees pasted proprietary semiconductor data into ChatGPT anyway – not maliciously, but because productivity pressure overrode security awareness. No firewall stopped the data from leaving. Layer 4’s shadow AI governance and DLP would have helped, but the root cause was cultural: engineers didn’t understand the risk.

The Cursor MCP exploitation (Chapter 2 Section 5): Developers installed unverified MCP servers because the ecosystem made it easy and there was no governance process requiring security review. Technology could have flagged the installation, but the organizational practice of “install tools freely to move fast” created the vulnerability window.

Microsoft LLMjacking (Chapter 2 Section 1): API credentials were exposed in public repositories – a known risk that every security team warns about. The credentials weren’t exposed because the organization lacked secrets management tools. They were exposed because development practices (committing .env files, hardcoding keys during testing) weren’t enforced with automated pre-commit checks and cultural reinforcement.

The pattern: In every case, the technical controls existed or were available. What failed was the organizational context – the training, the policies, the culture, the governance – that ensures technical controls are actually used. Building an AI security culture isn’t a nice-to-have supplement to the Blueprint. It’s the foundation that makes the Blueprint effective.

Bringing It All Together

You’ve completed a three-chapter journey through AI security:

Chapter 1 built your foundation – you learned how AI and LLM systems work, from transformer architectures through agentic capabilities, establishing the vocabulary and technical understanding needed for security analysis.
Chapter 2 mapped the threat landscape – you studied the OWASP LLM Top 10, the Agentic AI Top 10, and real-world attack case studies, learning how attackers exploit every component of the AI stack from prompts through agents to infrastructure.
Chapter 3 built your defenses – you learned the 6-layer Blueprint for infrastructure protection, the LEARN Architecture for application-level security, the AI Scanner/Guard continuous loop for operational defense, and the organizational practices that make technical controls effective.

The three chapters form a complete arc: understand the technology, recognize the threats, build the defenses. AI security is not a destination – it’s an ongoing discipline that requires continuous learning, testing, and adaptation. The frameworks, tools, and practices in this course give you the foundation to build and maintain secure AI systems in a landscape where both capabilities and threats evolve rapidly.

Key Takeaways

AI red-teaming uses structured adversarial testing based on OWASP attack categories to discover vulnerabilities before attackers do and validate that Blueprint and LEARN defenses hold under realistic conditions
AI-specific incident response requires adapted containment strategies for model compromise, data poisoning, and agent hijacking – including model rollback, corpus quarantine, and credential revocation
Regulatory frameworks (NIST AI RMF and EU AI Act) provide governance structures and risk-based classification that shape how organizations implement and document their AI security controls
Organizational practices – security champions, training programs, acceptable use policies, and ethics integration – are the foundation that makes technical defenses effective
The three-chapter arc provides a complete AI security education: understand the technology (Chapter 1), recognize the threats (Chapter 2), build the defenses (Chapter 3)

Test Your Knowledge

Ready to test your understanding of AI security culture and governance? Head to the quiz to check your knowledge.

Course Complete!

Congratulations on completing all three chapters of the AI security course. You now have the knowledge to understand AI systems, recognize how they’re attacked, and build the defenses that protect them. The frameworks and practices you’ve learned – from OWASP categories to Blueprint layers to LEARN components – provide a comprehensive foundation for securing AI in your organization.

Previous Section Back to Top Course Home