5. Layer 3: Secure Your AI Infrastructure
Introduction
AI infrastructure is not just “servers that run models.” It is a uniquely complex ecosystem of GPU clusters, model serving endpoints, vector databases, orchestration layers, API gateways, and monitoring systems – each with its own attack surface, and all interconnected in ways that traditional infrastructure security was never designed to handle.
In Chapter 2, you saw how infrastructure attacks exploit container escapes, GPU memory vulnerabilities, and API security gaps. You saw how tool misuse turns legitimate AI orchestration capabilities into attack vectors. And you saw how identity and privilege abuse chains together legitimate permissions into full system compromise. Layer 3 of the Security for AI Blueprint addresses these threats through a new discipline: AI Security Posture Management.
This section takes a deep dive into Layer 3 – the layer that gives security teams continuous visibility into their AI infrastructure, identifies misconfigurations before attackers exploit them, and ensures that every component in the AI stack meets security baselines. Because AI infrastructure changes rapidly (new models deployed, new tools integrated, new endpoints exposed), traditional point-in-time security assessments are not enough. Layer 3 demands continuous, automated posture management.
What will I get out of this?
By the end of this section, you will be able to:
- Explain what AI Security Posture Management (AI-SPM) is and how it differs from traditional Cloud Security Posture Management (CSPM).
- Describe the key components of AI-SPM including asset discovery, configuration assessment, risk scoring, and remediation guidance.
- Identify posture management requirements for GPU clusters, model serving endpoints, and cloud AI services.
- Explain how risk insights prioritize remediation by correlating findings across the AI stack.
- Describe orchestration layer security for tools like MCP servers, function calling, and agent frameworks.
- Design identity and access management strategies specific to AI service accounts and API credentials.
- Map Layer 3 controls to specific OWASP categories including LLM10, ASI02, ASI03, and ASI05.
AI Security Posture Management (AI-SPM)
Traditional CSPM (Cloud Security Posture Management) monitors cloud resources for misconfigurations: open S3 buckets, overly permissive security groups, unencrypted databases. AI-SPM extends this concept to the unique resources that make up an AI deployment – and adds discovery capabilities that are critical because many organizations don’t even know the full extent of their AI footprint.
Why AI Needs Its Own Posture Management
Standard CSPM tools see a GPU instance as just another virtual machine. They don’t understand that it’s running a model serving endpoint that exposes an unauthenticated inference API. They don’t know that the vector database behind it has no access controls. They can’t assess whether the model loaded on the GPU has been scanned for vulnerabilities.
AI-SPM understands the AI context:
- AI asset discovery: Finds all AI-related resources across the organization – model endpoints, vector databases, training pipelines, GPU clusters, and shadow AI deployments that security teams didn’t know existed
- AI-specific configuration baselines: Assesses resources against security baselines designed for AI workloads, not just generic cloud benchmarks
- Model-aware risk scoring: Scores risk based on what the AI system does (handles PII? makes financial decisions? has tool access?) – not just on infrastructure metrics
- Drift detection: Continuously monitors for configuration changes that weaken security posture – a new model endpoint deployed without authentication, a vector database permission change, a GPU cluster exposed to a wider network
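The drift-detection idea above can be sketched as a simple diff between a stored security baseline and a freshly collected configuration snapshot. All asset names, settings, and baselines below are illustrative, not a real AI-SPM API:

```python
# Minimal drift-detection sketch: diff a stored security baseline against
# a current configuration snapshot for each AI asset. Assets present in
# the snapshot but absent from the baseline are treated as unassessed
# (possible shadow AI). All names and fields are hypothetical.

BASELINE = {
    "model-endpoint/chat-prod": {"auth_required": True, "rate_limited": True},
    "vector-db/docs": {"public_access": False, "rbac_enabled": True},
}

def detect_drift(baseline: dict, snapshot: dict) -> list[dict]:
    """Return one finding per setting that deviates from the baseline."""
    findings = []
    for asset, expected in baseline.items():
        current = snapshot.get(asset, {})
        for setting, want in expected.items():
            have = current.get(setting)
            if have != want:
                findings.append({"asset": asset, "setting": setting,
                                 "expected": want, "actual": have})
    # Unknown assets are flagged for assessment rather than ignored.
    for asset in snapshot.keys() - baseline.keys():
        findings.append({"asset": asset, "setting": None,
                         "expected": "assessed", "actual": "unknown asset"})
    return findings

snapshot = {
    "model-endpoint/chat-prod": {"auth_required": False, "rate_limited": True},
    "vector-db/docs": {"public_access": False, "rbac_enabled": True},
    "model-endpoint/dev-test": {"auth_required": False},  # shadow deployment
}
```

A real implementation would pull the snapshot from cloud inventory APIs on a schedule or on change events; the diff logic stays the same.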
The AI-SPM Workflow
```mermaid
graph TB
DIS["<b>Discover</b><br/><small>Inventory all AI assets:<br/>model endpoints, GPU clusters,<br/>vector DBs, orchestration tools,<br/>shadow AI services</small>"]
ASS["<b>Assess</b><br/><small>Evaluate configuration<br/>against AI security<br/>baselines: authentication,<br/>encryption, access controls</small>"]
SCR["<b>Score</b><br/><small>Risk scoring based on<br/>exposure, data sensitivity,<br/>tool access, and<br/>blast radius</small>"]
REM["<b>Remediate</b><br/><small>Prioritized guidance:<br/>what to fix first,<br/>how to fix it,<br/>expected risk reduction</small>"]
DIS --> ASS --> SCR --> REM
REM -.->|"Continuous<br/>reassessment"| DIS
SA1["Model Endpoints"]
SA2["GPU Clusters"]
SA3["Vector Databases"]
SA4["Orchestration Tools"]
SA5["Shadow AI Services"]
SA1 -.-> DIS
SA2 -.-> DIS
SA3 -.-> DIS
SA4 -.-> DIS
SA5 -.-> DIS
style DIS fill:#2d5016,color:#fff
style ASS fill:#2d5016,color:#fff
style SCR fill:#2d5016,color:#fff
style REM fill:#2d5016,color:#fff
style SA1 fill:#1C90F3,color:#fff
style SA2 fill:#1C90F3,color:#fff
style SA3 fill:#1C90F3,color:#fff
style SA4 fill:#1C90F3,color:#fff
style SA5 fill:#1C90F3,color:#fff
```
The workflow is continuous. New AI assets can appear at any time – a developer spins up a model endpoint for testing, a team integrates a new MCP server, an internal tool starts calling an external AI API. AI-SPM must detect these additions and assess them against security baselines without waiting for a manual review cycle.
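The four stages can be sketched as one pass of a posture loop. Everything below (the discovery stub, the assessment rule, the scoring weights) is hypothetical; a real AI-SPM system would run this continuously against live inventory:

```python
# Sketch of the Discover -> Assess -> Score -> Remediate loop as a
# single pass. In production this runs on a schedule or on asset-change
# events; all functions, fields, and weights here are illustrative.

def discover() -> list[dict]:
    # Stand-in for real inventory sources (cloud APIs, network scans).
    return [{"name": "chat-endpoint", "type": "model-endpoint",
             "auth_required": False, "internet_facing": True}]

def assess(asset: dict) -> list[str]:
    """Evaluate one asset against a (toy) AI security baseline."""
    findings = []
    if asset["type"] == "model-endpoint" and not asset.get("auth_required"):
        findings.append("unauthenticated endpoint")
    return findings

def score(asset: dict, findings: list[str]) -> int:
    """Weight findings by exposure; internet-facing assets score higher."""
    base = 10 * len(findings)
    return base * 3 if asset.get("internet_facing") else base

def run_posture_cycle() -> list[tuple]:
    results = []
    for asset in discover():
        findings = assess(asset)
        results.append((asset["name"], findings, score(asset, findings)))
    # Remediation guidance is generated from the sorted results, then
    # the loop re-runs to verify that fixes actually landed.
    return sorted(results, key=lambda r: -r[2])
```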
Defense Connection
AI-SPM directly addresses LLM10: Unbounded Consumption by detecting misconfigured model endpoints that lack rate limiting, authentication, or cost controls. The infrastructure attacks from Chapter 2 showed how unprotected API endpoints and missing rate limits enable denial of service and financial damage. AI-SPM’s continuous assessment ensures these misconfigurations are caught and flagged before attackers discover them.
Posture Management for AI Resources
Different AI resource types need different security baselines. A GPU cluster requires different controls than a vector database, and a model serving endpoint has different requirements than an orchestration tool.
GPU Cluster Security
GPU clusters are the compute backbone of AI training and inference. They’re expensive, powerful, and frequently shared across teams – which creates multi-tenancy risks.
| Control | What It Addresses | Implementation |
|---|---|---|
| Network segmentation | Isolate GPU clusters from general enterprise network | Dedicated VPCs/subnets for AI workloads; explicit firewall rules for ingress/egress |
| Memory isolation | Prevent cross-tenant GPU memory leakage | GPU partitioning (MIG for NVIDIA A100/H100); memory clearing between workloads |
| Access controls | Prevent unauthorized access to compute resources | RBAC on cluster management; API key rotation for programmatic access |
| Audit logging | Track who runs what on the cluster | Job submission logs, resource usage tracking, model loading records |
| Cost monitoring | Detect anomalous resource consumption | Usage alerts, budget caps, per-team quotas |
Model Serving Endpoint Hardening
Every model serving endpoint is an API – and every API needs baseline security:
- Authentication required: No unauthenticated model endpoints in production (this seems obvious but is one of the most common AI-SPM findings)
- TLS encryption: All model API traffic encrypted in transit
- Rate limiting: Per-user and per-endpoint rate limits to prevent unbounded consumption
- Input validation: Maximum token/request size limits enforced at the endpoint level
- Health monitoring: Automated checks for model availability, latency, and error rates
Cloud AI Service Considerations
Organizations using managed AI services (AWS SageMaker, Azure AI, GCP Vertex AI) benefit from the provider’s infrastructure security but still need posture management for their configurations:
- IAM policies: Are service roles following least-privilege? Do they have access to resources they don’t need?
- Network configuration: Are endpoints exposed to the internet when they should be VPC-internal?
- Logging and monitoring: Are CloudTrail/Activity Log/Cloud Audit logs enabled for AI service operations?
- Data residency: Are model training jobs and inference endpoints in approved regions for compliance?
- Model registry access: Who can deploy new models to managed endpoints? Are there approval gates?
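The least-privilege check in the first bullet can be automated by scanning role policies for wildcard grants. The sketch below follows the common AWS IAM JSON policy shape; the role policy itself is fabricated for illustration, and a real scan would pull policies via the cloud provider's IAM APIs:

```python
# Hedged sketch: flag IAM-style policy statements that break
# least-privilege -- wildcard actions (e.g. "sagemaker:*") or a
# wildcard resource on an AI service role. The policy document below
# is a made-up example in the standard AWS JSON policy format.

def overbroad_statements(policy: dict) -> list[dict]:
    flagged = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = stmt.get("Resource", [])
        resources = [resources] if isinstance(resources, str) else resources
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        if wildcard_action or "*" in resources:
            flagged.append(stmt)
    return flagged

inference_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        # Scoped: read model artifacts only -- passes the check.
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::model-artifacts/*"},
        # Overbroad: full service access on all resources -- flagged.
        {"Effect": "Allow", "Action": "sagemaker:*", "Resource": "*"},
    ],
}
```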
AI-SPM vs. Traditional CSPM: Key Differences
| Dimension | Traditional CSPM | AI-SPM |
|---|---|---|
| Asset types | VMs, databases, storage buckets, network resources | Model endpoints, GPU clusters, vector DBs, orchestration tools, agent frameworks |
| Configuration baselines | CIS benchmarks, cloud provider best practices | AI-specific baselines: model authentication, inference endpoint hardening, tool access policies |
| Risk context | Data sensitivity, compliance requirements | Data sensitivity + model capabilities + tool access + agent autonomy level |
| Discovery scope | Cloud resources in managed accounts | Cloud resources + shadow AI services + third-party AI APIs + MCP servers |
| Drift detection | Configuration changes to cloud resources | Configuration changes + new model deployments + new tool integrations + permission changes on AI service accounts |
Understanding these differences is critical because organizations that rely solely on traditional CSPM for AI workloads will have significant blind spots – they’ll see the cloud resources but not the AI-specific risks those resources create.
Risk Insights and Prioritization
Not all security findings are equal. An unauthenticated model endpoint that handles PII is far more critical than a missing access log on an internal test cluster. AI-SPM’s risk insights correlate findings across the AI stack to help security teams prioritize what to fix first.
Cyber Risk Exposure Management for AI
Risk scoring for AI infrastructure should factor in:
- Data sensitivity: What data does the AI system access or process? PII, financial data, and health records increase risk scores.
- Tool access: Does the AI system have access to tools that can take real-world actions (send emails, modify databases, execute code)? Tool access dramatically increases blast radius.
- Exposure surface: Is the AI endpoint internal-only or internet-facing? Are there multiple paths to reach it?
- Regulatory context: Is the AI system subject to specific regulations (GDPR, HIPAA, EU AI Act)? Compliance requirements elevate the priority of related findings.
- Blast radius: If this AI system is compromised, how many other systems, users, and data assets are affected?
Prioritization Framework
| Risk Level | Criteria | Example | Response Time |
|---|---|---|---|
| CRITICAL | Internet-facing + handles PII + has tool access | Customer-facing AI chatbot with database access and no rate limiting | Immediate (hours) |
| HIGH | Internet-facing OR handles sensitive data + misconfiguration | Model endpoint with weak authentication, processing financial queries | Within 24 hours |
| MEDIUM | Internal-only + misconfiguration | Internal GPU cluster with overly broad network access | Within 1 week |
| LOW | Internal-only + minor deviation from baseline | Test endpoint missing audit logging | Within 1 month |
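The prioritization table translates almost directly into code. This sketch reduces the criteria to four booleans for clarity; a real AI-SPM risk engine would weigh many more signals (regulatory context, blast radius, reachability paths):

```python
# Direct translation of the prioritization table into a toy classifier.
# The boolean inputs are a deliberate simplification of the criteria
# column; real risk scoring combines far more signals.

def risk_level(internet_facing: bool, handles_sensitive_data: bool,
               has_tool_access: bool, misconfigured: bool) -> str:
    if internet_facing and handles_sensitive_data and has_tool_access:
        return "CRITICAL"      # respond immediately (hours)
    if misconfigured and (internet_facing or handles_sensitive_data):
        return "HIGH"          # respond within 24 hours
    if misconfigured:
        return "MEDIUM"        # internal-only, fix within a week
    return "LOW"               # minor deviation, fix within a month
```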
Defense Connection
Risk prioritization correlates with multiple OWASP categories. An AI system with excessive tool access maps to ASI02: Tool Misuse and Exploitation from Chapter 2 Section 5. An AI system with overly broad permissions maps to ASI03: Identity and Privilege Abuse. AI-SPM’s risk scoring identifies where these specific risks are concentrated in your infrastructure.
Securing the AI Orchestration Layer
The orchestration layer – the tooling that connects models to actions – is one of the fastest-growing and least-secured components of AI infrastructure. MCP servers, function calling frameworks, agent orchestrators like n8n and LangChain, and custom tool integrations all sit in this layer.
Why Orchestration Security Matters
Orchestration tools are the bridge between “AI that generates text” and “AI that takes actions.” They are where the model’s output becomes real-world impact. A compromised orchestration layer means an attacker can:
- Redirect tool calls: Route agent actions to attacker-controlled endpoints
- Modify tool outputs: Inject poisoned data into tool responses that the agent trusts
- Expand permissions: Leverage orchestration-level access to reach systems the agent shouldn’t access
- Persist across sessions: Install backdoors in orchestration configurations that survive agent restarts
Orchestration Security Controls
```mermaid
graph TB
REQ["Agent Request<br/><small>Tool call from<br/>LLM or agent</small>"]
AUTH["Authenticate<br/><small>Verify agent<br/>identity and<br/>credentials</small>"]
ALLOW["Allowlist<br/>Check<br/><small>Tool on approved<br/>list for this agent?</small>"]
SANDBOX["Sandbox<br/>Execution<br/><small>Isolated environment,<br/>restricted permissions</small>"]
VALIDATE["Output<br/>Validation<br/><small>Check for injection,<br/>anomalous data</small>"]
RESP["Validated<br/>Response<br/><small>Safe output<br/>returned to agent</small>"]
BLOCK["Blocked<br/><small>Unauthorized or<br/>anomalous call</small>"]
REQ --> AUTH
AUTH -->|"Verified"| ALLOW
AUTH -->|"Failed"| BLOCK
ALLOW -->|"Approved"| SANDBOX
ALLOW -->|"Denied"| BLOCK
SANDBOX --> VALIDATE
VALIDATE -->|"Clean"| RESP
VALIDATE -->|"Injection<br/>detected"| BLOCK
style REQ fill:#1C90F3,color:#fff
style AUTH fill:#2d5016,color:#fff
style ALLOW fill:#2d5016,color:#fff
style SANDBOX fill:#2d5016,color:#fff
style VALIDATE fill:#2d5016,color:#fff
style RESP fill:#1C90F3,color:#fff
style BLOCK fill:#8b0000,color:#fff
```
Every agent tool call passes through this security pipeline: identity verification, allowlist enforcement, sandboxed execution, and output validation. Failure at any checkpoint blocks the call and generates an alert. This pipeline prevents unauthorized tool access, tool output injection, and sandbox escapes.
MCP Server Verification: Before connecting any MCP server to an agent framework, verify its source, inspect its code, and understand exactly what capabilities it provides. The Cursor MCP exploitation from Chapter 2 demonstrated how a malicious MCP server can inject hidden instructions into tool responses.
Tool Allowlisting: Maintain an explicit list of approved tools for each agent. Agents should only be able to call tools that are on their allowlist – not discover and use arbitrary tools at runtime.
Execution Monitoring: Log every tool call, including the inputs sent and outputs received. Monitor for anomalous patterns: unexpected tools being called, unusual parameter values, tool calls to unknown endpoints.
Sandboxing: Where possible, run orchestration tools in isolated environments with limited network access and restricted file system permissions. The orchestration layer should not have direct access to production databases, credential stores, or administrative APIs.
Defense Connection
Orchestration security directly addresses ASI02: Tool Misuse and Exploitation and ASI05: Unexpected Code Execution from Chapter 2 Section 5. The agentic attack vectors you studied – tool output injection, MCP server exploitation, sandboxing escapes – all target the orchestration layer. Securing this layer removes the mechanisms that enable these attacks.
Identity and Access Management for AI
AI systems need identities. A model serving endpoint needs credentials to access its model weights. An agent needs API keys to call external services. An orchestration tool needs database credentials to execute queries. Managing these identities is a critical – and frequently neglected – aspect of AI infrastructure security.
The Least-Privilege Challenge
AI systems often accumulate permissions over time. A development team gives an agent broad access during prototyping (“just let it access everything so we can test faster”), and those permissions are never tightened for production. The result is AI systems with far more access than they need – exactly the condition that enables the privilege escalation chains from Chapter 2 Section 5.
IAM Controls for AI
| Control | Purpose | Implementation |
|---|---|---|
| Service account per function | Separate identities for separate capabilities | One service account for model inference, another for database access, another for tool execution |
| Scoped API keys | Limit what each credential can do | API keys with specific permission scopes; no master keys in agent environments |
| Credential rotation | Limit the window of opportunity for stolen credentials | Automated rotation on a regular schedule; immediate rotation when compromise is suspected |
| Just-in-time access | Grant permissions only when needed, revoke after use | Temporary credentials for specific tasks; time-bounded access tokens |
| Audit trail | Track all credential usage | Log every API call with the credential used; alert on anomalous usage patterns |
Defense Connection
Identity and access management for AI directly defends against ASI03: Identity and Privilege Abuse. The privilege escalation chain from Chapter 2 – where an agent used read-only file access to discover database credentials, which led to admin API keys, which led to full system compromise – is broken by least-privilege enforcement. If the agent’s file access credential can only read specific directories, and the database credential can only execute specific queries, the chain cannot form.
Defense Perspective: Cursor MCP Exploitation
The attack (from Chapter 2 Section 5): Two CVEs in the Cursor AI coding assistant (CVE-2025-54135, CVE-2025-54136) demonstrated how malicious MCP servers could exploit the agent supply chain. A developer installs what appears to be a legitimate MCP server from npm. The server responds to tool calls with outputs containing hidden prompt injection. The AI agent processes these instructions and executes attacker-supplied code with the developer’s full permissions – accessing file systems, Git repositories, and credentials.
What Layer 3 controls would have prevented or mitigated:
- AI-SPM discovery: AI-SPM’s continuous discovery would have identified the unverified MCP server as a new, unassessed component in the AI infrastructure. It would have flagged the server as an untrusted tool provider requiring security review before integration.
- Orchestration security controls: A tool allowlist would have required explicit approval before the agent could use tools provided by the new MCP server. Execution monitoring would have detected the anomalous code execution triggered by tool responses.
- Container security (Layer 2 + Layer 3 overlap): If the MCP server was running in a scanned container, Container Security would have identified the hidden functionality during image analysis – the exfiltration code and encoded injection payloads would appear as suspicious behaviors.
- Identity management: Least-privilege for the AI coding assistant would have limited the blast radius. If the agent’s service account only had access to the current project directory (not the entire file system, not SSH keys, not Git credentials), the exfiltration would have been limited to the current project context.
The key insight: MCP servers are infrastructure. They need the same vetting, monitoring, and access controls as any other infrastructure component in the AI stack. Treating them as “just plugins” creates the exact supply chain vulnerability that attackers exploit.
AI Scanner Cross-Reference
AI Scanner’s infrastructure assessment capabilities complement AI-SPM by identifying model-level vulnerabilities that infrastructure monitoring alone wouldn’t catch. While AI-SPM detects misconfigurations in the infrastructure running AI systems, AI Scanner detects vulnerabilities in the models themselves – prompt injection susceptibility, system prompt leakage risk, and adversarial robustness gaps. Together, they provide coverage across both the infrastructure layer and the model layer. See Section 9 for how these tools integrate in the continuous protection loop.
Vision One’s AI-SPM dashboard provides continuous visibility into an organization’s AI assets – discovering shadow AI deployments, assessing configuration against security baselines, and scoring risk across model serving infrastructure. The Cyber Risk Exposure Management module correlates AI-specific risks with broader enterprise risk posture, enabling security teams to prioritize remediation based on actual exposure rather than theoretical severity. By integrating AI-SPM into the same platform that manages the other five Blueprint layers, organizations gain a single view of their AI security posture alongside their traditional infrastructure posture.
Key Takeaways
- AI Security Posture Management (AI-SPM) extends traditional CSPM by discovering AI-specific assets, assessing against AI security baselines, and scoring risk based on model capabilities and tool access
- Posture management requirements differ across AI resource types: GPU clusters need memory isolation and network segmentation, model endpoints need authentication and rate limiting, and cloud AI services need IAM least-privilege enforcement
- Orchestration layer security (MCP servers, function calling, agent frameworks) requires server verification, tool allowlisting, execution monitoring, and sandboxing to prevent tool output injection and lateral movement
- AI service accounts require least-privilege identity management with separate credentials per function, scoped API keys, credential rotation, and just-in-time access to break privilege escalation chains
Test Your Knowledge
Ready to test your understanding of AI infrastructure security? Head to the quiz to check your knowledge.
Up next
Infrastructure secured, the next layer addresses the human element. In Section 6, you’ll learn about Layer 4: Secure Your Users – including deepfake detection, endpoint security for AI-era threats, shadow AI governance, and how to protect users from AI-powered social engineering.