5. Layer 3: Secure Your AI Infrastructure
Introduction
AI infrastructure is not just “servers that run models.” It is a uniquely complex ecosystem of GPU clusters, model serving endpoints, vector databases, orchestration layers, API gateways, and monitoring systems – each with its own attack surface, and all interconnected in ways that traditional infrastructure security was never designed to handle.
In Chapter 2, you saw how infrastructure attacks exploit container escapes, GPU memory vulnerabilities, and API security gaps. You saw how tool misuse turns legitimate AI orchestration capabilities into attack vectors. And you saw how identity and privilege abuse chains together legitimate permissions into full system compromise. Layer 3 of the Security for AI Blueprint addresses these threats through a new discipline: AI Security Posture Management.
This section takes a deep dive into Layer 3 – the layer that gives security teams continuous visibility into their AI infrastructure, identifies misconfigurations before attackers exploit them, and ensures that every component in the AI stack meets security baselines. Because AI infrastructure changes rapidly (new models deployed, new tools integrated, new endpoints exposed), traditional point-in-time security assessments are not enough. Layer 3 demands continuous, automated posture management.
What will I get out of this?
By the end of this section, you will be able to:
- Explain what AI Security Posture Management (AI-SPM) is and how it differs from traditional Cloud Security Posture Management (CSPM).
- Describe the key components of AI-SPM including asset discovery, configuration assessment, risk scoring, and remediation guidance.
- Identify posture management requirements for GPU clusters, model serving endpoints, and cloud AI services.
- Explain how risk insights prioritize remediation by correlating findings across the AI stack.
- Describe orchestration layer security for tools like MCP servers, function calling, and agent frameworks.
- Design identity and access management strategies specific to AI service accounts and API credentials.
- Map Layer 3 controls to specific OWASP categories including LLM10, ASI02, ASI03, and ASI05.
AI Security Posture Management (AI-SPM)
Traditional CSPM (Cloud Security Posture Management) monitors cloud resources for misconfigurations: open S3 buckets, overly permissive security groups, unencrypted databases. AI-SPM extends this concept to the unique resources that make up an AI deployment – and adds discovery capabilities that are critical because many organizations don’t even know the full extent of their AI footprint.
Why AI Needs Its Own Posture Management
Standard CSPM tools see a GPU instance as just another virtual machine. They don’t understand that it’s running a model serving endpoint that exposes an unauthenticated inference API. They don’t know that the vector database behind it has no access controls. They can’t assess whether the model loaded on the GPU has been scanned for vulnerabilities.
AI-SPM understands the AI context:
- AI asset discovery: Finds all AI-related resources across the organization – model endpoints, vector databases, training pipelines, GPU clusters, and shadow AI deployments that security teams didn’t know existed
- AI-specific configuration baselines: Assesses resources against security baselines designed for AI workloads, not just generic cloud benchmarks
- Model-aware risk scoring: Scores risk based on what the AI system does (handles PII? makes financial decisions? has tool access?) – not just on infrastructure metrics
- Drift detection: Continuously monitors for configuration changes that weaken security posture – a new model endpoint deployed without authentication, a vector database permission change, a GPU cluster exposed to a wider network
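The drift-detection idea above can be sketched as a simple diff between a stored security baseline and a freshly collected configuration snapshot. All asset names, settings, and baselines below are illustrative, not a real AI-SPM API:

```python
# Minimal drift-detection sketch: diff a stored security baseline against
# a current configuration snapshot for each AI asset. Assets present in
# the snapshot but absent from the baseline are treated as unassessed
# (possible shadow AI). All names and fields are hypothetical.

BASELINE = {
    "model-endpoint/chat-prod": {"auth_required": True, "rate_limited": True},
    "vector-db/docs": {"public_access": False, "rbac_enabled": True},
}

def detect_drift(baseline: dict, snapshot: dict) -> list[dict]:
    """Return one finding per setting that deviates from the baseline."""
    findings = []
    for asset, expected in baseline.items():
        current = snapshot.get(asset, {})
        for setting, want in expected.items():
            have = current.get(setting)
            if have != want:
                findings.append({"asset": asset, "setting": setting,
                                 "expected": want, "actual": have})
    # Unknown assets are flagged for assessment rather than ignored.
    for asset in snapshot.keys() - baseline.keys():
        findings.append({"asset": asset, "setting": None,
                         "expected": "assessed", "actual": "unknown asset"})
    return findings

snapshot = {
    "model-endpoint/chat-prod": {"auth_required": False, "rate_limited": True},
    "vector-db/docs": {"public_access": False, "rbac_enabled": True},
    "model-endpoint/dev-test": {"auth_required": False},  # shadow deployment
}
```

A real implementation would pull the snapshot from cloud inventory APIs on a schedule or on change events; the diff logic stays the same.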
The AI-SPM Workflow
```mermaid
graph TB
DIS["<b>Discover</b><br/><small>Inventory all AI assets:<br/>model endpoints, GPU clusters,<br/>vector DBs, orchestration tools,<br/>shadow AI services</small>"]
ASS["<b>Assess</b><br/><small>Evaluate configuration<br/>against AI security<br/>baselines: authentication,<br/>encryption, access controls</small>"]
SCR["<b>Score</b><br/><small>Risk scoring based on<br/>exposure, data sensitivity,<br/>tool access, and<br/>blast radius</small>"]
REM["<b>Remediate</b><br/><small>Prioritized guidance:<br/>what to fix first,<br/>how to fix it,<br/>expected risk reduction</small>"]
DIS --> ASS --> SCR --> REM
REM -.->|"Continuous<br/>reassessment"| DIS
SA1["Model Endpoints"]
SA2["GPU Clusters"]
SA3["Vector Databases"]
SA4["Orchestration Tools"]
SA5["Shadow AI Services"]
SA1 -.-> DIS
SA2 -.-> DIS
SA3 -.-> DIS
SA4 -.-> DIS
SA5 -.-> DIS
style DIS fill:#2d5016,color:#fff
style ASS fill:#2d5016,color:#fff
style SCR fill:#2d5016,color:#fff
style REM fill:#2d5016,color:#fff
style SA1 fill:#1C90F3,color:#fff
style SA2 fill:#1C90F3,color:#fff
style SA3 fill:#1C90F3,color:#fff
style SA4 fill:#1C90F3,color:#fff
style SA5 fill:#1C90F3,color:#fff
```
The workflow is continuous. New AI assets can appear at any time – a developer spins up a model endpoint for testing, a team integrates a new MCP server, an internal tool starts calling an external AI API. AI-SPM must detect these additions and assess them against security baselines without waiting for a manual review cycle.
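The four stages can be sketched as one pass of a posture loop. Everything below (the discovery stub, the assessment rule, the scoring weights) is hypothetical; a real AI-SPM system would run this continuously against live inventory:

```python
# Sketch of the Discover -> Assess -> Score -> Remediate loop as a
# single pass. In production this runs on a schedule or on asset-change
# events; all functions, fields, and weights here are illustrative.

def discover() -> list[dict]:
    # Stand-in for real inventory sources (cloud APIs, network scans).
    return [{"name": "chat-endpoint", "type": "model-endpoint",
             "auth_required": False, "internet_facing": True}]

def assess(asset: dict) -> list[str]:
    """Evaluate one asset against a (toy) AI security baseline."""
    findings = []
    if asset["type"] == "model-endpoint" and not asset.get("auth_required"):
        findings.append("unauthenticated endpoint")
    return findings

def score(asset: dict, findings: list[str]) -> int:
    """Weight findings by exposure; internet-facing assets score higher."""
    base = 10 * len(findings)
    return base * 3 if asset.get("internet_facing") else base

def run_posture_cycle() -> list[tuple]:
    results = []
    for asset in discover():
        findings = assess(asset)
        results.append((asset["name"], findings, score(asset, findings)))
    # Remediation guidance is generated from the sorted results, then
    # the loop re-runs to verify that fixes actually landed.
    return sorted(results, key=lambda r: -r[2])
```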
Defense Connection
AI-SPM directly addresses LLM10: Unbounded Consumption by detecting misconfigured model endpoints that lack rate limiting, authentication, or cost controls. The infrastructure attacks from Chapter 2 showed how unprotected API endpoints and missing rate limits enable denial of service and financial damage. AI-SPM’s continuous assessment ensures these misconfigurations are caught and flagged before attackers discover them.
Posture Management for AI Resources
Different AI resource types need different security baselines. A GPU cluster requires different controls than a vector database, and a model serving endpoint has different requirements than an orchestration tool.
GPU Cluster Security
GPU clusters are the compute backbone of AI training and inference. They’re expensive, powerful, and frequently shared across teams – which creates multi-tenancy risks.
| Control | What It Addresses | Implementation |
|---|---|---|
| Network segmentation | Isolate GPU clusters from general enterprise network | Dedicated VPCs/subnets for AI workloads; explicit firewall rules for ingress/egress |
| Memory isolation | Prevent cross-tenant GPU memory leakage | GPU partitioning (MIG for NVIDIA A100/H100); memory clearing between workloads |
| Access controls | Prevent unauthorized access to compute resources | RBAC on cluster management; API key rotation for programmatic access |
| Audit logging | Track who runs what on the cluster | Job submission logs, resource usage tracking, model loading records |
| Cost monitoring | Detect anomalous resource consumption | Usage alerts, budget caps, per-team quotas |
Model Serving Endpoint Hardening
Every model serving endpoint is an API – and every API needs baseline security:
- Authentication required: No unauthenticated model endpoints in production (this seems obvious but is one of the most common AI-SPM findings)
- TLS encryption: All model API traffic encrypted in transit
- Rate limiting: Per-user and per-endpoint rate limits to prevent unbounded consumption
- Input validation: Maximum token/request size limits enforced at the endpoint level
- Health monitoring: Automated checks for model availability, latency, and error rates
Cloud AI Service Considerations
Organizations using managed AI services (AWS SageMaker, Azure AI, GCP Vertex AI) benefit from the provider’s infrastructure security but still need posture management for their configurations:
- IAM policies: Are service roles following least-privilege? Do they have access to resources they don’t need?
- Network configuration: Are endpoints exposed to the internet when they should be VPC-internal?
- Logging and monitoring: Are CloudTrail/Activity Log/Cloud Audit logs enabled for AI service operations?
- Data residency: Are model training jobs and inference endpoints in approved regions for compliance?
- Model registry access: Who can deploy new models to managed endpoints? Are there approval gates?
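The least-privilege check in the first bullet can be automated by scanning role policies for wildcard grants. The sketch below follows the common AWS IAM JSON policy shape; the role policy itself is fabricated for illustration, and a real scan would pull policies via the cloud provider's IAM APIs:

```python
# Hedged sketch: flag IAM-style policy statements that break
# least-privilege -- wildcard actions (e.g. "sagemaker:*") or a
# wildcard resource on an AI service role. The policy document below
# is a made-up example in the standard AWS JSON policy format.

def overbroad_statements(policy: dict) -> list[dict]:
    flagged = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = stmt.get("Resource", [])
        resources = [resources] if isinstance(resources, str) else resources
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        if wildcard_action or "*" in resources:
            flagged.append(stmt)
    return flagged

inference_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        # Scoped: read model artifacts only -- passes the check.
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::model-artifacts/*"},
        # Overbroad: full service access on all resources -- flagged.
        {"Effect": "Allow", "Action": "sagemaker:*", "Resource": "*"},
    ],
}
```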
AI-SPM vs. Traditional CSPM: Key Differences
| Dimension | Traditional CSPM | AI-SPM |
|---|---|---|
| Asset types | VMs, databases, storage buckets, network resources | Model endpoints, GPU clusters, vector DBs, orchestration tools, agent frameworks |
| Configuration baselines | CIS benchmarks, cloud provider best practices | AI-specific baselines: model authentication, inference endpoint hardening, tool access policies |
| Risk context | Data sensitivity, compliance requirements | Data sensitivity + model capabilities + tool access + agent autonomy level |
| Discovery scope | Cloud resources in managed accounts | Cloud resources + shadow AI services + third-party AI APIs + MCP servers |
| Drift detection | Configuration changes to cloud resources | Configuration changes + new model deployments + new tool integrations + permission changes on AI service accounts |
Understanding these differences is critical because organizations that rely solely on traditional CSPM for AI workloads will have significant blind spots – they’ll see the cloud resources but not the AI-specific risks those resources create.
Risk Insights and Prioritization
Not all security findings are equal. An unauthenticated model endpoint that handles PII is far more critical than a missing access log on an internal test cluster. AI-SPM’s risk insights correlate findings across the AI stack to help security teams prioritize what to fix first.
Cyber Risk Exposure Management for AI
Risk scoring for AI infrastructure should factor in:
- Data sensitivity: What data does the AI system access or process? PII, financial data, and health records increase risk scores.
- Tool access: Does the AI system have access to tools that can take real-world actions (send emails, modify databases, execute code)? Tool access dramatically increases blast radius.
- Exposure surface: Is the AI endpoint internal-only or internet-facing? Are there multiple paths to reach it?
- Regulatory context: Is the AI system subject to specific regulations (GDPR, HIPAA, EU AI Act)? Compliance requirements elevate the priority of related findings.
- Blast radius: If this AI system is compromised, how many other systems, users, and data assets are affected?
Prioritization Framework
| Risk Level | Criteria | Example | Response Time |
|---|---|---|---|
| CRITICAL | Internet-facing + handles PII + has tool access | Customer-facing AI chatbot with database access and no rate limiting | Immediate (hours) |
| HIGH | Internet-facing OR handles sensitive data + misconfiguration | Model endpoint with weak authentication, processing financial queries | Within 24 hours |
| MEDIUM | Internal-only + misconfiguration | Internal GPU cluster with overly broad network access | Within 1 week |
| LOW | Internal-only + minor deviation from baseline | Test endpoint missing audit logging | Within 1 month |
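The prioritization table translates almost directly into code. This sketch reduces the criteria to four booleans for clarity; a real AI-SPM risk engine would weigh many more signals (regulatory context, blast radius, reachability paths):

```python
# Direct translation of the prioritization table into a toy classifier.
# The boolean inputs are a deliberate simplification of the criteria
# column; real risk scoring combines far more signals.

def risk_level(internet_facing: bool, handles_sensitive_data: bool,
               has_tool_access: bool, misconfigured: bool) -> str:
    if internet_facing and handles_sensitive_data and has_tool_access:
        return "CRITICAL"      # respond immediately (hours)
    if misconfigured and (internet_facing or handles_sensitive_data):
        return "HIGH"          # respond within 24 hours
    if misconfigured:
        return "MEDIUM"        # internal-only, fix within a week
    return "LOW"               # minor deviation, fix within a month
```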
Defense Connection
Risk prioritization correlates with multiple OWASP categories. An AI system with excessive tool access maps to ASI02: Tool Misuse and Exploitation from Chapter 2 Section 5. An AI system with overly broad permissions maps to ASI03: Identity and Privilege Abuse. AI-SPM’s risk scoring identifies where these specific risks are concentrated in your infrastructure.
Securing the AI Orchestration Layer
The orchestration layer – the tooling that connects models to actions – is one of the fastest-growing and least-secured components of AI infrastructure. MCP servers, function calling frameworks, agent orchestrators like n8n and LangChain, and custom tool integrations all sit in this layer.
Why Orchestration Security Matters
Orchestration tools are the bridge between “AI that generates text” and “AI that takes actions.” They are where the model’s output becomes real-world impact. A compromised orchestration layer means an attacker can:
- Redirect tool calls: Route agent actions to attacker-controlled endpoints
- Modify tool outputs: Inject poisoned data into tool responses that the agent trusts
- Expand permissions: Leverage orchestration-level access to reach systems the agent shouldn’t access
- Persist across sessions: Install backdoors in orchestration configurations that survive agent restarts
Orchestration Security Controls
```mermaid
graph TB
REQ["Agent Request<br/><small>Tool call from<br/>LLM or agent</small>"]
AUTH["Authenticate<br/><small>Verify agent<br/>identity and<br/>credentials</small>"]
ALLOW["Allowlist<br/>Check<br/><small>Tool on approved<br/>list for this agent?</small>"]
SANDBOX["Sandbox<br/>Execution<br/><small>Isolated environment,<br/>restricted permissions</small>"]
VALIDATE["Output<br/>Validation<br/><small>Check for injection,<br/>anomalous data</small>"]
RESP["Validated<br/>Response<br/><small>Safe output<br/>returned to agent</small>"]
BLOCK["Blocked<br/><small>Unauthorized or<br/>anomalous call</small>"]
REQ --> AUTH
AUTH -->|"Verified"| ALLOW
AUTH -->|"Failed"| BLOCK
ALLOW -->|"Approved"| SANDBOX
ALLOW -->|"Denied"| BLOCK
SANDBOX --> VALIDATE
VALIDATE -->|"Clean"| RESP
VALIDATE -->|"Injection<br/>detected"| BLOCK
style REQ fill:#1C90F3,color:#fff
style AUTH fill:#2d5016,color:#fff
style ALLOW fill:#2d5016,color:#fff
style SANDBOX fill:#2d5016,color:#fff
style VALIDATE fill:#2d5016,color:#fff
style RESP fill:#1C90F3,color:#fff
style BLOCK fill:#8b0000,color:#fff
```
Every agent tool call passes through this security pipeline: identity verification, allowlist enforcement, sandboxed execution, and output validation. Failure at any checkpoint blocks the call and generates an alert. This pipeline prevents unauthorized tool access, tool output injection, and sandbox escapes.
MCP Server Verification: Before connecting any MCP server to an agent framework, verify its source, inspect its code, and understand exactly what capabilities it provides. The Cursor MCP exploitation from Chapter 2 demonstrated how a malicious MCP server can inject hidden instructions into tool responses.
Tool Allowlisting: Maintain an explicit list of approved tools for each agent. Agents should only be able to call tools that are on their allowlist – not discover and use arbitrary tools at runtime.
Execution Monitoring: Log every tool call, including the inputs sent and outputs received. Monitor for anomalous patterns: unexpected tools being called, unusual parameter values, tool calls to unknown endpoints.
Sandboxing: Where possible, run orchestration tools in isolated environments with limited network access and restricted file system permissions. The orchestration layer should not have direct access to production databases, credential stores, or administrative APIs.
Defense Connection
Orchestration security directly addresses ASI02: Tool Misuse and Exploitation and ASI05: Unexpected Code Execution from Chapter 2 Section 5. The agentic attack vectors you studied – tool output injection, MCP server exploitation, sandboxing escapes – all target the orchestration layer. Securing this layer removes the mechanisms that enable these attacks.
Identity and Access Management for AI
AI systems need identities. A model serving endpoint needs credentials to access its model weights. An agent needs API keys to call external services. An orchestration tool needs database credentials to execute queries. Managing these identities is a critical – and frequently neglected – aspect of AI infrastructure security.
The Least-Privilege Challenge
AI systems often accumulate permissions over time. A development team gives an agent broad access during prototyping (“just let it access everything so we can test faster”), and those permissions are never tightened for production. The result is AI systems with far more access than they need – exactly the condition that enables the privilege escalation chains from Chapter 2 Section 5.
IAM Controls for AI
| Control | Purpose | Implementation |
|---|---|---|
| Service account per function | Separate identities for separate capabilities | One service account for model inference, another for database access, another for tool execution |
| Scoped API keys | Limit what each credential can do | API keys with specific permission scopes; no master keys in agent environments |
| Credential rotation | Limit the window of opportunity for stolen credentials | Automated rotation on a regular schedule; immediate rotation when compromise is suspected |
| Just-in-time access | Grant permissions only when needed, revoke after use | Temporary credentials for specific tasks; time-bounded access tokens |
| Audit trail | Track all credential usage | Log every API call with the credential used; alert on anomalous usage patterns |
Defense Connection
Identity and access management for AI directly defends against ASI03: Identity and Privilege Abuse. The privilege escalation chain from Chapter 2 – where an agent used read-only file access to discover database credentials, which led to admin API keys, which led to full system compromise – is broken by least-privilege enforcement. If the agent’s file access credential can only read specific directories, and the database credential can only execute specific queries, the chain cannot form.
Defense Perspective: Cursor MCP Exploitation
The attack (from Chapter 2 Section 5): Two CVEs in the Cursor AI coding assistant (CVE-2025-54135, CVE-2025-54136) demonstrated how malicious MCP servers could exploit the agent supply chain. A developer installs what appears to be a legitimate MCP server from npm. The server responds to tool calls with outputs containing hidden prompt injection. The AI agent processes these instructions and executes attacker-supplied code with the developer’s full permissions – accessing file systems, Git repositories, and credentials.
What Layer 3 controls would have prevented or mitigated:
- AI-SPM discovery: AI-SPM’s continuous discovery would have identified the unverified MCP server as a new, unassessed component in the AI infrastructure. It would have flagged the server as an untrusted tool provider requiring security review before integration.
- Orchestration security controls: A tool allowlist would have required explicit approval before the agent could use tools provided by the new MCP server. Execution monitoring would have detected the anomalous code execution triggered by tool responses.
- Container security (Layer 2 + Layer 3 overlap): If the MCP server was running in a scanned container, Container Security would have identified the hidden functionality during image analysis – the exfiltration code and encoded injection payloads would appear as suspicious behaviors.
- Identity management: Least-privilege for the AI coding assistant would have limited the blast radius. If the agent’s service account only had access to the current project directory (not the entire file system, not SSH keys, not Git credentials), the exfiltration would have been limited to the current project context.
The key insight: MCP servers are infrastructure. They need the same vetting, monitoring, and access controls as any other infrastructure component in the AI stack. Treating them as “just plugins” creates the exact supply chain vulnerability that attackers exploit.
AI Scanner Cross-Reference
AI Scanner’s infrastructure assessment capabilities complement AI-SPM by identifying model-level vulnerabilities that infrastructure monitoring alone wouldn’t catch. While AI-SPM detects misconfigurations in the infrastructure running AI systems, AI Scanner detects vulnerabilities in the models themselves – prompt injection susceptibility, system prompt leakage risk, and adversarial robustness gaps. Together, they provide coverage across both the infrastructure layer and the model layer. See Section 9 for how these tools integrate in the continuous protection loop.
Vision One’s AI-SPM dashboard provides continuous visibility into an organization’s AI assets – discovering shadow AI deployments, assessing configuration against security baselines, and scoring risk across model serving infrastructure. The Cyber Risk Exposure Management module correlates AI-specific risks with broader enterprise risk posture, enabling security teams to prioritize remediation based on actual exposure rather than theoretical severity. By integrating AI-SPM into the same platform that manages the other five Blueprint layers, organizations gain a single view of their AI security posture alongside their traditional infrastructure posture.
Key Takeaways
- AI Security Posture Management (AI-SPM) extends traditional CSPM by discovering AI-specific assets, assessing against AI security baselines, and scoring risk based on model capabilities and tool access
- Posture management requirements differ across AI resource types: GPU clusters need memory isolation and network segmentation, model endpoints need authentication and rate limiting, and cloud AI services need IAM least-privilege enforcement
- Orchestration layer security (MCP servers, function calling, agent frameworks) requires server verification, tool allowlisting, execution monitoring, and sandboxing to prevent tool output injection and lateral movement
- AI service accounts require least-privilege identity management with separate credentials per function, scoped API keys, credential rotation, and just-in-time access to break privilege escalation chains
Test Your Knowledge
Ready to test your understanding of AI infrastructure security? Head to the quiz to check your knowledge.
Up next
Infrastructure secured, the next layer addresses the human element. In Section 6, you’ll learn about Layer 4: Secure Your Users – including deepfake detection, endpoint security for AI-era threats, shadow AI governance, and how to protect users from AI-powered social engineering.