How to Build Secure AI Applications: An SDLC Checklist

This checklist walks through five phases of the software development lifecycle (from design and threat modelling through to runtime monitoring) with concrete checks at each stage for teams building with LLMs, RAG pipelines, MCP-connected tooling, and AI agents. Use it as a build guide, a pre-deployment audit, or a brief for your penetration testing engagement.

By Sherif Koussa

When I talk to SaaS teams about securing their AI features, the conversation almost always starts with the model. Can it be jailbroken? Can it leak data?

Those are real questions. But they are not the whole picture.

The bigger risk is the application around the model: what it can access, retrieve, trigger, and expose. This is why modern application security testing for AI systems has to evaluate far more than the model itself.

Your AI features are not chat boxes. They process customer data, retrieve internal documents, call APIs, update records, and make decisions that affect real users.

So the real question isn't whether the model is well-behaved. It's: what can the model reach, what can it reveal, what can it trigger, and what happens when an attacker influences any part of that chain? That's the focus of a comprehensive AI penetration testing engagement.

I built this checklist to answer that. It organizes testing by when in the build process each check belongs, and ties every item back to the five layers of a real AI-powered product:

LLM API: the prompt in, the response out
RAG: retrieval from your internal data sources
MCP: tools the model can invoke
Agents: multi-step, tool-using, goal-seeking workflows
AI-assisted dev tooling: code your team ships with Copilot, Cursor, Claude Code, etc.

Whatever layer you're testing, you're really testing for the same three failures: untrusted input reaches a privileged context, output is trusted too early, and authorization is checked in the wrong place. Keep those in mind as you work through the phases below.

Phase 1: Design & Threat Modelling

Before a line of code ships. The goal is to map trust boundaries; every arrow between layers crosses one.

Diagram the full stack end-to-end. Identify every point where data crosses a trust boundary (user → prompt, document → context, model → tool, agent → agent).

Classify what data the AI feature can reach (PII, PHI, financial records, source code, business logic) and label each as a potential disclosure path.

Define which actions the system can trigger, not just what it can say — record updates, deletions, external emails, API calls, code execution.

Establish the intended permission model: does tool/retrieval access map to the requesting user's permissions, or to a shared service credential?

For agentic features, document where human review is required for sensitive actions (deletes, external comms, permission changes, data exports).

Define logging requirements up front: can your team reconstruct what the AI read, decided, and called?
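
One way to keep the Phase 1 output reviewable is to record each trust boundary as data rather than only as a diagram. The sketch below is illustrative only: the Boundary fields, the example entries, and the undocumented helper are hypothetical names, not part of any standard threat-modelling tool.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Boundary:
    source: str                         # where the data comes from
    sink: str                           # the more privileged context it enters
    data_classes: Tuple[str, ...] = ()  # e.g. ("PII",); empty if none
    triggers_actions: bool = False      # can crossing it cause side effects?
    control: Optional[str] = None       # documented control, None if undecided


def undocumented(boundaries: List[Boundary]) -> List[Boundary]:
    """Return boundaries that carry sensitive data or trigger actions
    but have no control written down yet."""
    return [
        b for b in boundaries
        if (b.data_classes or b.triggers_actions) and b.control is None
    ]


# Hypothetical inventory for a single AI feature.
BOUNDARIES = [
    Boundary("user", "prompt", control="instruction/input separation"),
    Boundary("document", "context", data_classes=("PII",)),
    Boundary("model", "tool", triggers_actions=True, control="per-call authorization"),
    Boundary("agent", "agent", triggers_actions=True),
]

gaps = undocumented(BOUNDARIES)
for b in gaps:
    print(f"{b.source} -> {b.sink}: no control documented")
```

Anything this check reports is a design question to settle before development starts, not a finding to fix afterwards.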

Phase 2: Development

While the feature is being built. Validate that secure handling is built in, not bolted on.

LLM API layer

Input handling separates user-supplied content from system/developer instructions (so user input can't override the system prompt).

Model output is treated as untrusted before it's rendered, stored, or passed downstream — no raw HTML/Markdown injection, no unsafe links, no unsanitized output into Slack/Teams/PDF/email.

Output that flows into downstream functions (API calls, DB writes, command execution) is validated and parameterized — no injection from model text.

Token usage and call frequency are bounded to prevent cost-based denial of service.
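
To make the output-handling checks concrete, here is a minimal sketch using only the Python standard library. It assumes a hypothetical summaries table and hypothetical helper names (render_safe, save_summary); a real application would more likely lean on its framework's templating and ORM, which usually provide the same guarantees.

```python
import html
import sqlite3


def render_safe(model_text: str) -> str:
    """Escape model output before placing it in an HTML page."""
    return html.escape(model_text)


def save_summary(conn: sqlite3.Connection, ticket_id: int, model_text: str) -> None:
    """Store model output with bound parameters, never string formatting."""
    conn.execute(
        "INSERT INTO summaries (ticket_id, body) VALUES (?, ?)",
        (ticket_id, model_text),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE summaries (ticket_id INTEGER, body TEXT)")

# Treat this as something an attacker coaxed the model into producing.
hostile = "<img src=x onerror=alert(1)>'); DROP TABLE summaries;--"

save_summary(conn, 42, hostile)
stored = conn.execute(
    "SELECT body FROM summaries WHERE ticket_id = ?", (42,)
).fetchone()[0]
rendered = render_safe(stored)
```

The hostile string is stored as inert data and rendered as text. The same idea applies to every other sink: each destination (Slack, email, PDF, shell) needs its own encoding or validation step, because escaping for HTML does nothing for the others.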

RAG layer

Retrieval is scoped to the requesting user's permissions — a user from one account cannot retrieve another tenant's documents.

Retrieved content is treated as untrusted: instructions hidden inside documents, PDFs, tickets, or KB articles should not change model behavior.

Responses don't leak retrieval metadata that shouldn't be exposed (document names, author details, internal IDs, file paths, source references).

Index updates propagate deletions/restrictions — when a source doc is removed or restricted, the AI can no longer answer from it.
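
The permission-scoping check can be sketched as a filter applied to retrieval candidates before anything reaches the model context. The Chunk and User shapes below are assumptions for illustration. In practice the same tenant and role constraints would ideally also be pushed into the vector store query itself, with a filter like this kept as a second line of defence.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    tenant_id: str
    allowed_roles: FrozenSet[str]
    text: str


@dataclass(frozen=True)
class User:
    user_id: str
    tenant_id: str
    roles: FrozenSet[str]


def authorized_chunks(user: User, candidates: List[Chunk]) -> List[Chunk]:
    """Keep only chunks from the user's tenant that one of their roles may read."""
    return [
        c for c in candidates
        if c.tenant_id == user.tenant_id and (c.allowed_roles & user.roles)
    ]


# Hypothetical candidates returned by a similarity search.
candidates = [
    Chunk("d1", "acme", frozenset({"support"}), "Refund policy"),
    Chunk("d2", "globex", frozenset({"support"}), "Globex pricing"),
    Chunk("d3", "acme", frozenset({"finance"}), "Payroll export"),
]
alice = User("alice", "acme", frozenset({"support"}))

context = authorized_chunks(alice, candidates)
```

Only d1 survives: d2 belongs to another tenant and d3 requires a role Alice does not hold.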

MCP / tools layer

Tool execution is tied to the user's permissions, not only the MCP server's credentials.

Tools validate file paths, command arguments, API parameters, and all user-controlled inputs (guard against path traversal and argument injection).

Dangerous tools (file write, shell execution, DB access, privileged API calls) have scopes, rate limits, and approval gates.

Tool descriptions and tool results can't manipulate the model into unsafe follow-up actions — treat tool output as untrusted too.
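
For the path-validation check, a minimal sketch of a file tool guard might look like the following. The sandbox location and the function name are hypothetical. The approach shown (resolve the path, then confirm it is still inside the allowed root) is one common way to block traversal, not the only one.

```python
from pathlib import Path

SANDBOX = Path("/srv/agent-files")  # hypothetical directory the tool may touch


def resolve_in_sandbox(user_path: str, root: Path = SANDBOX) -> Path:
    """Resolve a model- or user-supplied path and reject anything outside root."""
    root = root.resolve()
    # Joining with an absolute path discards root, and ".." segments are
    # collapsed by resolve(), so both tricks are caught by the check below.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"path escapes sandbox: {user_path!r}")
    return candidate


def is_rejected(user_path: str) -> bool:
    """Small helper for demonstrating the guard."""
    try:
        resolve_in_sandbox(user_path)
    except PermissionError:
        return True
    return False
```

With this guard, a request for reports/q1.txt resolves inside the sandbox, while ../../etc/passwd, /etc/passwd, and reports/../../secret are all refused. Because resolve() also follows symlinks that exist on disk, a link pointing outside the sandbox is rejected as well.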

Dev tooling layer

AI-generated code goes through the same review gates as human-written code — no bypass because "it compiled and tests passed."

Reviewers specifically check authorization, tenant isolation, secrets handling, payment/billing logic, and admin functionality in generated code.

Secrets, API keys, tokens, and env values are not exposed in generated code, logs, public assets, or client-side bundles.
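
As a rough illustration of the secrets check, the sketch below flags two obvious patterns in a block of generated code. The patterns and names are assumptions chosen for brevity. A dedicated secret scanner in CI will catch far more, so treat this as a way to picture the check rather than a replacement for one.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real scanners ship much larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hardcoded_credential": re.compile(
        r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*['\"][^'\"]{12,}['\"]"
    ),
}


def find_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line number, pattern name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


# Hypothetical AI-generated snippet; the values are placeholders, not real keys.
generated = (
    'API_KEY = "example-not-a-real-key-0123"\n'
    'region = "us-east-1"\n'
    'key_id = "AKIAIOSFODNN7EXAMPLE"\n'
)

findings = find_secrets(generated)
```

Here lines 1 and 3 are flagged and line 2 is not. Running a check like this on generated code, logs, and built client bundles covers the places the checklist item names.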

Phase 3: Pre-Deployment Testing

Adversarial testing before go-live. This is where a dedicated penetration testing engagement attempts to break the system before attackers do.

LLM API

Attempt prompt injection: override the system prompt, extract hidden instructions, bypass guardrails.

Test every output destination separately — UI, Slack/Teams, email, PDF export, DB, downstream API — for XSS, injection, and unsafe rendering.

RAG

Attempt cross-tenant retrieval and indirect/second-order injection via planted content in documents, tickets, or emails.

Test whether an attacker can rank their content as "most relevant" to poison answers.

MCP

Attempt prompt-injection-driven invocation of destructive or high-privilege tools.

Test tool argument validation with malicious paths, injected arguments, and out-of-scope parameters.

Agents

Test whether content the agent reads (ticket, doc, email, customer message) can silently change its goal.

Test second-order privilege escalation: low-privileged user plants content → higher-privileged agent acts on it.

Verify permission is re-checked on every tool call, not just at session start.

Test agent-to-agent handoff — can one agent delegate to another with more access or more powerful tools?

Confirm sensitive actions actually hit a human-review gate before executing.
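
Two of the agent checks above (permission re-checked on every tool call, and sensitive actions held for human review) can be pictured as a single gateway that every tool call passes through. This is a sketch under stated assumptions: ToolGateway, ApprovalRequired, and the tool names are hypothetical, and the permissions dictionary stands in for a live lookup against your real authorization service.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

SENSITIVE_TOOLS = {"delete_record", "send_external_email", "export_data"}  # hypothetical


class ApprovalRequired(Exception):
    """Raised when a sensitive tool call has no recorded human approver."""


@dataclass
class ToolGateway:
    permissions: Dict[str, Set[str]]  # user id -> tools allowed right now
    audit_log: List[dict] = field(default_factory=list)

    def call(self, user_id: str, tool: str, approved_by: Optional[str] = None) -> str:
        # Authorization is evaluated on every call, not cached from session start.
        if tool not in self.permissions.get(user_id, set()):
            decision = "denied"
        elif tool in SENSITIVE_TOOLS and approved_by is None:
            decision = "pending_approval"
        else:
            decision = "executed"
        # Log every attempt, including the ones that were blocked.
        self.audit_log.append(
            {"user": user_id, "tool": tool, "approved_by": approved_by, "decision": decision}
        )
        if decision == "denied":
            raise PermissionError(f"{user_id} may not call {tool}")
        if decision == "pending_approval":
            raise ApprovalRequired(f"{tool} needs human approval")
        return f"{tool} executed for {user_id}"


def outcome(gateway: ToolGateway, user_id: str, tool: str, approved_by: Optional[str] = None) -> str:
    """Helper for tests: return the decision instead of raising."""
    try:
        gateway.call(user_id, tool, approved_by)
    except (PermissionError, ApprovalRequired):
        pass
    return gateway.audit_log[-1]["decision"]
```

A pre-deployment test can then revoke a permission mid-session and confirm the very next call is denied, and confirm that delete_record stays pending until an approver is supplied.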

Dev tooling

Probe AI-generated routes/APIs/admin functions for missing authentication.

Test tenant isolation on the backend (not just hidden in the frontend) and verify DB queries are scoped to the current user/tenant.

Hunt for authorization logic that looks right but fails on edge cases, role changes, or direct API access.
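
A backend tenant-isolation test can be as small as the sketch below. It assumes a hypothetical invoices table and get_invoice function. The point is the shape of the test: request another tenant's record by its ID directly and expect nothing back.

```python
import sqlite3
from typing import Optional, Tuple


def get_invoice(conn: sqlite3.Connection, tenant_id: str, invoice_id: int) -> Optional[Tuple]:
    """Fetch an invoice only if it belongs to the caller's tenant.

    The tenant filter lives in the query itself, so it holds even when the
    endpoint is called directly instead of through the frontend.
    """
    return conn.execute(
        "SELECT id, tenant_id, amount FROM invoices WHERE id = ? AND tenant_id = ?",
        (invoice_id, tenant_id),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, tenant_id TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, "acme", 100), (2, "globex", 250)],
)

own = get_invoice(conn, "acme", 1)      # same tenant: returned
foreign = get_invoice(conn, "acme", 2)  # other tenant's ID: no row
```

AI-generated data access code frequently omits the tenant_id condition because the query still works in a single-tenant test environment, which is why this check needs at least two tenants' data in the fixture.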

Phase 4: Deployment

Validate that controls assumed in design are actually live in production config.

Authentication and authorization are enforced on all LLM-backed endpoints.

Guardrails, content filters, and output sanitizers are validated in the production environment (config drift between staging and prod is common).

Rate limits and token/cost caps are active on live endpoints.

Audit logging for tool execution and agent actions is on and capturing enough to reconstruct an incident.

Human-in-the-loop approval gates for sensitive actions are enabled in production config, not just present in code.
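
The deployment checks lend themselves to an automated gate that reads the effective production config and fails the release if a control is off. The setting names below are hypothetical placeholders for whatever your platform actually uses. The sketch only shows the comparison.

```python
from typing import List

# Hypothetical setting names; map these to your real configuration keys.
REQUIRED_FLAGS = (
    "auth_required",
    "output_sanitizer_enabled",
    "audit_logging_enabled",
    "human_approval_enabled",
)
REQUIRED_CAPS = ("rate_limit_per_minute", "max_tokens_per_request")


def control_gaps(config: dict) -> List[str]:
    """List controls that are missing, disabled, or have no positive limit."""
    gaps = [name for name in REQUIRED_FLAGS if config.get(name) is not True]
    for name in REQUIRED_CAPS:
        value = config.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            gaps.append(name)
    return gaps


staging = {
    "auth_required": True,
    "output_sanitizer_enabled": True,
    "audit_logging_enabled": True,
    "human_approval_enabled": True,
    "rate_limit_per_minute": 60,
    "max_tokens_per_request": 4000,
}
# Drifted production config: approval gate off, token cap never set.
production = {
    "auth_required": True,
    "output_sanitizer_enabled": True,
    "audit_logging_enabled": True,
    "human_approval_enabled": False,
    "rate_limit_per_minute": 60,
}

prod_gaps = control_gaps(production)
```

Staging passes cleanly while production reports the disabled approval gate and the missing token cap, which is exactly the staging-to-prod drift this phase is meant to catch.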

Phase 5: Runtime & Monitoring

After launch. AI features and their data sources change continuously.

Monitor for abuse patterns: injection attempts, token-inflation, anomalous tool-call volume.

Alert on cost spikes (denial-of-wallet signal).

Detect retrieval drift — newly added/changed source documents that introduce poisoned or over-permissioned content.

Maintain an incident-response path specific to AI: reconstruct what the agent read, decided, called, and why.

Re-test after material changes to prompts, tools, MCP servers, or agent workflows — these change the attack surface even when "the app" didn't.
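
For the cost-spike and token-inflation signals, one simple building block is a sliding-window token budget per user or tenant. The sketch below is an assumption about how such a budget could work, with hypothetical names and limits. The caller passes the current time explicitly so the behaviour is easy to test.

```python
from collections import deque


class TokenBudget:
    """Sliding-window budget: at most max_tokens within any window_seconds span."""

    def __init__(self, max_tokens: int, window_seconds: float) -> None:
        self.max_tokens = max_tokens
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, tokens) pairs, oldest first
        self.total = 0

    def allow(self, tokens: int, now: float) -> bool:
        """Record the request if it fits the budget; return False otherwise."""
        cutoff = now - self.window_seconds
        # Drop usage that has aged out of the window.
        while self.events and self.events[0][0] <= cutoff:
            _, expired = self.events.popleft()
            self.total -= expired
        if self.total + tokens > self.max_tokens:
            return False  # a good place to emit a denial-of-wallet alert
        self.events.append((now, tokens))
        self.total += tokens
        return True


# Hypothetical limit: 1,000 tokens per 60 seconds for one tenant.
budget = TokenBudget(max_tokens=1000, window_seconds=60)
decisions = [
    budget.allow(600, now=0),   # fits
    budget.allow(500, now=10),  # would exceed 1,000 inside the window
    budget.allow(400, now=20),  # fits exactly
    budget.allow(100, now=59),  # window is full
    budget.allow(600, now=61),  # the first 600 has expired
]
```

Refusals from a budget like this are useful twice: they cap spend in the moment, and the refusal rate per tenant is itself a monitoring signal worth alerting on.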

The Common Thread

If you take nothing else from this, test for these three things everywhere:

Did untrusted input reach a privileged context? A user prompt reaching the system layer, a planted document reaching the model context, a user-level request reaching admin-scoped tool credentials.
Was output trusted too early? Model output driving actions or rendering without validation; generated code executed without review; retrieved content used without authorization checks.
Was authorization checked in the wrong place? UI-layer controls that miss the retrieval layer; session-level checks that don't apply at tool invocation; user-level assumptions that ignore service-credential scope.

Underneath, these are the same vulnerability classes you already know: broken access control, unsafe output handling, parameter injection, over-trusted third-party components. Many of these issues can also be identified earlier through a secure code review process. The stack is more complex and more connected than what we're used to, but the risks aren't harder to understand. They're just harder to test and reproduce. That's the part I'd push you not to skip.

Test the full stack, not just the model.

About the author

Sherif Koussa

CEO

Sherif Koussa is a cybersecurity expert and entrepreneur with a rich background in building and breaking software. In 2006, he founded the OWASP Ottawa Chapter, contributed to WebGoat and OWASP Cheat Sheets, and helped launch SANS/GIAC exams. Today, as CEO of Software Secured, he helps hundreds of SaaS companies continuously ship secure code.
