Integration Guide5 min setup

Ship AutoGen agents to production safely

Microsoft AutoGen enables multi-agent conversations where agents collaborate, debate, and execute code autonomously. Cognisafe adds the security layer — monitoring every agent-to-agent message for prompt injection, detecting PII in conversation threads, and flagging jailbreaks before code gets executed.

Get your API key free →Jump to quickstart

Why this matters

Agents that execute code need security at the LLM layer

AutoGen agents don't just generate text — they write and run code, make decisions, and hand off work to other agents. A jailbroken assistant agent can instruct the code executor to run arbitrary commands. Monitoring the LLM calls is your first line of defence.

LLM01

Jailbreak leading to code execution

A malicious message jailbreaks the AssistantAgent, causing it to generate code that the UserProxyAgent executes — giving an attacker arbitrary code execution in your environment.

LLM01

Agent-to-agent injection

One agent receives injected instructions via an external tool or data source and passes them to other agents as trusted conversational context, propagating the attack silently.

LLM02

PII in conversation threads

Agents retrieve or generate content containing personal data. Because AutoGen conversations are multi-turn and logged, PII can persist across the entire session history.

LLM07

System message extraction

A crafted input tricks the AssistantAgent into revealing its system message — exposing your agent's role configuration, tool definitions, and operational boundaries.

Quickstart

Up and running in 5 minutes

Cognisafe wraps the OpenAI client that AutoGen uses under the hood — no changes to your agent definitions, conversation patterns, or execution config.

Install the SDK

pip install cognisafe

Add three lines before your agents start conversing

import cognisafe
import autogen

cognisafe.configure(
    api_key="csk_...",          # from cognisafe.uk/dashboard/settings
    project_id="my-autogen",
)
cognisafe.patch_openai()        # wraps AutoGen's OpenAI calls automatically

# Your agents are unchanged:
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"model": "gpt-4o"},
)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    code_execution_config={"work_dir": "coding"},
)
user_proxy.initiate_chat(assistant, message="Write a Python script to...")

Tag each agent for per-agent attribution

Use agent_name to track which agent in the conversation was the attack vector:

# Wrap each agent's LLM calls with its own tag:
def make_llm_config(agent_name: str) -> dict:
    cognisafe.configure(
        api_key="csk_...",
        project_id="my-autogen",
        agent_name=agent_name,
    )
    cognisafe.patch_openai()
    return {"model": "gpt-4o"}

assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config=make_llm_config("assistant"),
)
planner = autogen.AssistantAgent(
    name="planner",
    llm_config=make_llm_config("planner"),
)

What you get in the dashboard

Per-agent threat feed

Every flagged LLM call is attributed to the agent that made it — assistant, planner, critic — with the full message and scorer rationale.

Jailbreak before execution

Safety scorers run asynchronously on every message. Alerts fire before a malicious code suggestion reaches the executor.

Conversation-level PII tracking

Trace PII across the full multi-turn conversation, not just individual messages — critical for AutoGen's extended sessions.

OWASP coverage

Jailbreak (LLM01), PII (LLM02), content safety (LLM05), and system prompt extraction (LLM07) scored on every agent call.

Secure your AutoGen system today

Free tier included. No credit card. Start monitoring in 5 minutes.

Start for free →Try live demo