Microsoft AutoGen enables multi-agent conversations where agents collaborate, debate, and execute code autonomously. Cognisafe adds the security layer — monitoring every agent-to-agent message for prompt injection, detecting PII in conversation threads, and flagging jailbreaks before code gets executed.
Why this matters
AutoGen agents don't just generate text — they write and run code, make decisions, and hand off work to other agents. A jailbroken assistant agent can instruct the code executor to run arbitrary commands. Monitoring the LLM calls is your first line of defence.
Jailbreak leading to code execution
A malicious message jailbreaks the AssistantAgent, causing it to generate code that the UserProxyAgent executes — giving an attacker arbitrary code execution in your environment.
Agent-to-agent injection
One agent receives injected instructions via an external tool or data source and passes them to other agents as trusted conversational context, propagating the attack silently.
PII in conversation threads
Agents retrieve or generate content containing personal data. Because AutoGen conversations are multi-turn and logged, PII can persist across the entire session history.
System message extraction
A crafted input tricks the AssistantAgent into revealing its system message — exposing your agent's role configuration, tool definitions, and operational boundaries.
Quickstart
Cognisafe wraps the OpenAI client that AutoGen uses under the hood — no changes to your agent definitions, conversation patterns, or execution config.
pip install cognisafe
import cognisafe
import autogen
cognisafe.configure(
api_key="csk_...", # from cognisafe.uk/dashboard/settings
project_id="my-autogen",
)
cognisafe.patch_openai() # wraps AutoGen's OpenAI calls automatically
# Your agents are unchanged:
assistant = autogen.AssistantAgent(
name="assistant",
llm_config={"model": "gpt-4o"},
)
user_proxy = autogen.UserProxyAgent(
name="user_proxy",
code_execution_config={"work_dir": "coding"},
)
user_proxy.initiate_chat(assistant, message="Write a Python script to...")Use agent_name to track which agent in the conversation was the attack vector:
# Wrap each agent's LLM calls with its own tag:
def make_llm_config(agent_name: str) -> dict:
cognisafe.configure(
api_key="csk_...",
project_id="my-autogen",
agent_name=agent_name,
)
cognisafe.patch_openai()
return {"model": "gpt-4o"}
assistant = autogen.AssistantAgent(
name="assistant",
llm_config=make_llm_config("assistant"),
)
planner = autogen.AssistantAgent(
name="planner",
llm_config=make_llm_config("planner"),
)Every flagged LLM call is attributed to the agent that made it — assistant, planner, critic — with the full message and scorer rationale.
Safety scorers run asynchronously on every message. Alerts fire before a malicious code suggestion reaches the executor.
Trace PII across the full multi-turn conversation, not just individual messages — critical for AutoGen's extended sessions.
Jailbreak (LLM01), PII (LLM02), content safety (LLM05), and system prompt extraction (LLM07) scored on every agent call.
Free tier included. No credit card. Start monitoring in 5 minutes.