agent-pard0x/multi-agent-supervisor/docs/UNDERSTANDING_TRANSFERS.md
Gaetan Hurel d33cddef1e
wip
2025-06-26 18:02:43 +02:00

183 lines
7.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Understanding Multi-Agent Transfers
## What "Successfully transferred..." means
When you see messages like:
- `Successfully transferred to system_info_worker`
- `Successfully transferred back to supervisor`
These are **tool execution results** from the LangGraph supervisor pattern. Here's what's happening:
## 🔄 The Transfer Flow
1. **Supervisor receives user query**: "Nginx returns 502 Bad Gateway on my server. What can I do?"
2. **Supervisor analyzes and delegates**: Based on the `SUPERVISOR_PROMPT` in `config.py`, it decides to start with `system_info_worker`
3. **Transfer tool execution**: Supervisor calls `transfer_to_system_info_worker` tool
- **Result**: "Successfully transferred to system_info_worker"
- **Meaning**: Control is now handed to the system_info_worker agent
4. **Agent executes**: The `system_info_worker` gets:
- Full conversation context (including the original user query)
- Its own specialized prompt from `agents/system_agents.py`
- Access to its tools (shell commands for system info)
5. **Agent completes and returns**: Agent calls `transfer_back_to_supervisor`
- **Result**: "Successfully transferred back to supervisor"
- **Meaning**: Agent finished its task and returned control
- **Important**: Agent's results are now part of the conversation history
6. **Supervisor decides next step**: Based on **accumulated results**, supervisor either:
- Delegates to another agent (e.g., `service_inventory_worker`)
- Provides final response to user
- **Key**: Supervisor can see ALL previous agent results when making decisions
## 🧠 How Prompts Work
### Supervisor Prompt (config.py)
```python
SUPERVISOR_PROMPT = """
You are the supervisor of a team of specialised sysadmin agents.
Decide which agent to delegate to based on the user's query **or** on results already collected.
Available agents:
- system_info_worker: gather system metrics
- service_inventory_worker: list running services
- mariadb_analyzer: analyse MariaDB
...
Always start with `system_info_worker` and `service_inventory_worker` before drilling into a specific service.
"""
```
### Agent Prompts (agents/*.py)
Each agent has its own specialized prompt, for example:
```python
# system_info_worker prompt
"""
You are a Linux sysadmin. Use shell commands like `lscpu`, `free -h`, and `df -h` to gather CPU, RAM, and disk usage.
Return a concise plaintext summary. Only run safe, readonly commands.
"""
```
## 🎯 What Each Agent Receives
When an agent is activated via transfer:
- **Full conversation history**: All previous messages between user, supervisor, and other agents
- **Specialized prompt**: Guides how the agent should interpret and act on the conversation
- **Tools**: Shell access, specific analyzers, etc.
- **Context**: Results from previous agents in the conversation
## 🔄 How Agent Results Flow Back to Supervisor
**This is the key mechanism that makes the multi-agent system intelligent:**
1. **Agent produces results**: Each agent generates an `AIMessage` with its findings/analysis
2. **Results become part of conversation**: The agent's response is added to the shared message history
3. **Supervisor sees everything**: When control returns to supervisor, it has access to:
- Original user query
- All previous agent responses
- Tool execution results
- Complete conversation context
4. **Supervisor strategy updates**: Based on accumulated knowledge, supervisor can:
- Decide which agent to call next
- Skip unnecessary agents if enough info is gathered
- Synthesize results from multiple agents
- Provide final comprehensive response
### Example Flow:
```
User: "Nginx 502 error, help!"
├── Supervisor → system_info_worker
│ └── Returns: "502 usually means upstream server issues, check logs..."
├── Supervisor (now knows about upstream issues) → service_inventory_worker
│ └── Returns: "Check PHP-FPM status, verify upstream config..."
└── Supervisor (has both perspectives) → Final synthesis
└── "Based on system analysis and service inventory, here's comprehensive solution..."
```
## 📤 What Workers Pass Back to Supervisor
**Key Insight**: Workers don't explicitly "return" data. Instead, all their work becomes part of the shared conversation history that the supervisor can access.
### What Gets Added to the Message History
When a worker (like `network_diag`) executes:
1. **AIMessages** - Agent's reasoning and analysis
```
"I'll start by checking external connectivity..."
"DNS resolution appears to be working correctly..."
"Network Analysis Summary: All systems operational..."
```
2. **ToolMessages** - Raw command outputs
```
"PING 8.8.8.8 (8.8.8.8): 56 data bytes\n64 bytes from 8.8.8.8..."
"google.com. 300 IN A 142.250.80.46"
"tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN"
```
3. **Transfer Confirmation** - When worker completes
```
"Successfully transferred back to supervisor"
```
### Complete Message Flow Example
```python
# After network_diag completes, state["messages"] contains:
[
HumanMessage("My website is slow"), # Original query
AIMessage("I'll check network connectivity..."), # Supervisor decision
ToolMessage("Successfully transferred to network_diag"), # Transfer confirmation
AIMessage("Starting network diagnostics..."), # Worker starts
ToolMessage("PING 8.8.8.8: 64 bytes from 8.8.8.8..."), # Command result 1
AIMessage("External connectivity is good, checking DNS"), # Worker analysis
ToolMessage("google.com. 300 IN A 142.250.80.46"), # Command result 2
AIMessage("DNS working. Checking local services..."), # Worker continues
ToolMessage("tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN"), # Command result 3
AIMessage("Network Summary: All good, issue elsewhere"), # Worker's final analysis
ToolMessage("Successfully transferred back to supervisor") # Transfer back
]
```
### How Supervisor Uses This Information
The supervisor receives **ALL** these messages and can:
1. **Read command outputs** to understand technical details
2. **See agent reasoning** to understand what was checked
3. **Access final analysis** to make informed decisions
4. **Decide next steps** based on accumulated evidence
### Why This Design Works
- **Full Transparency**: Supervisor sees everything the worker did
- **Rich Context**: Both raw data and interpreted analysis available
- **Cumulative Knowledge**: Each agent builds on previous work
- **Intelligent Routing**: Supervisor can adapt strategy based on findings
### Example: Multi-Agent Collaboration
```
User: "Website is slow"
├── network_diag finds: "Network is fine"
├── cert_checker finds: "Certificate expires tomorrow!"
└── Supervisor synthesis: "Issue is expiring certificate, not network"
```
The supervisor can correlate findings across multiple workers because it sees all their work in the message history.
## 📋 Key Takeaways
- **"Successfully transferred"** = Control handoff confirmation, not data transfer
- **Each agent** gets the full conversation context INCLUDING previous agent results
- **Agent prompts** determine how they process that context
- **Supervisor** orchestrates the workflow based on its prompt strategy
- **The conversation** builds up context as each agent contributes their expertise
- **Results accumulate**: Each agent can see and build upon previous agents' work
- **Supervisor learns**: Strategy updates based on what agents discover
- **Dynamic workflow**: Supervisor can skip agents or change direction based on results