wip
93
multi-agent-supervisor/docs/AGENT_ENHANCEMENT_SUMMARY.md
Normal file
@@ -0,0 +1,93 @@
# Enhanced Agent Results Communication

## Problem Identified
The agents were only sending "Successfully transferred control back to supervisor" messages, without reporting any meaningful analysis results from their work.

## Root Cause
The agent prompts were too brief and didn't explicitly instruct agents to:
1. Summarize their findings after executing commands
2. Provide structured analysis before transferring back to the supervisor
3. Include specific recommendations and insights

## Solution Implemented

### 1. Enhanced Agent Prompts
Updated all agent prompts to include:

- **Explicit task definitions** with required commands
- **Structured analysis requirements** with specific sections
- **Clear instructions** to provide comprehensive summaries
- **A standing rule** to always provide an analysis summary before completing the task
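
The four prompt ingredients listed above could be assembled along these lines. This is only a minimal sketch: `build_agent_prompt` and its parameters are hypothetical names, not code from `agents/*.py`, and the wording of the generated prompt is an assumption.

```python
def build_agent_prompt(role: str, commands: list[str], sections: list[str]) -> str:
    """Assemble a worker prompt with a task, required commands, and report sections.

    Hypothetical helper for illustration; the real prompts may be written by hand.
    """
    command_lines = "\n".join(f"- `{cmd}`" for cmd in commands)
    section_lines = "\n".join(f"{i}. {name}" for i, name in enumerate(sections, 1))
    return (
        f"You are {role}.\n\n"
        "TASK: run the following read-only commands:\n"
        f"{command_lines}\n\n"
        "ANALYSIS: after running them, report these sections:\n"
        f"{section_lines}\n\n"
        "IMPORTANT: always provide the analysis summary before completing the task."
    )


prompt = build_agent_prompt(
    role="a Linux system information specialist",
    commands=["lscpu", "free -h", "df -h"],
    sections=["CPU", "Memory", "Disk", "Recommendations"],
)
print(prompt)
```

Keeping the closing rule in one shared builder means every agent gets the same "summarize before you finish" instruction, which is the behavior the original prompts were missing.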

### 2. Specific Improvements by Agent

#### System Agents
- **system_info_worker**: Now analyzes CPU, memory, disk, load, and top processes with structured summary
- **service_inventory_worker**: Provides service categorization, failed services analysis, security-relevant services

#### Service Agents
- **nginx_analyzer**: Comprehensive config validation, log analysis, specific 502/503/504 error troubleshooting
- **mariadb_analyzer**: Database status, configuration assessment, log analysis, performance indicators
- **phpfpm_analyzer**: Process analysis, memory limits, timeout configuration, socket connectivity

#### Network Agents
- **network_diag**: Connectivity testing, DNS analysis, port scanning with adaptive commands
- **cert_checker**: Certificate discovery, expiration monitoring, validation with 30-day alerts

#### Analysis Agents
- **risk_scorer**: Structured risk assessment with CRITICAL/HIGH/MEDIUM/LOW levels and prioritized recommendations
- **remediation_worker**: Safe remediation plans with dry-run commands and confirmation protocols
- **harmonizer_worker**: Security hardening assessment with best-practice implementation

### 3. Enhanced Supervisor Coordination
Updated supervisor prompt to:
- Better utilize agent expertise
- Synthesize findings from multiple agents
- Provide strategic insights beyond delegation

## Expected Behavior Change

### Before:
```
STEP 2: SYSTEM_INFO_WORKER
Successfully transferred control back to supervisor

STEP 3: SERVICE_INVENTORY_WORKER
Successfully transferred control back to supervisor

STEP 4: SUPERVISOR
Generic response not based on actual findings
```

### After:
```
STEP 2: SYSTEM_INFO_WORKER
📊 ANALYSIS SUMMARY:
- CPU: Intel i7, load average 0.8, 1.2, 1.5
- Memory: 8GB total, 6.2GB used (78% utilization)
- Disk: Root partition 85% full - WARNING LEVEL
- Top processes: nginx (2.1GB), mysql (1.8GB)
- Recommendation: Monitor disk space, investigate memory usage

STEP 3: SERVICE_INVENTORY_WORKER
📋 SERVICE ANALYSIS:
- 47 services running (normal range)
- Critical services: ✅ nginx, ✅ mysql, ❌ php-fpm (failed)
- Failed services: php8.1-fpm.service
- Security services: ✅ ssh, ✅ ufw
- Recommendation: Investigate php-fpm failure for potential 502 errors

STEP 4: SUPERVISOR
Based on system analysis showing high memory usage and service inventory
revealing php-fpm failure, this explains your 502 errors...
```

## Files Modified
- `agents/system_agents.py` - Enhanced system monitoring agents
- `agents/service_agents.py` - Enhanced service-specific agents
- `agents/network_agents.py` - Enhanced network and security agents
- `agents/analysis_agents.py` - Enhanced analysis and remediation agents
- `config.py` - Enhanced supervisor prompt and coordination strategy

## Result
Agents now provide meaningful, structured analysis that the supervisor can synthesize into comprehensive, actionable responses instead of generic outputs.
129
multi-agent-supervisor/docs/DYNAMIC_INSTRUCTIONS.md
Normal file
@@ -0,0 +1,129 @@
# Dynamic Instructions for Agent Transfers - TODO

## Current Behavior
Currently, when the supervisor transfers control to an agent:
- ❌ No specific instructions are passed
- ❌ Agent only sees the original user query
- ❌ Agent uses its static, pre-defined prompt

## Proposed Enhancement: Dynamic Instructions

### Why It Matters
The supervisor often has context about WHY it's transferring to a specific agent. For example:
- "Transfer to network_diag because user mentioned DNS issues - focus on DNS diagnostics"
- "Transfer to cert_checker because certificates might be expiring - check all certs urgently"

### Implementation Approach

#### 1. Modify Transfer Tools
```python
def transfer_to_network_diag(instructions: str = "") -> str:
    """Transfer control to network diagnostics agent.

    Args:
        instructions: Specific guidance for the agent
    """
    return f"Successfully transferred to network_diag. Instructions: {instructions}"
```

#### 2. Update State to Include Instructions
```python
from typing import List, Optional

from langchain_core.messages import AnyMessage
from pydantic import BaseModel


class State(BaseModel):
    messages: List[AnyMessage]
    next_agent: str = "supervisor"
    supervisor_instructions: Optional[str] = None  # NEW FIELD
```
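
#### 3. Modify Agent Creation to Check for Instructions
```python
from langchain_core.messages import SystemMessage
from langgraph.prebuilt import create_react_agent


def create_network_worker(base_prompt: str):
    def build_prompt(state):
        # A callable prompt is evaluated on every model call, so it can read the
        # instructions from state. Assumes a dict-style state whose schema
        # includes supervisor_instructions (step 2).
        instructions = state.get("supervisor_instructions") or "none"
        system = (
            f"{base_prompt}\n\n"
            f"SUPERVISOR INSTRUCTIONS (if any): {instructions}\n\n"
            "Always prioritize supervisor instructions when provided."
        )
        return [SystemMessage(content=system)] + list(state["messages"])

    return create_react_agent(
        model="openai:gpt-4o-mini",
        tools=[get_shell_tool()],  # project helper, not defined in this document
        prompt=build_prompt,
        name="network_diag",
    )
```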

#### 4. Update Router Logic
```python
from langchain_core.messages import ToolMessage


def route_agent(state):
    # Extract supervisor instructions from last ToolMessage
    last_message = state["messages"][-1]
    if isinstance(last_message, ToolMessage) and "Instructions:" in last_message.content:
        # Parse and store instructions.
        # NOTE: LangGraph applies state updates from node return values, so a
        # write made inside a routing function is not persisted; this
        # assignment likely belongs in a node that returns the update.
        instructions = extract_instructions(last_message.content)
        state["supervisor_instructions"] = instructions

    return state["next_agent"]
```
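
The router above calls `extract_instructions`, which this document does not define. A minimal sketch follows, assuming the tool result keeps the `"... Instructions: <text>"` format returned by `transfer_to_network_diag` in step 1; the `MARKER` constant is a name introduced here for illustration.

```python
MARKER = "Instructions:"


def extract_instructions(content: str) -> str:
    """Return the text after the first "Instructions:" marker, or "" if absent."""
    _, found, tail = content.partition(MARKER)
    return tail.strip() if found else ""


message = "Successfully transferred to network_diag. Instructions: Focus on DNS diagnostics"
print(extract_instructions(message))
```

Parsing free text out of a confirmation string is fragile (instructions that themselves contain the marker are cut at the first occurrence only), which is one argument for carrying the instructions in a dedicated state field instead.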

### Example Flow

1. **User Query**: "My website is slow"

2. **Supervisor Analysis**:
   ```
   "Website slowness could be DNS or certificate related.
   Let me transfer to network_diag with specific focus."
   ```

3. **Supervisor Transfer**:
   ```python
   transfer_to_network_diag(
       instructions=(
           "Focus on DNS resolution times and latency to common websites. "
           "Check if DNS servers are responding slowly."
       )
   )
   ```

4. **Network Agent Receives**:
   - Original query: "My website is slow"
   - Supervisor instructions: "Focus on DNS resolution times..."
   - Can now prioritize DNS diagnostics over general network checks

### Benefits

1. **More Targeted Diagnostics**: Agents focus on what matters
2. **Better Context Sharing**: Supervisor's analysis isn't lost
3. **Efficient Execution**: Avoid running unnecessary commands
4. **Improved Results**: More relevant output for user's specific issue

### Alternative: Context in Messages

Instead of modifying tools, append supervisor analysis to the message history:

```python
from langchain_core.messages import SystemMessage

# Before transfer, supervisor adds a system message
state["messages"].append(
    SystemMessage(content=f"[SUPERVISOR GUIDANCE]: Focus on {specific_issue}")
)
```

### Decision Points

1. **Tool Parameters vs State**: Where to store instructions?
2. **Prompt Injection vs Message History**: How to pass instructions?
3. **Optional vs Required**: Should all transfers include instructions?
4. **Persistence**: Should instructions carry through multiple agent hops?
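
To make the persistence question concrete, here is one possible answer sketched in plain Python: instructions apply to a single hop and are cleared afterwards unless the caller opts in to keeping them. `TransferState` and `consume_instructions` are hypothetical names, and the returned dict only mimics the "node returns a state update" convention; nothing here is taken from the project's code.

```python
from typing import Optional, TypedDict


class TransferState(TypedDict):
    next_agent: str
    supervisor_instructions: Optional[str]


def consume_instructions(state: TransferState, persist: bool = False) -> tuple[str, dict]:
    """Return the instructions for this hop and the state update to apply after it."""
    instructions = state.get("supervisor_instructions") or ""
    # With persist=False the update clears the field so the next agent starts clean.
    update = {} if persist else {"supervisor_instructions": None}
    return instructions, update


state: TransferState = {
    "next_agent": "network_diag",
    "supervisor_instructions": "Focus on DNS and latency",
}
instructions, update = consume_instructions(state)
state_after = {**state, **update}
print(instructions, state_after["supervisor_instructions"])
```

Clearing by default avoids a later agent (for example `cert_checker`) acting on guidance that was written for `network_diag`.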

### Next Steps

1. [ ] Decide on implementation approach
2. [ ] Modify transfer tool signatures
3. [ ] Update state model
4. [ ] Enhance agent prompts to use instructions
5. [ ] Test with various scenarios
6. [ ] Document the new pattern

### Example Test Cases

- "Check network" → No specific instructions needed
- "Website is slow" → "Focus on DNS and latency"
- "Certificate expiring?" → "Check all certs, prioritize those expiring soon"
- "Port 443 issues" → "Focus on HTTPS connectivity and certificate validation"
90
multi-agent-supervisor/docs/README-modular.md
Normal file
@@ -0,0 +1,90 @@
# Multi-Agent Sysadmin Assistant

A modular multi-agent system for system administration tasks using LangChain and LangGraph.

## Architecture

The system is organized into several modules for better maintainability:

### 📁 Project Structure

```
multi-agent-supervisor/
├── main-multi-agent.py          # Main entry point
├── config.py                    # Configuration and settings
├── supervisor.py                # Supervisor orchestration
├── utils.py                     # Utility functions
├── requirements.txt             # Dependencies
├── custom_tools/                # Custom tool implementations
│   ├── __init__.py
│   ├── log_tail_tool.py         # Log reading tool
│   └── shell_tool_wrapper.py    # Shell tool wrapper
└── agents/                      # Agent definitions
    ├── __init__.py
    ├── system_agents.py         # System monitoring agents
    ├── service_agents.py        # Service-specific agents
    ├── network_agents.py        # Network and security agents
    └── analysis_agents.py       # Analysis and remediation agents
```

## Agents

### System Agents
- **System Info Worker**: Gathers CPU, RAM, and disk usage
- **Service Inventory Worker**: Lists running services

### Service Agents
- **MariaDB Analyzer**: Checks MariaDB configuration and logs
- **Nginx Analyzer**: Validates Nginx configuration and logs
- **PHP-FPM Analyzer**: Monitors PHP-FPM status and performance

### Network Agents
- **Network Diagnostics**: Uses ping, traceroute, and dig
- **Certificate Checker**: Monitors TLS certificate expiration

### Analysis Agents
- **Risk Scorer**: Aggregates findings and assigns severity levels
- **Remediation Worker**: Proposes safe fixes for issues
- **Harmonizer Worker**: Applies system hardening best practices

## Benefits of Modular Architecture

1. **Separation of Concerns**: Each module has a single responsibility
2. **Reusability**: Tools and agents can be easily reused across projects
3. **Maintainability**: Easy to update individual components
4. **Testability**: Each module can be tested independently
5. **Scalability**: Easy to add new agents or tools
6. **Code Organization**: Clear structure makes navigation easier

## Usage

```python
from supervisor import create_sysadmin_supervisor

# Create supervisor with all agents
supervisor = create_sysadmin_supervisor()

# Run analysis
query = {
    "messages": [
        {
            "role": "user",
            "content": "Check if my web server is running properly"
        }
    ]
}

result = supervisor.invoke(query)
```
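
The usage example stops at `result`. The sketch below shows one way to pull the final answer out of it, assuming `result` has the same `{"messages": [...]}` shape as the query; that is common for LangGraph graphs but is not confirmed by this README. `final_answer` is a hypothetical helper, and the sample reply text is made up for the demonstration.

```python
from typing import Any, Mapping


def final_answer(result: Mapping[str, Any]) -> str:
    """Return the text of the last message in a {"messages": [...]} result."""
    messages = result.get("messages", [])
    if not messages:
        return ""
    last = messages[-1]
    # Messages may be plain dicts (as in the query above) or message objects
    # that expose a .content attribute.
    if isinstance(last, Mapping):
        return str(last.get("content", ""))
    return str(getattr(last, "content", ""))


example_result = {
    "messages": [
        {"role": "user", "content": "Check if my web server is running properly"},
        {"role": "assistant", "content": "nginx is active and serving requests."},
    ]
}
print(final_answer(example_result))
```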

## Adding New Agents

1. Create agent function in appropriate module under `agents/`
2. Import and add to supervisor in `supervisor.py`
3. Update supervisor prompt in `config.py`

## Adding New Tools

1. Create tool class in `custom_tools/`
2. Export from `custom_tools/__init__.py`
3. Import and use in agent definitions
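
As an illustration of step 1, the core of a read-only tool can be a small, independently testable function that the tool class then wraps. The sketch below is a hypothetical example of such a function; `tail_lines` is an assumed name and this is not the contents of `log_tail_tool.py`.

```python
from collections import deque
from pathlib import Path


def tail_lines(path: str, lines: int = 50) -> str:
    """Return the last `lines` lines of a text file as a single string."""
    with Path(path).open("r", encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the newest lines while streaming the file,
        # so large logs are never held in memory in full.
        return "".join(deque(handle, maxlen=lines))
```

Keeping the logic free of framework imports means it can be unit-tested without building an agent, which fits the testability goal listed under the modular architecture benefits.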
182
multi-agent-supervisor/docs/UNDERSTANDING_TRANSFERS.md
Normal file
@@ -0,0 +1,182 @@
# Understanding Multi-Agent Transfers

## What "Successfully transferred..." means

You will see messages like:
- `Successfully transferred to system_info_worker`
- `Successfully transferred back to supervisor`

These are **tool execution results** from the LangGraph supervisor pattern. Here's what's happening:

## 🔄 The Transfer Flow

1. **Supervisor receives user query**: "Nginx returns 502 Bad Gateway on my server. What can I do?"

2. **Supervisor analyzes and delegates**: Based on the `SUPERVISOR_PROMPT` in `config.py`, it decides to start with `system_info_worker`

3. **Transfer tool execution**: Supervisor calls `transfer_to_system_info_worker` tool
   - **Result**: "Successfully transferred to system_info_worker"
   - **Meaning**: Control is now handed to the system_info_worker agent

4. **Agent executes**: The `system_info_worker` gets:
   - Full conversation context (including the original user query)
   - Its own specialized prompt from `agents/system_agents.py`
   - Access to its tools (shell commands for system info)

5. **Agent completes and returns**: Agent calls `transfer_back_to_supervisor`
   - **Result**: "Successfully transferred back to supervisor"
   - **Meaning**: Agent finished its task and returned control
   - **Important**: Agent's results are now part of the conversation history

6. **Supervisor decides next step**: Based on **accumulated results**, supervisor either:
   - Delegates to another agent (e.g., `service_inventory_worker`)
   - Provides final response to user
   - **Key**: Supervisor can see ALL previous agent results when making decisions

## 🧠 How Prompts Work

### Supervisor Prompt (config.py)
```python
SUPERVISOR_PROMPT = """
You are the supervisor of a team of specialised sysadmin agents.
Decide which agent to delegate to based on the user's query **or** on results already collected.
Available agents:
- system_info_worker: gather system metrics
- service_inventory_worker: list running services
- mariadb_analyzer: analyse MariaDB
...
Always start with `system_info_worker` and `service_inventory_worker` before drilling into a specific service.
"""
```

### Agent Prompts (agents/*.py)
Each agent has its own specialized prompt, for example:

```python
# system_info_worker prompt
"""
You are a Linux sysadmin. Use shell commands like `lscpu`, `free -h`, and `df -h` to gather CPU, RAM, and disk usage.
Return a concise plain‑text summary. Only run safe, read‑only commands.
"""
```

## 🎯 What Each Agent Receives

When an agent is activated via transfer:
- **Full conversation history**: All previous messages between user, supervisor, and other agents
- **Specialized prompt**: Guides how the agent should interpret and act on the conversation
- **Tools**: Shell access, specific analyzers, etc.
- **Context**: Results from previous agents in the conversation

## 🔄 How Agent Results Flow Back to Supervisor

**This is the key mechanism that makes the multi-agent system intelligent:**

1. **Agent produces results**: Each agent generates an `AIMessage` with its findings/analysis
2. **Results become part of conversation**: The agent's response is added to the shared message history
3. **Supervisor sees everything**: When control returns to supervisor, it has access to:
   - Original user query
   - All previous agent responses
   - Tool execution results
   - Complete conversation context

4. **Supervisor strategy updates**: Based on accumulated knowledge, supervisor can:
   - Decide which agent to call next
   - Skip unnecessary agents if enough info is gathered
   - Synthesize results from multiple agents
   - Provide final comprehensive response

### Example Flow:
```
User: "Nginx 502 error, help!"
├── Supervisor → system_info_worker
│   └── Returns: "502 usually means upstream server issues, check logs..."
├── Supervisor (now knows about upstream issues) → service_inventory_worker
│   └── Returns: "Check PHP-FPM status, verify upstream config..."
└── Supervisor (has both perspectives) → Final synthesis
    └── "Based on system analysis and service inventory, here's comprehensive solution..."
```

## 📤 What Workers Pass Back to Supervisor

**Key Insight**: Workers don't explicitly "return" data. Instead, all their work becomes part of the shared conversation history that the supervisor can access.

### What Gets Added to the Message History

When a worker (like `network_diag`) executes:

1. **AIMessages** - Agent's reasoning and analysis
   ```
   "I'll start by checking external connectivity..."
   "DNS resolution appears to be working correctly..."
   "Network Analysis Summary: All systems operational..."
   ```

2. **ToolMessages** - Raw command outputs
   ```
   "PING 8.8.8.8 (8.8.8.8): 56 data bytes\n64 bytes from 8.8.8.8..."
   "google.com. 300 IN A 142.250.80.46"
   "tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN"
   ```

3. **Transfer Confirmation** - When worker completes
   ```
   "Successfully transferred back to supervisor"
   ```

### Complete Message Flow Example

```python
# After network_diag completes, state["messages"] contains the following
# (illustrative; real ToolMessages also carry a tool_call_id):
[
    HumanMessage("My website is slow"),                           # Original query
    AIMessage("I'll check network connectivity..."),              # Supervisor decision
    ToolMessage("Successfully transferred to network_diag"),      # Transfer confirmation
    AIMessage("Starting network diagnostics..."),                 # Worker starts
    ToolMessage("PING 8.8.8.8: 64 bytes from 8.8.8.8..."),        # Command result 1
    AIMessage("External connectivity is good, checking DNS"),     # Worker analysis
    ToolMessage("google.com. 300 IN A 142.250.80.46"),            # Command result 2
    AIMessage("DNS working. Checking local services..."),         # Worker continues
    ToolMessage("tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN"),           # Command result 3
    AIMessage("Network Summary: All good, issue elsewhere"),      # Worker's final analysis
    ToolMessage("Successfully transferred back to supervisor"),   # Transfer back
]
```
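
To show how the worker's conclusion can be located in such a history, here is a small stand-alone sketch. It uses a simplified `Msg` dataclass as a stand-in for the LangChain message classes so it runs without any dependencies; `Msg` and `last_worker_analysis` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    kind: str      # "human", "ai", or "tool" (stand-in for the message classes)
    content: str


def last_worker_analysis(messages: list[Msg]) -> Optional[str]:
    """Return the most recent non-empty AI message, i.e. the worker's final analysis."""
    for msg in reversed(messages):
        if msg.kind == "ai" and msg.content.strip():
            return msg.content
    return None


history = [
    Msg("human", "My website is slow"),
    Msg("ai", "I'll check network connectivity..."),
    Msg("tool", "Successfully transferred to network_diag"),
    Msg("ai", "Network Summary: All good, issue elsewhere"),
    Msg("tool", "Successfully transferred back to supervisor"),
]
print(last_worker_analysis(history))
```

If a worker transfers back without writing any analysis, the newest AI message in the history is not a worker summary, so the supervisor has nothing from that worker to synthesize.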

### How Supervisor Uses This Information

The supervisor receives **ALL** these messages and can:

1. **Read command outputs** to understand technical details
2. **See agent reasoning** to understand what was checked
3. **Access final analysis** to make informed decisions
4. **Decide next steps** based on accumulated evidence

### Why This Design Works

- **Full Transparency**: Supervisor sees everything the worker did
- **Rich Context**: Both raw data and interpreted analysis available
- **Cumulative Knowledge**: Each agent builds on previous work
- **Intelligent Routing**: Supervisor can adapt strategy based on findings

### Example: Multi-Agent Collaboration

```
User: "Website is slow"
├── network_diag finds: "Network is fine"
├── cert_checker finds: "Certificate expires tomorrow!"
└── Supervisor synthesis: "Issue is expiring certificate, not network"
```

The supervisor can correlate findings across multiple workers because it sees all their work in the message history.

## 📋 Key Takeaways

- **"Successfully transferred"** = Control handoff confirmation, not data transfer
- **Each agent** gets the full conversation context INCLUDING previous agent results
- **Agent prompts** determine how they process that context
- **Supervisor** orchestrates the workflow based on its prompt strategy
- **The conversation** builds up context as each agent contributes their expertise
- **Results accumulate**: Each agent can see and build upon previous agents' work
- **Supervisor learns**: Strategy updates based on what agents discover
- **Dynamic workflow**: Supervisor can skip agents or change direction based on results