226 lines
7.2 KiB
Markdown
226 lines
7.2 KiB
Markdown
# Multi-Agent Supervisor System for Sysadmin Tasks
|
|
|
|
This directory contains a sophisticated multi-agent system with a supervisor pattern for comprehensive system administration and troubleshooting.
|
|
|
|
## Sources
|
|
|
|
https://github.com/langchain-ai/langgraph-supervisor-py
|
|
https://langchain-ai.github.io/langgraph/concepts/multi_agent/#supervisor
|
|
|
|
## Overview
|
|
|
|
The multi-agent supervisor system uses multiple specialized agents coordinated by a supervisor to handle complex sysadmin tasks:
|
|
|
|
1. **Supervisor Agent**: Orchestrates and delegates tasks to specialized workers
|
|
2. **Specialized Workers**: Each agent is an expert in a specific domain
|
|
3. **Parallel Processing**: Multiple agents can work simultaneously
|
|
4. **Intelligent Routing**: Tasks are routed to the most appropriate specialist
|
|
|
|
## Architecture
|
|
|
|
```
|
|
User Input → Supervisor → Specialized Agents → Aggregated Response
|
|
↓
|
|
┌─────────────────────────────────────────────────┐
|
|
│ system_info │ nginx │ mariadb │ network │ ... │
|
|
└─────────────────────────────────────────────────┘
|
|
```
|
|
|
|
## Specialized Agents
|
|
|
|
### Core System Agents
|
|
- **`system_info_worker`**: CPU, RAM, disk usage monitoring
|
|
- **`service_inventory_worker`**: Lists running services
|
|
|
|
### Service-Specific Agents
|
|
- **`mariadb_analyzer`**: MariaDB configuration and log analysis
|
|
- **`nginx_analyzer`**: Nginx configuration validation and log analysis
|
|
- **`phpfpm_analyzer`**: PHP-FPM performance and error analysis
|
|
|
|
### Network & Security Agents
|
|
- **`network_diag`**: Network connectivity and DNS diagnostics
|
|
- **`cert_checker`**: TLS certificate validation and expiry alerts
|
|
|
|
### Analysis & Action Agents
|
|
- **`risk_scorer`**: Aggregates findings and assigns severity levels
|
|
- **`remediation_worker`**: Proposes safe fixes for detected issues
|
|
- **`harmonizer_worker`**: Applies security hardening best practices
|
|
|
|
## Features
|
|
|
|
### Core Capabilities
|
|
- **Local System Access**: Execute shell commands on the local machine
|
|
- **Remote Server Access**: Execute commands on remote servers via SSH
|
|
- **Persistent SSH Connections**: Efficient remote operations with connection reuse
|
|
- **Cross-Platform Support**: Works on Linux, macOS, BSD, and Windows systems
|
|
|
|
### Advanced Capabilities
|
|
- **Intelligent Delegation**: Supervisor routes tasks to appropriate specialists
|
|
- **Parallel Execution**: Multiple agents can work simultaneously
|
|
- **Severity Assessment**: Risk scoring with Critical/High/Medium/Low levels
|
|
- **Safe Remediation**: Proposes fixes with confirmation requests
|
|
- **Security Hardening**: Automated best-practice application
|
|
|
|
### Execution Modes
|
|
- **Invoke Mode**: Complete analysis with final result
|
|
- **Stream Mode**: Real-time step-by-step execution visibility
|
|
|
|
## Files
|
|
|
|
- `main-multi-agent.py`: Complete multi-agent supervisor implementation
|
|
- `agents/`: Directory containing specialized agent implementations
|
|
- `custom_tools/`: Custom tools used by the agents
|
|
- `supervisor.py`: Supervisor agent coordination logic
|
|
- `utils.py`: Utility functions and configurations
|
|
|
|
## Usage
|
|
|
|
```bash
|
|
cd multi-agent-supervisor
|
|
python main-multi-agent.py
|
|
```
|
|
|
|
The script includes both execution modes:
|
|
|
|
### 1. Invoke Mode (Complete Analysis)
|
|
```python
|
|
result = supervisor.invoke(query)
|
|
print(result["messages"][-1]["content"])
|
|
```
|
|
|
|
### 2. Stream Mode (Step-by-Step)
|
|
```python
|
|
for chunk in supervisor.stream(query):
|
|
# Real-time agent execution monitoring
|
|
print(f"🤖 ACTIVE AGENT: {current_agent}")
|
|
print(f"🔧 TOOL CALLS: {len(tool_calls)} tool(s)")
|
|
```
|
|
|
|
## Example Workflow
|
|
|
|
For the query: *"Nginx returns 502 Bad Gateway on my server. What can I do?"*
|
|
|
|
1. **Supervisor** analyzes the request
|
|
2. **system_info_worker** checks system resources (local or remote)
|
|
3. **service_inventory_worker** lists running services
|
|
4. **nginx_analyzer** validates Nginx configuration and checks logs
|
|
5. **phpfpm_analyzer** checks PHP-FPM status (common 502 cause)
|
|
6. **risk_scorer** assesses the severity
|
|
7. **remediation_worker** proposes specific fixes
|
|
|
|
## Example Queries
|
|
|
|
The multi-agent system can handle both local and remote system administration:
|
|
|
|
### Local System Administration
|
|
```
|
|
"Check local system performance and identify bottlenecks"
|
|
"Analyze recent system errors in local logs"
|
|
"What services are running on this machine?"
|
|
```
|
|
|
|
### Remote Server Management
|
|
```
|
|
"Connect to my remote server and check disk usage"
|
|
"Compare performance between local and remote systems"
|
|
"Check if nginx is running on the remote server"
|
|
"Analyze logs on my remote server for error patterns"
|
|
```
|
|
|
|
### Multi-System Analysis
|
|
```
|
|
"Perform comprehensive health check across all systems"
|
|
"Compare configurations between local and remote servers"
|
|
"Identify performance differences between environments"
|
|
```
|
|
|
|
## Pros and Cons
|
|
|
|
### ✅ Pros
|
|
- **Domain Expertise**: Each agent specializes in specific areas
|
|
- **Parallel Processing**: Multiple agents work simultaneously
|
|
- **Comprehensive Analysis**: Systematic approach to complex problems
|
|
- **Risk Assessment**: Built-in severity scoring
|
|
- **Intelligent Routing**: Tasks go to the right specialist
|
|
- **Scalable**: Easy to add new specialized agents
|
|
|
|
### ❌ Cons
|
|
- **Complexity**: More sophisticated setup and debugging
|
|
- **Resource Intensive**: Higher computational overhead
|
|
- **Coordination Overhead**: Supervisor management complexity
|
|
- **Potential Over-engineering**: May be overkill for simple tasks
|
|
|
|
## When to Use
|
|
|
|
Choose the multi-agent supervisor when:
|
|
- You need comprehensive system analysis
|
|
- Multiple services/components are involved
|
|
- You want parallel processing capabilities
|
|
- Risk assessment and severity scoring are important
|
|
- You're dealing with complex, multi-faceted problems
|
|
- You need specialized domain expertise
|
|
|
|
## Agent Interaction Flow
|
|
|
|
```mermaid
|
|
graph TD
|
|
A[User Query] --> B[Supervisor]
|
|
B --> C[system_info_worker]
|
|
B --> D[service_inventory_worker]
|
|
B --> E[Service Specialists]
|
|
E --> F[nginx_analyzer]
|
|
E --> G[mariadb_analyzer]
|
|
E --> H[phpfpm_analyzer]
|
|
C --> I[risk_scorer]
|
|
D --> I
|
|
F --> I
|
|
G --> I
|
|
H --> I
|
|
I --> J[remediation_worker]
|
|
J --> K[Final Response]
|
|
```
|
|
|
|
## Customization
|
|
|
|
### Adding New Agents
|
|
```python
|
|
new_agent = create_react_agent(
|
|
model="openai:gpt-4o-mini",
|
|
tools=[shell_tool, custom_tools],
|
|
prompt="Your specialized agent prompt...",
|
|
name="new_specialist"
|
|
)
|
|
|
|
# Add to supervisor
|
|
supervisor = create_supervisor(
|
|
agents=[...existing_agents, new_agent],
|
|
model=model,
|
|
prompt=updated_supervisor_prompt
|
|
)
|
|
```
|
|
|
|
### Custom Tools
|
|
```python
|
|
class CustomTool(BaseTool):
|
|
name = "custom_tool"
|
|
description = "Tool description"
|
|
|
|
def _run(self, **kwargs):
|
|
# Tool implementation
|
|
return result
|
|
```
|
|
|
|
## Requirements
|
|
|
|
```bash
|
|
pip install langchain-openai langgraph langgraph-supervisor langchain-community
|
|
export OPENAI_API_KEY="your-api-key"
|
|
```
|
|
|
|
## Performance Considerations
|
|
|
|
- **Token Usage**: Higher due to multiple agent interactions
|
|
- **Execution Time**: May be longer due to coordination overhead
|
|
- **Memory**: Higher memory usage with multiple concurrent agents
|
|
- **Rate Limits**: Monitor API rate limits with parallel requests
|