agent-pard0x/multi-agent-supervisor/README.md

# Multi-Agent Supervisor System for Sysadmin Tasks

This directory contains a sophisticated multi-agent system with a supervisor pattern for comprehensive system administration and troubleshooting.

## Sources

https://github.com/langchain-ai/langgraph-supervisor-py
https://langchain-ai.github.io/langgraph/concepts/multi_agent/#supervisor

## Overview

The multi-agent supervisor system uses multiple specialized agents coordinated by a supervisor to handle complex sysadmin tasks:

1. **Supervisor Agent**: Orchestrates and delegates tasks to specialized workers
2. **Specialized Workers**: Each agent is an expert in a specific domain
3. **Parallel Processing**: Multiple agents can work simultaneously
4. **Intelligent Routing**: Tasks are routed to the most appropriate specialist

## Architecture

```
User Input → Supervisor → Specialized Agents → Aggregated Response
                ↓
    ┌─────────────────────────────────────────────────┐
    │  system_info │ nginx │ mariadb │ network │ ...  │
    └─────────────────────────────────────────────────┘
```

## Specialized Agents

### Core System Agents
- **`system_info_worker`**: CPU, RAM, disk usage monitoring
- **`service_inventory_worker`**: Lists running services

### Service-Specific Agents
- **`mariadb_analyzer`**: MariaDB configuration and log analysis
- **`nginx_analyzer`**: Nginx configuration validation and log analysis
- **`phpfpm_analyzer`**: PHP-FPM performance and error analysis

### Network & Security Agents
- **`network_diag`**: Network connectivity and DNS diagnostics
- **`cert_checker`**: TLS certificate validation and expiry alerts

### Analysis & Action Agents
- **`risk_scorer`**: Aggregates findings and assigns severity levels
- **`remediation_worker`**: Proposes safe fixes for detected issues
- **`harmonizer_worker`**: Applies security hardening best practices

## Features

### Core Capabilities
- **Local System Access**: Execute shell commands on the local machine
- **Remote Server Access**: Execute commands on remote servers via SSH
- **Persistent SSH Connections**: Efficient remote operations with connection reuse
- **Cross-Platform Support**: Works on Linux, macOS, BSD, and Windows systems

### Advanced Capabilities
- **Intelligent Delegation**: Supervisor routes tasks to appropriate specialists
- **Parallel Execution**: Multiple agents can work simultaneously
- **Severity Assessment**: Risk scoring with Critical/High/Medium/Low levels
- **Safe Remediation**: Proposes fixes with confirmation requests
- **Security Hardening**: Automated best-practice application

### Execution Modes
- **Invoke Mode**: Complete analysis with final result
- **Stream Mode**: Real-time step-by-step execution visibility

## Files

- `main-multi-agent.py`: Complete multi-agent supervisor implementation
- `agents/`: Directory containing specialized agent implementations
- `custom_tools/`: Custom tools used by the agents
- `supervisor.py`: Supervisor agent coordination logic
- `utils.py`: Utility functions and configurations

## Usage

```bash
cd multi-agent-supervisor
python main-multi-agent.py
```

The script includes both execution modes:

### 1. Invoke Mode (Complete Analysis)
```python
result = supervisor.invoke(query)
print(result["messages"][-1]["content"])
```

### 2. Stream Mode (Step-by-Step)
```python
for chunk in supervisor.stream(query):
    # Real-time agent execution monitoring
    print(f"🤖 ACTIVE AGENT: {current_agent}")
    print(f"🔧 TOOL CALLS: {len(tool_calls)} tool(s)")
```

## Example Workflow

For the query: *"Nginx returns 502 Bad Gateway on my server. What can I do?"*

1. **Supervisor** analyzes the request
2. **system_info_worker** checks system resources (local or remote)
3. **service_inventory_worker** lists running services
4. **nginx_analyzer** validates Nginx configuration and checks logs
5. **phpfpm_analyzer** checks PHP-FPM status (common 502 cause)
6. **risk_scorer** assesses the severity
7. **remediation_worker** proposes specific fixes

## Example Queries

The multi-agent system can handle both local and remote system administration:

### Local System Administration
```
"Check local system performance and identify bottlenecks"
"Analyze recent system errors in local logs"
"What services are running on this machine?"
```

### Remote Server Management
```
"Connect to my remote server and check disk usage"
"Compare performance between local and remote systems"
"Check if nginx is running on the remote server"
"Analyze logs on my remote server for error patterns"
```

### Multi-System Analysis
```
"Perform comprehensive health check across all systems"
"Compare configurations between local and remote servers"
"Identify performance differences between environments"
```

## Pros and Cons

### ✅ Pros
- **Domain Expertise**: Each agent specializes in specific areas
- **Parallel Processing**: Multiple agents work simultaneously
- **Comprehensive Analysis**: Systematic approach to complex problems
- **Risk Assessment**: Built-in severity scoring
- **Intelligent Routing**: Tasks go to the right specialist
- **Scalable**: Easy to add new specialized agents

### ❌ Cons
- **Complexity**: More sophisticated setup and debugging
- **Resource Intensive**: Higher computational overhead
- **Coordination Overhead**: Supervisor management complexity
- **Potential Over-engineering**: May be overkill for simple tasks

## When to Use

Choose the multi-agent supervisor when:
- You need comprehensive system analysis
- Multiple services/components are involved
- You want parallel processing capabilities
- Risk assessment and severity scoring are important
- You're dealing with complex, multi-faceted problems
- You need specialized domain expertise

## Agent Interaction Flow

```mermaid
graph TD
    A[User Query] --> B[Supervisor]
    B --> C[system_info_worker]
    B --> D[service_inventory_worker]
    B --> E[Service Specialists]
    E --> F[nginx_analyzer]
    E --> G[mariadb_analyzer]
    E --> H[phpfpm_analyzer]
    C --> I[risk_scorer]
    D --> I
    F --> I
    G --> I
    H --> I
    I --> J[remediation_worker]
    J --> K[Final Response]
```

## Customization

### Adding New Agents
```python
new_agent = create_react_agent(
    model="openai:gpt-4o-mini",
    tools=[shell_tool, custom_tools],
    prompt="Your specialized agent prompt...",
    name="new_specialist"
)

# Add to supervisor
supervisor = create_supervisor(
    agents=[...existing_agents, new_agent],
    model=model,
    prompt=updated_supervisor_prompt
)
```

### Custom Tools
```python
class CustomTool(BaseTool):
    name = "custom_tool"
    description = "Tool description"

    def _run(self, **kwargs):
        # Tool implementation
        return result
```

## Requirements

```bash
pip install langchain-openai langgraph langgraph-supervisor langchain-community
export OPENAI_API_KEY="your-api-key"
```

## Performance Considerations

- **Token Usage**: Higher due to multiple agent interactions
- **Execution Time**: May be longer due to coordination overhead
- **Memory**: Higher memory usage with multiple concurrent agents
- **Rate Limits**: Monitor API rate limits with parallel requests