2025-06-29 15:15:27 +02:00

7.2 KiB

Multi-Agent Supervisor System for Sysadmin Tasks

This directory contains a sophisticated multi-agent system with a supervisor pattern for comprehensive system administration and troubleshooting.

Sources

https://github.com/langchain-ai/langgraph-supervisor-py https://langchain-ai.github.io/langgraph/concepts/multi_agent/#supervisor

Overview

The multi-agent supervisor system uses multiple specialized agents coordinated by a supervisor to handle complex sysadmin tasks:

  1. Supervisor Agent: Orchestrates and delegates tasks to specialized workers
  2. Specialized Workers: Each agent is an expert in a specific domain
  3. Parallel Processing: Multiple agents can work simultaneously
  4. Intelligent Routing: Tasks are routed to the most appropriate specialist

Architecture

User Input → Supervisor → Specialized Agents → Aggregated Response
                ↓
    ┌─────────────────────────────────────────────────┐
    │  system_info │ nginx │ mariadb │ network │ ...  │
    └─────────────────────────────────────────────────┘

Specialized Agents

Core System Agents

  • system_info_worker: CPU, RAM, disk usage monitoring
  • service_inventory_worker: Lists running services

Service-Specific Agents

  • mariadb_analyzer: MariaDB configuration and log analysis
  • nginx_analyzer: Nginx configuration validation and log analysis
  • phpfpm_analyzer: PHP-FPM performance and error analysis

Network & Security Agents

  • network_diag: Network connectivity and DNS diagnostics
  • cert_checker: TLS certificate validation and expiry alerts

Analysis & Action Agents

  • risk_scorer: Aggregates findings and assigns severity levels
  • remediation_worker: Proposes safe fixes for detected issues
  • harmonizer_worker: Applies security hardening best practices

Features

Core Capabilities

  • Local System Access: Execute shell commands on the local machine
  • Remote Server Access: Execute commands on remote servers via SSH
  • Persistent SSH Connections: Efficient remote operations with connection reuse
  • Cross-Platform Support: Works on Linux, macOS, BSD, and Windows systems

Advanced Capabilities

  • Intelligent Delegation: Supervisor routes tasks to appropriate specialists
  • Parallel Execution: Multiple agents can work simultaneously
  • Severity Assessment: Risk scoring with Critical/High/Medium/Low levels
  • Safe Remediation: Proposes fixes with confirmation requests
  • Security Hardening: Automated best-practice application

Execution Modes

  • Invoke Mode: Complete analysis with final result
  • Stream Mode: Real-time step-by-step execution visibility

Files

  • main-multi-agent.py: Complete multi-agent supervisor implementation
  • agents/: Directory containing specialized agent implementations
  • custom_tools/: Custom tools used by the agents
  • supervisor.py: Supervisor agent coordination logic
  • utils.py: Utility functions and configurations

Usage

cd multi-agent-supervisor
python main-multi-agent.py

The script includes both execution modes:

1. Invoke Mode (Complete Analysis)

result = supervisor.invoke(query)
print(result["messages"][-1]["content"])

2. Stream Mode (Step-by-Step)

for chunk in supervisor.stream(query):
    # Real-time agent execution monitoring
    print(f"🤖 ACTIVE AGENT: {current_agent}")
    print(f"🔧 TOOL CALLS: {len(tool_calls)} tool(s)")

Example Workflow

For the query: "Nginx returns 502 Bad Gateway on my server. What can I do?"

  1. Supervisor analyzes the request
  2. system_info_worker checks system resources (local or remote)
  3. service_inventory_worker lists running services
  4. nginx_analyzer validates Nginx configuration and checks logs
  5. phpfpm_analyzer checks PHP-FPM status (common 502 cause)
  6. risk_scorer assesses the severity
  7. remediation_worker proposes specific fixes

Example Queries

The multi-agent system can handle both local and remote system administration:

Local System Administration

"Check local system performance and identify bottlenecks"
"Analyze recent system errors in local logs"
"What services are running on this machine?"

Remote Server Management

"Connect to my remote server and check disk usage"
"Compare performance between local and remote systems"  
"Check if nginx is running on the remote server"
"Analyze logs on my remote server for error patterns"

Multi-System Analysis

"Perform comprehensive health check across all systems"
"Compare configurations between local and remote servers"
"Identify performance differences between environments"

Pros and Cons

Pros

  • Domain Expertise: Each agent specializes in specific areas
  • Parallel Processing: Multiple agents work simultaneously
  • Comprehensive Analysis: Systematic approach to complex problems
  • Risk Assessment: Built-in severity scoring
  • Intelligent Routing: Tasks go to the right specialist
  • Scalable: Easy to add new specialized agents

Cons

  • Complexity: More sophisticated setup and debugging
  • Resource Intensive: Higher computational overhead
  • Coordination Overhead: Supervisor management complexity
  • Potential Over-engineering: May be overkill for simple tasks

When to Use

Choose the multi-agent supervisor when:

  • You need comprehensive system analysis
  • Multiple services/components are involved
  • You want parallel processing capabilities
  • Risk assessment and severity scoring are important
  • You're dealing with complex, multi-faceted problems
  • You need specialized domain expertise

Agent Interaction Flow

graph TD
    A[User Query] --> B[Supervisor]
    B --> C[system_info_worker]
    B --> D[service_inventory_worker]
    B --> E[Service Specialists]
    E --> F[nginx_analyzer]
    E --> G[mariadb_analyzer]
    E --> H[phpfpm_analyzer]
    C --> I[risk_scorer]
    D --> I
    F --> I
    G --> I
    H --> I
    I --> J[remediation_worker]
    J --> K[Final Response]

Customization

Adding New Agents

new_agent = create_react_agent(
    model="openai:gpt-4o-mini",
    tools=[shell_tool, custom_tools],
    prompt="Your specialized agent prompt...",
    name="new_specialist"
)

# Add to supervisor
supervisor = create_supervisor(
    agents=[...existing_agents, new_agent],
    model=model,
    prompt=updated_supervisor_prompt
)

Custom Tools

class CustomTool(BaseTool):
    name = "custom_tool"
    description = "Tool description"
    
    def _run(self, **kwargs):
        # Tool implementation
        return result

Requirements

pip install langchain-openai langgraph langgraph-supervisor langchain-community
export OPENAI_API_KEY="your-api-key"

Performance Considerations

  • Token Usage: Higher due to multiple agent interactions
  • Execution Time: May be longer due to coordination overhead
  • Memory: Higher memory usage with multiple concurrent agents
  • Rate Limits: Monitor API rate limits with parallel requests