Compare commits

10 Commits

Author | SHA1 | Message | Date
Gaetan Hurel | b1deb41296 | remove poem, add shell tool | 2025-06-30 17:21:43 +02:00
Gaetan Hurel | 98aa3301d1 | use 4.1 | 2025-06-30 17:13:52 +02:00
Gaetan Hurel | 788c2f012c | remove poem tool | 2025-06-30 17:13:44 +02:00
Gaetan Hurel | b133b3fecd | add use cases trace | 2025-06-30 17:13:38 +02:00
Gaetan Hurel | d77b1b1b4c | remove unused files | 2025-06-30 17:12:34 +02:00
Gaetan Hurel | 337c5024b7 | add available models | 2025-06-30 17:12:22 +02:00
Gaetan Hurel | e601856812 | remove poem tool | 2025-06-30 16:55:55 +02:00
Gaetan Hurel | 7e340a6649 | use 4.1 | 2025-06-30 16:55:41 +02:00
Gaetan Hurel | 228003bedc | save interesting use cases | 2025-06-30 16:55:27 +02:00
Gaetan Hurel | 7ea51b11e3 | remove unused md | 2025-06-30 07:58:34 +02:00
19 changed files with 118 additions and 346 deletions

AVAILABLE_MODELS.md Normal file
View File

@@ -0,0 +1,3 @@
o3-mini
gpt-4o-mini
openai:gpt-4.1
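
The list above mixes bare model names with the `provider:model` form that this diff passes to `init_chat_model`, while `ChatOpenAI(model=...)` is called with bare names. As a small sketch of how a caller might normalize the two spellings, assuming a hypothetical helper named `split_model_id` that is not part of the repository:

```python
from typing import Optional, Tuple


def split_model_id(
    model_id: str, default_provider: Optional[str] = None
) -> Tuple[Optional[str], str]:
    """Split 'provider:model' into its parts (hypothetical helper).

    Bare names such as 'o3-mini' carry no provider prefix, so the
    caller-supplied default_provider is returned for them instead.
    """
    provider, separator, name = model_id.partition(":")
    if separator:
        return provider, name
    return default_provider, model_id


if __name__ == "__main__":
    for entry in ("o3-mini", "gpt-4o-mini", "openai:gpt-4.1"):
        print(split_model_id(entry, default_provider="openai"))
```

Defaulting to `openai` here is an assumption for illustration only; the file itself does not say which provider the bare names belong to.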

View File

@@ -1,86 +0,0 @@
# SSH Banner Error Fix Implementation
## Problem
The multi-agent supervisor system was creating multiple SSH connections simultaneously, causing "Error reading SSH protocol banner" errors. This happened because each agent that needed SSH access was creating its own connection to the remote server.
## Root Cause
- Multiple agents attempting to establish SSH connections in parallel
- SSH server or network infrastructure rejecting rapid connection attempts
- No connection pooling or sharing mechanism
## Solution Implemented
### 1. SSH Connection Manager (`ssh_connection_manager.py`)
- **Singleton pattern** to manage shared SSH connections
- **Thread-safe connection pooling** to prevent multiple connections to the same host
- **Global execution lock** to serialize SSH operations across all agents
- **Automatic connection cleanup** on exit
Key features:
- One connection per unique host/user/port combination
- 200ms delay between operations to prevent rapid connections
- Proper cleanup of connections on exit
### 2. Updated SSH Tool (`ssh_tool.py`)
- Added `use_shared_connection` parameter (defaults to `True`)
- Integration with the connection manager
- Thread-safe execution through the connection manager's lock
- Backward compatibility for non-shared connections
### 3. Updated Configuration (`__init__.py`)
- Pre-configured SSH tool now uses shared connections
- Import and export of the SSH connection manager
- Clear documentation of the shared connection feature
### 4. Enhanced Supervisor (`main-multi-agent.py`)
- Updated prompt to emphasize **sequential execution** over parallel
- Added proper SSH connection cleanup on exit
- Improved error handling and resource management
### 5. Sequential Executor (`sequential_executor.py`)
- Additional layer of protection against parallel execution
- 300ms delay between agent executions
- Comprehensive logging for debugging
## Key Benefits
1. **Eliminates SSH Banner Errors**: Only one connection per server
2. **Improved Reliability**: Prevents connection flooding
3. **Better Resource Management**: Shared connections reduce overhead
4. **Thread Safety**: Proper locking prevents race conditions
5. **Graceful Cleanup**: Connections are properly closed on exit
## Configuration
The system is now configured to:
- Use shared SSH connections by default
- Execute agent operations sequentially when SSH is involved
- Automatically clean up connections on exit
- Provide clear error messages if issues occur
## Testing
A test script (`test_ssh_sharing.py`) has been created to verify:
- Connection sharing is working correctly
- Sequential execution is enforced
- Cleanup works properly
## Usage
The system now works exactly as before from the user's perspective, but with improved reliability:
```bash
cd /Users/ghsioux/tmp/langgraph-pard0x/multi-agent-supervisor
python main-multi-agent.py
```
Users can query the system normally, and the SSH operations will be handled reliably in the background.
## Technical Details
- **Connection Key**: `username@host:port` uniquely identifies connections
- **Execution Lock**: Global thread lock ensures sequential SSH operations
- **Delay Strategy**: Small delays prevent rapid connection attempts
- **Cleanup Strategy**: Automatic cleanup on normal exit and SIGINT
This implementation resolves the SSH banner errors while maintaining the full functionality of the multi-agent system.
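
As a rough illustration of the pooling scheme these notes describe, the sketch below keeps one connection per `username@host:port` key and serializes operations behind a single lock, with a short pause after each one. It is not the removed `ssh_connection_manager.py`: the class name `SharedConnectionManager`, the `connect` factory argument, and the `run` method are hypothetical, and a plain stand-in object replaces a real SSH client so the example runs without a server.

```python
import threading
import time
from typing import Any, Callable, Dict


class SharedConnectionManager:
    """Hypothetical pool holding one connection per username@host:port key."""

    def __init__(self, connect: Callable[[str, int, str], Any], delay: float = 0.2):
        self._connect = connect              # factory that opens a connection
        self._delay = delay                  # pause after each operation (200 ms in the notes)
        self._connections: Dict[str, Any] = {}
        self._pool_lock = threading.Lock()   # guards the connection dictionary
        self._exec_lock = threading.Lock()   # serializes operations across callers

    @staticmethod
    def key(username: str, host: str, port: int) -> str:
        return f"{username}@{host}:{port}"

    def get(self, username: str, host: str, port: int) -> Any:
        """Return the pooled connection for this key, opening it on first use."""
        k = self.key(username, host, port)
        with self._pool_lock:
            if k not in self._connections:
                self._connections[k] = self._connect(host, port, username)
            return self._connections[k]

    def run(self, username: str, host: str, port: int, operation: Callable[[Any], Any]) -> Any:
        """Run operation(connection) while holding the global execution lock."""
        conn = self.get(username, host, port)
        with self._exec_lock:
            try:
                return operation(conn)
            finally:
                time.sleep(self._delay)

    def close_all(self) -> None:
        """Close every pooled connection that exposes close(), then empty the pool."""
        with self._pool_lock:
            for conn in self._connections.values():
                close = getattr(conn, "close", None)
                if callable(close):
                    close()
            self._connections.clear()


if __name__ == "__main__":
    manager = SharedConnectionManager(lambda host, port, user: object())
    first = manager.get("g", "example.org", 22)
    second = manager.get("g", "example.org", 22)
    print(first is second)  # the second call reuses the pooled connection
```

A module-level instance of such a class would play the role of the singleton mentioned above; registering `close_all` with `atexit` is one way the cleanup on exit could be wired up.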

View File

@@ -3,16 +3,16 @@
 from langchain_openai import ChatOpenAI
 from langgraph.prebuilt import create_react_agent
 from langchain_community.tools.shell.tool import ShellTool
-from custom_tools import print_poem, configured_remote_server
+from custom_tools import configured_remote_server
 def create_logs_analyzer_worker():
 """Create a logs analyzer agent that investigates system and application logs."""
-tools = [configured_remote_server, print_poem]
+tools = [configured_remote_server]
 return create_react_agent(
-model=ChatOpenAI(model="gpt-4o-mini", temperature=0),
+model=ChatOpenAI(model="gpt-4.1", temperature=0),
 tools=tools,
 prompt="""You are an expert Logs Analysis Agent specialized in investigating and diagnosing issues through log files across different operating systems.

View File

@@ -3,16 +3,16 @@
 from langchain_openai import ChatOpenAI
 from langgraph.prebuilt import create_react_agent
 from langchain_community.tools.shell.tool import ShellTool
-from custom_tools import print_poem, configured_remote_server
+from custom_tools import configured_remote_server
 def create_os_detector_worker():
 """Create an OS detector agent that identifies system information and environment."""
-tools = [configured_remote_server, print_poem]
+tools = [configured_remote_server]
 return create_react_agent(
-model=ChatOpenAI(model="gpt-4o-mini", temperature=0),
+model=ChatOpenAI(model="gpt-4.1", temperature=0),
 tools=tools,
 prompt="""You are an expert OS Detection Agent specialized in identifying and analyzing operating systems across different platforms.

View File

@@ -3,16 +3,16 @@
 from langchain_openai import ChatOpenAI
 from langgraph.prebuilt import create_react_agent
 from langchain_community.tools.shell.tool import ShellTool
-from custom_tools import print_poem, configured_remote_server
+from custom_tools import configured_remote_server
 def create_performance_analyzer_worker():
 """Create a performance analyzer agent that monitors and diagnoses performance issues."""
-tools = [configured_remote_server, print_poem]
+tools = [configured_remote_server]
 return create_react_agent(
-model=ChatOpenAI(model="gpt-4o-mini", temperature=0),
+model=ChatOpenAI(model="gpt-4.1", temperature=0),
 tools=tools,
 prompt="""You are an expert Performance Analysis Agent specialized in monitoring and optimizing system performance across different operating systems.

View File

@@ -13,7 +13,7 @@ def create_service_discovery_worker():
 tools = [configured_remote_server]
 return create_react_agent(
-model=ChatOpenAI(model="gpt-4o-mini", temperature=0),
+model=ChatOpenAI(model="gpt-4.1", temperature=0),
 tools=tools,
 prompt="""You are an expert Service Discovery Agent specialized in finding ALL services running on a system, regardless of their deployment method.

View File

@@ -1,6 +1,5 @@
"""Custom tools for the multi-agent sysadmin system."""
-from .poem_tool import print_poem
 from .ssh_tool import SSHTool
 from .ssh_connection_manager import ssh_manager
 from langchain_community.tools.shell.tool import ShellTool
@@ -17,4 +16,4 @@ configured_remote_server = SSHTool(
 )
-__all__ = ["print_poem", "SSHTool", "ShellTool", "configured_remote_server", "ssh_manager"]
+__all__ = ["SSHTool", "ShellTool", "configured_remote_server", "ssh_manager"]

View File

@@ -16,7 +16,8 @@ from agents import (
 create_performance_analyzer_worker,
 create_service_discovery_worker
 )
-from custom_tools import print_poem, configured_remote_server
+from custom_tools import configured_remote_server
+from langchain_community.tools.shell.tool import ShellTool
 # Suppress the shell tool warning since worker agents use it intentionally for sysadmin tasks
 warnings.filterwarnings("ignore", message="The shell tool has no safeguards by default. Use at your own risk.")
@@ -32,7 +33,6 @@ def print_welcome():
 print(" • 📊 Logs Analyzer - Log investigation and error diagnosis (local & remote)")
 print(" • ⚡ Performance Analyzer - Resource monitoring and optimization (local & remote)")
 print(" • 🔍 Service Discovery - Comprehensive service enumeration across all platforms")
-print(" • 🎭 Morale Booster - Motivational poems for tough debugging sessions!")
 print("\n🌐 Remote Server Access: My agents can execute commands on both:")
 print(" • Local machine via shell commands")
 print(" • Remote server via SSH (g@157.90.211.119:8081)")
@@ -50,7 +50,6 @@ def print_examples():
 print(" - 'Compare disk usage between local and remote server'")
 print(" - 'Check if services are running on both systems'")
 print(" - 'My web server is down, help me troubleshoot'")
-print(" - 'Write me a motivational poem about debugging'")
 print("\n" + "-"*80)
@@ -58,7 +57,7 @@ def create_sysadmin_supervisor():
 """Create the main supervisor that coordinates between specialized agents."""
 # Get the base model
-model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
+model = ChatOpenAI(model="gpt-4.1", temperature=0)
 # Create specialized workers
 os_detector = create_os_detector_worker()
@@ -82,8 +81,7 @@ Your role:
 1. **Task Analysis**: Understand the user's request and determine which agent(s) to engage
 2. **Coordination**: Delegate tasks to appropriate agents based on their specialties
 3. **Synthesis**: Combine insights from multiple agents into coherent solutions
-4. **Direct Action**: Handle simple tasks yourself without delegation
-5. **Morale Boost**: Use the poem tool to encourage users during tough debugging sessions
+4. **Direct Action**: Handle simple tasks yourself using shell commands when appropriate
 IMPORTANT: To prevent SSH connection issues, delegate tasks SEQUENTIALLY, not in parallel.
 Wait for one agent to complete their SSH tasks before starting the next one.
@@ -107,10 +105,10 @@ Communication style:
 - Be professional yet approachable
 - Provide clear explanations of your delegation decisions
 - Synthesize agent findings into actionable recommendations
-- Add a touch of humor when appropriate (especially with poems!)
+- Use shell commands directly when it's more efficient than delegation
 Remember: Your goal is to solve system problems efficiently by leveraging your team's specialized skills while maintaining a positive debugging experience!""",
-tools=[print_poem] # Supervisor only has poem tool - no shell/SSH access
+tools=[ShellTool()] # Supervisor can execute shell commands directly
 )
 return supervisor.compile()

View File

@@ -1,36 +0,0 @@
"""Sequential execution wrapper to prevent parallel SSH connections."""
import threading
import logging
from typing import Any, Dict, List, Callable
from langchain_core.messages import BaseMessage
logger = logging.getLogger(__name__)
class SequentialExecutor:
"""Ensures agents execute sequentially when using SSH to prevent connection flooding."""
def __init__(self):
self._execution_lock = threading.Lock()
self._logger = logging.getLogger(__name__)
def execute_agent_safely(self, agent_func: Callable, messages: List[BaseMessage], agent_name: str = "unknown") -> Dict[str, Any]:
"""Execute an agent function with thread safety to prevent parallel SSH operations."""
with self._execution_lock:
self._logger.info(f"Executing agent: {agent_name}")
try:
result = agent_func({"messages": messages})
# Add a small delay to prevent rapid successive SSH connections
import time
time.sleep(0.3) # 300ms delay between agent executions
return result
except Exception as e:
self._logger.error(f"Error executing agent {agent_name}: {e}")
raise
finally:
self._logger.info(f"Completed execution of agent: {agent_name}")
# Global sequential executor instance
sequential_executor = SequentialExecutor()

View File

@@ -1,109 +0,0 @@
#!/usr/bin/env python3
"""
Test script to verify SSH connection sharing works properly.
This script simulates multiple agents trying to use SSH simultaneously.
"""
import time
import threading
from custom_tools import configured_remote_server, ssh_manager
from custom_tools.ssh_tool import SSHTool
def test_ssh_connection_sharing():
"""Test that SSH connection sharing prevents multiple connections."""
print("🧪 Testing SSH Connection Sharing...")
print("=" * 50)
# Test 1: Verify shared connection is being used
print("\n1. Testing shared connection mechanism...")
# Create multiple SSH tool instances with shared connection
ssh_tool1 = SSHTool(
host="157.90.211.119",
port=8081,
username="g",
key_filename="/Users/ghsioux/.ssh/id_rsa_hetzner",
use_shared_connection=True
)
ssh_tool2 = SSHTool(
host="157.90.211.119",
port=8081,
username="g",
key_filename="/Users/ghsioux/.ssh/id_rsa_hetzner",
use_shared_connection=True
)
# Verify they share the same session
if ssh_tool1.session is ssh_tool2.session:
print("✅ SSH tools are sharing the same session instance")
else:
print("❌ SSH tools are NOT sharing the same session instance")
# Test 2: Test sequential execution
print("\n2. Testing sequential execution...")
def run_command(tool, command, name):
"""Run a command with timing info."""
start_time = time.time()
try:
result = tool._run(command)
end_time = time.time()
print(f" {name}: Completed in {end_time - start_time:.2f}s")
return result
except Exception as e:
end_time = time.time()
print(f" {name}: Failed in {end_time - start_time:.2f}s - {e}")
return f"Error: {e}"
# Test commands that should run sequentially
commands = [
("whoami", "Agent 1"),
("date", "Agent 2"),
("pwd", "Agent 3")
]
threads = []
results = {}
for cmd, agent_name in commands:
thread = threading.Thread(
target=lambda c=cmd, n=agent_name: results.update({n: run_command(configured_remote_server, c, n)})
)
threads.append(thread)
# Start all threads (they should execute sequentially due to our lock)
print(" Starting multiple SSH operations...")
start_time = time.time()
for thread in threads:
thread.start()
for thread in threads:
thread.join()
total_time = time.time() - start_time
print(f" Total execution time: {total_time:.2f}s")
# Test 3: Verify connection cleanup
print("\n3. Testing connection cleanup...")
print(" Current connections:", len(ssh_manager._connections))
# Close all connections
ssh_manager.close_all()
print(" Connections after cleanup:", len(ssh_manager._connections))
print("\n" + "=" * 50)
print("🎉 SSH Connection Sharing Test Complete!")
return results
if __name__ == "__main__":
try:
results = test_ssh_connection_sharing()
print("\nTest Results:")
for agent, result in results.items():
print(f" {agent}: {result[:50]}{'...' if len(result) > 50 else ''}")
except Exception as e:
print(f"Test failed: {e}")

View File

@@ -1,6 +1,5 @@
"""Custom tools package for the LangGraph demo agent."""
-from .poem_tool import print_poem
 from .ssh_tool import SSHTool
 from langchain_community.tools.shell.tool import ShellTool
@@ -14,4 +13,4 @@ configured_remote_server = SSHTool(
 ask_human_input=True # Safety confirmation
 )
-__all__ = ["print_poem", "SSHTool", "ShellTool", "configured_remote_server"]
+__all__ = ["SSHTool", "ShellTool", "configured_remote_server"]

View File

@@ -1,81 +0,0 @@
import random
from langchain_core.tools import tool
@tool
def print_poem(poem_type: str = "random") -> str:
"""
Print a beautiful poem for the user.
Args:
poem_type: Type of poem to print. Options: "nature", "tech", "motivational", "random"
Returns:
A beautiful poem as a string
"""
poems = {
"nature": """
🌿 Nature's Symphony 🌿
In the whisper of the wind through ancient trees,
Where sunlight dances on the morning breeze,
The earth awakens with a gentle song,
A melody that's carried all along.
Rivers flow with stories untold,
Mountains stand majestic and bold,
In nature's embrace, we find our peace,
Where all our worries and troubles cease.
""",
"tech": """
💻 Digital Dreams 💻
In lines of code, our dreams take flight,
Binary stars illuminate the night,
Algorithms dance in silicon halls,
While innovation answers progress calls.
From circuits small to networks vast,
We build the future, learn from the past,
In every byte and every bit,
Human creativity and logic fit.
""",
"motivational": """
⭐ Rise and Shine ⭐
Every dawn brings a chance anew,
To chase the dreams that call to you,
Though mountains high may block your way,
Your spirit grows stronger every day.
The path is long, the journey tough,
But you, my friend, are strong enough,
With courage as your faithful guide,
Success will walk right by your side.
""",
"friendship": """
🤝 Bonds of Friendship 🤝
In laughter shared and tears that fall,
True friendship conquers over all,
Through seasons change and years that pass,
These precious bonds forever last.
A friend's warm smile, a helping hand,
Together strong, united we stand,
In friendship's light, we find our way,
Brightening each and every day.
"""
}
# If random or invalid type, pick a random poem
if poem_type == "random" or poem_type not in poems:
poem_type = random.choice(list(poems.keys()))
selected_poem = poems[poem_type]
return f"Here's a {poem_type} poem for you:\n{selected_poem}"

View File

@@ -5,7 +5,7 @@ from langchain.chat_models import init_chat_model
 from langchain_community.tools.shell.tool import ShellTool
 from langgraph.prebuilt import create_react_agent
 from langchain_core.messages import HumanMessage
-from custom_tools import print_poem, configured_remote_server
+from custom_tools import configured_remote_server
 # Suppress the shell tool warning since we're using it intentionally for sysadmin tasks
 warnings.filterwarnings("ignore", message="The shell tool has no safeguards by default. Use at your own risk.")
@@ -17,11 +17,11 @@ def create_agent():
 # Initialize the chat model (using OpenAI GPT-4)
 # Make sure you have set your OPENAI_API_KEY environment variable
-llm = init_chat_model("openai:gpt-4o-mini")
+llm = init_chat_model("openai:gpt-4.1")
 # Define the tools available to the agent
 shell_tool = ShellTool()
-tools = [shell_tool, configured_remote_server, print_poem]
+tools = [shell_tool, configured_remote_server]
 # Create a ReAct agent with system administration debugging focus
@@ -36,12 +36,10 @@ def create_agent():
 3. **OS Detection**: Automatically detect the operating system and adapt commands accordingly
 4. **Issue Diagnosis**: Analyze symptoms and systematically investigate root causes
 5. **Problem Resolution**: Provide solutions and execute fixes when safe to do so
-6. **Easter Egg**: Generate poems when users need a morale boost (use print_poem tool)
 ## AVAILABLE TOOLS
 - **terminal**: Execute commands on the local machine
 - **configured_remote_server**: Execute commands on the pre-configured remote server
-- **print_poem**: Generate motivational poems for debugging sessions
 ## OPERATING SYSTEM AWARENESS
 - **First interaction**: Always detect the OS using appropriate commands (uname, systeminfo, etc.)
@@ -103,9 +101,7 @@ def create_agent():
 - Pipe long outputs through 'head' or 'tail' when appropriate
 - Use 'timeout' command for potentially long-running diagnostics
 - Always explain the output interpretation
-- Suggest next steps based on findings
-Remember: Your primary goal is to help solve system problems efficiently and safely. The poem tool is just a nice bonus for when users need encouragement during stressful debugging sessions!"""
+- Suggest next steps based on findings"""
 # Create the ReAct agent
@@ -156,7 +152,6 @@ def main():
 print(" - Analyze system logs and error messages")
 print(" - Network connectivity diagnostics")
 print(" - Cross-platform support (Linux, macOS, BSD, Windows)")
-print(" - 🎭 Easter egg: Generate motivational poems when you need a break!")
 print("-" * 70)
 # Create the agent
@@ -167,7 +162,6 @@ def main():
 print("💡 Or: 'Check if my web server is running properly'")
 print("💡 Or: 'Connect to my remote server and check disk space'")
 print("💡 Or: 'Analyze recent system errors'")
-print("💡 Need a break? Ask: 'Write me a motivational poem'")
 except Exception as e:
 print(f"❌ Error initializing agent: {e}")
@@ -197,8 +191,6 @@ def main():
 print(" - 'Troubleshoot network connectivity problems'")
 print(" - 'Check disk space and filesystem health'")
 print(" - 'Review recent system errors in logs'")
-print("\nEaster Egg:")
-print(" - 'Write me a motivational poem' (for when debugging gets tough!)")
 print("\nSafety Notes:")
 print(" - Agent will ask permission before running potentially harmful commands")
 print(" - All commands are explained before execution")

View File

@@ -0,0 +1,63 @@
# Debugging Session Summary
## 1. Request:
**Check all Incus containers in all projects on remote server**
**Answer:**
Provided a comprehensive list of all Incus containers across all projects, specifying which projects had containers and listing each container with OS, resources, and snapshot info.
---
## 2. Request:
**Check the logs of all containers from `gta` and `default` projects, and identify which ones must be taken care of**
**Answer:**
Summarized log analysis for all containers in the `gta` and `default` projects. Highlighted containers with critical issues (dns, matrix, radicale, vaultwarden, web), those with warnings (mail), and those with no issues.
---
## 3. Request:
**Show me some relevant error for dns, vaultwarden, web and radicale**
**Answer:**
Listed the most relevant recent errors for each requested container, including error messages and brief interpretations for:
- dns (zone transfer and permission issues)
- radicale (service and Apache2 failures)
- vaultwarden (missing binary and misconfigured service)
- web (nginx start/reload failures)
---
## 4. Request:
**But how old are these selected logs?**
**Answer:**
Clarified the recency of the selected log entries for each container, stating that most were from the last 24-48 hours, except for radicale (which was later corrected).
---
## 5. Request:
**Run journalctl -b -p err on the radicale container and tell me how old are the error**
**Answer:**
Reported that the errors in the radicale container were from February 20th, several months old, and not recent.
---
## 6. Request:
**So you were wrong, right?**
**Answer:**
Acknowledged the mistake, confirmed the error about the recency of radicale logs, and thanked you for catching it.
---
## 7. Request:
**Generate a summary of my requests and your answers in a markdown file in the current dir**
**Answer:**
You're reading it! This file summarizes all your requests and my responses for your records.
---
*If you need further details, troubleshooting steps, or a morale-boosting poem, just ask!*
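
The mix-up recorded in requests 4 to 6 came down to reporting log entries without checking their timestamps. One way to guard against that, sketched here under the assumption that the journal is read with `journalctl -b -p err -o json` (one JSON object per line, each carrying `__REALTIME_TIMESTAMP` in microseconds since the Unix epoch), is to compute the age of the newest entry before calling anything "recent". The function name `newest_entry_age_hours` is hypothetical and not part of the repository.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Optional


def newest_entry_age_hours(
    json_lines: Iterable[str], now: Optional[datetime] = None
) -> Optional[float]:
    """Return the age in hours of the most recent journal entry, or None if there are none."""
    now = now or datetime.now(timezone.utc)
    newest = None
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # journald exports __REALTIME_TIMESTAMP as a string of microseconds since the epoch
        seconds = int(entry["__REALTIME_TIMESTAMP"]) / 1_000_000
        stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
        if newest is None or stamp > newest:
            newest = stamp
    if newest is None:
        return None
    return (now - newest).total_seconds() / 3600


if __name__ == "__main__":
    # Synthetic entry dated 20 February 2025, standing in for real journalctl output.
    stamp = datetime(2025, 2, 20, tzinfo=timezone.utc)
    sample = json.dumps({"__REALTIME_TIMESTAMP": str(int(stamp.timestamp() * 1_000_000))})
    age = newest_entry_age_hours([sample], now=datetime(2025, 6, 30, tzinfo=timezone.utc))
    print(f"newest error is {age / 24:.0f} days old")
```

The sample entry is synthetic; in practice the lines would come from the command output captured over SSH.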

View File

@@ -0,0 +1,30 @@
# Incus Troubleshooting Session Summary
## User Requests
1. **Check all Incus containers in all projects on remote server**
2. **Check the logs of all containers from `gta` and `default` projects, and identify which ones must be taken care of**
3. **Check Incus configuration to find where container filesystems are mounted, then look directly at logs from the root host filesystem**
## My Responses
1. **Container Discovery**
- Listed all projects and containers:
- `default`: dns, gitea, jitsi, mail, matrix, radicale, seedreco, vaultwarden, web
- `gta`: wireguard
- `test-project`, `vins`: no containers
2. **Attempted Container Log Retrieval (via Incus tooling)**
- Encountered limitations: direct log retrieval per project via `incus query` not supported as expected.
- Explained alternative approaches (using `incus exec` or fetching from files inside containers).
- Asked for your preferred log-checking method: journal inside the container, or files from the host.
3. **Logs Inspected Directly from Host Filesystem**
- Used information about Incus container mountpoints to inspect `/var/log/syslog` and `/var/log/messages` for each container:
- Most containers lack system logs at these paths.
- `jitsi`: Found many repeated TURN server TLS/TCP socket errors (recommend investigation)
- `mail`: ClamAV warning for outdated signatures (recommend update)
- Recommended focusing on `jitsi` (TURN server errors) and `mail` (ClamAV out-of-date) for attention.
---
If you need further details on specific containers, deeper log analysis, or recommendations on fixes, just ask!