product: maestro audience: test-developer, operator, ai-assistant authority: normative
Troubleshooting Maestro with MCP Diagnostic Tools
When a user reports unexpected Maestro behaviour — a runner that won't respond, a test that fails for no obvious reason, pip packages that won't install, or the UI showing a station as offline — always run the diagnostic tools below before guessing or asking the user to check logs manually.
These tools give you direct, machine-readable visibility into the running stack without requiring SSH access, Docker CLI access, or manual log inspection.
Diagnostic Decision Tree
Step 1 — Is the station reachable at all?
get_system_health
- database.healthy: false → PostgreSQL is down or unreachable. The station cannot store results and tests will not start.
- redis.healthy: false → Redis is down. SignalR and background jobs will fail.
- Both healthy → infrastructure is fine; move to Step 2.
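The Step 1 outcomes above could be checked mechanically along these lines. This is a minimal sketch: it assumes the get_system_health result has already been parsed into a dict with database.healthy and redis.healthy flags, and the function name triage_health is hypothetical.

```python
def triage_health(health: dict) -> list:
    """Map a parsed get_system_health payload to the Step 1 findings.

    Assumption: the payload looks like
    {"database": {"healthy": bool}, "redis": {"healthy": bool}}.
    """
    findings = []
    # A missing section is treated as unhealthy rather than assumed fine.
    if not health.get("database", {}).get("healthy", False):
        findings.append("PostgreSQL is down or unreachable; tests will not start")
    if not health.get("redis", {}).get("healthy", False):
        findings.append("Redis is down; SignalR and background jobs will fail")
    if not findings:
        findings.append("infrastructure is fine; move to Step 2")
    return findings
```

Treating a missing section as unhealthy is a deliberate choice here: a truncated payload should prompt investigation rather than a false all-clear.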
Step 2 — What versions are running?
get_system_version
- runners: [] (empty) → no runners have registered. This means the dotnet-runner and/or python-runner containers failed to start or cannot reach the API. Proceed immediately to Step 3.
- Runner listed as healthy: false → runner is registered but not responding to health checks. Proceed to Step 3 for that runner.
- Versions look unexpected → the stack may not have been updated. Run get_update_status and trigger_system_update if needed.
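The runner checks in Step 2 might be automated roughly as follows. The sketch assumes the get_system_version result is a dict with a runners list whose entries carry a healthy flag; the name field and the function name triage_runners are assumptions for illustration.

```python
def triage_runners(version_info: dict) -> list:
    """Return follow-up actions for a parsed get_system_version payload.

    Assumption: runners is a list of dicts such as
    {"name": "python-runner", "healthy": True}; "name" is hypothetical.
    """
    runners = version_info.get("runners", [])
    if not runners:
        # Empty list: no runner container managed to register with the API.
        return ["no runners registered; read dotnet-runner and python-runner logs (Step 3)"]
    actions = []
    for runner in runners:
        if not runner.get("healthy", False):
            name = runner.get("name", "<unknown>")
            actions.append(f"{name}: registered but failing health checks; read its logs (Step 3)")
    return actions
```

An empty return value means every registered runner reported healthy, so the remaining question for this step is whether the listed versions are the expected ones.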
Step 3 — What did the container log before failing?
get_service_logs service="python-runner" tail=50
get_service_logs service="dotnet-runner" tail=50
get_service_logs service="api" tail=50
get_service_logs service="orchestra" tail=50
Read the tail logs for each affected service. Common findings:
| Log pattern | Likely cause |
|---|---|
| pipPackages declared … AllowOnlinePip is false | pip dependencies will not install; AllowOnlinePip station config key is missing or false |
| pip install … error / No matching distribution | transitive pip dependency failure; check for incompatible vendored wheels |
| Address already in use | port conflict; another process is using the runner port |
| Connection refused to postgres/redis | infrastructure not yet healthy when runner started |
| OOM / process killed | container ran out of memory |
| ModuleNotFoundError | Python package not installed; dependency installer failed silently |
If get_service_logs returns unavailable: true, the /station-logs volume is not mounted. This is a deployment configuration issue; see the deployment guide for the required docker-compose volume entries.
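The log patterns in the table above can be matched against fetched log lines with a small classifier. This is a sketch only: the regular expressions are guesses derived from the table, the exact wording of real log lines may differ, and the names LOG_PATTERNS and classify_log are hypothetical.

```python
import re

# (pattern, likely cause) pairs derived from the table above.
# Assumption: real log wording is close enough for these expressions to match.
LOG_PATTERNS = [
    (re.compile(r"pipPackages declared.*AllowOnlinePip is false"),
     "AllowOnlinePip station config key is missing or false"),
    (re.compile(r"pip install.*error|No matching distribution"),
     "transitive pip dependency failure"),
    (re.compile(r"Address already in use"),
     "port conflict on the runner port"),
    # Both phrases must appear somewhere on the same line, in either order.
    (re.compile(r"(?=.*Connection refused)(?=.*(?:postgres|redis))", re.IGNORECASE),
     "infrastructure not yet healthy when runner started"),
    (re.compile(r"\bOOM\b|process killed", re.IGNORECASE),
     "container ran out of memory"),
    (re.compile(r"ModuleNotFoundError"),
     "Python package not installed"),
]


def classify_log(lines):
    """Return the distinct likely causes found in the lines, in first-seen order."""
    causes = []
    for line in lines:
        for pattern, cause in LOG_PATTERNS:
            if pattern.search(line) and cause not in causes:
                causes.append(cause)
    return causes
```

Lines that match nothing are ignored, so an empty result means the tail contained none of the known patterns, not that the service is healthy.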
Step 4 — Did any container crash or restart?
get_system_events minutes=60
Look for die, oom, or rapid start→die cycles on any container. A
container that keeps restarting will appear in get_system_version as
intermittently healthy/unhealthy.
Pair with get_service_logs to read what the process logged just before the
crash.
If get_system_events returns unavailable: true, the Docker socket /var/run/docker.sock is not mounted in the API container. This is a deployment configuration issue.
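Spotting die, oom, and repeated start→die cycles in the Step 4 output could look like the sketch below. The event shape (a chronological list of dicts with container and action keys) is an assumption, as are the function name find_unstable_containers and the min_deaths threshold.

```python
from collections import Counter


def find_unstable_containers(events, min_deaths=2):
    """Summarise crash signals per container from parsed get_system_events output.

    Assumption: each event is a dict like
    {"container": "python-runner", "action": "die"}.
    """
    deaths = Counter()
    oom = set()
    for event in events:
        name = event.get("container", "<unknown>")
        action = event.get("action")
        if action == "die":
            deaths[name] += 1
        elif action == "oom":
            oom.add(name)
    report = {}
    for name in sorted(set(deaths) | oom):
        if name in oom:
            report[name] = "out of memory"
        elif deaths[name] >= min_deaths:
            # Two or more deaths in the window suggests a restart loop.
            report[name] = "restart loop"
        else:
            report[name] = "died once"
    return report
```

Containers that appear in the report are the ones whose tail logs are worth reading next with get_service_logs.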
Step 5 — Is a specific test execution failing?
get_execution_logs executionId="<id>"
or for structured output with step names:
get_execution_logs executionId="<id>"
get_step_results executionId="<id>" page=1 pageSize=100
- Look for the first FAIL step; that is where execution stopped or deviated.
- Log lines beginning with pip dependencies … could not be installed → pip issue; cross-reference with get_service_logs python-runner.
- Log lines containing TypeError, AttributeError, ImportError → Python code error in the test package itself.
- Connection refused / timeout in step logs → hardware or instrument not reachable from the runner container.
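The Step 5 checks above can be sketched as two small helpers. The step record shape (dicts with name and status keys) is an assumption about what get_step_results returns, and first_failure and classify_step_log are hypothetical names.

```python
def first_failure(steps):
    """Return the first step whose status is FAIL, or None.

    Assumption: steps is an ordered list of dicts like
    {"name": "measure", "status": "FAIL"}.
    """
    for step in steps:
        if step.get("status") == "FAIL":
            return step
    return None


def classify_step_log(text):
    """Map a step log line to one of the Step 5 categories."""
    if "pip dependencies" in text and "could not be installed" in text:
        return "pip issue"
    if any(err in text for err in ("TypeError", "AttributeError", "ImportError")):
        return "python code error in the test package"
    lowered = text.lower()
    if "connection refused" in lowered or "timeout" in lowered:
        return "hardware or instrument not reachable"
    return "unclassified"
```

Only the first FAIL is returned because later failures are often consequences of the first one.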
Use search_execution_logs for keyword search across many executions:
search_execution_logs query="pip error"
search_execution_logs query="ImportError"
search_execution_logs query="timeout"
Step 6 — Is station configuration correct?
get_merged_config stationId="<station-id>"
Verify that required config keys are present and have the right values. Common keys to check:
| Key | Expected value | Effect if wrong |
|---|---|---|
| AllowOnlinePip | true (dev) / absent (prod) | pip falls back to PyPI only when true |
| AccordionIpAddress | valid IP of the Accordion hub | hardware steps time out |
| COM_PORT | correct serial port | serial instrument steps fail |
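A check of the three keys in the table above might look like the following sketch. It assumes the get_merged_config result is a flat dict of key/value pairs and that values may arrive as booleans or strings; the function name check_station_config and the production flag are hypothetical. Stations without an Accordion hub or serial instruments would need the corresponding checks relaxed.

```python
import ipaddress


def check_station_config(config, production=False):
    """Return a list of problems found in a parsed get_merged_config payload.

    Assumption: config is a flat dict, e.g.
    {"AllowOnlinePip": True, "AccordionIpAddress": "192.168.1.20", "COM_PORT": "COM3"}.
    """
    problems = []
    allow = config.get("AllowOnlinePip")
    if production:
        if allow is not None:
            problems.append("AllowOnlinePip should be absent in production")
    elif str(allow).lower() != "true":
        problems.append("AllowOnlinePip is not true; pip will not fall back to PyPI")
    try:
        # Accepts IPv4 and IPv6 literals; raises ValueError for anything else.
        ipaddress.ip_address(str(config.get("AccordionIpAddress", "")))
    except ValueError:
        problems.append("AccordionIpAddress is not a valid IP; hardware steps will time out")
    if not config.get("COM_PORT"):
        problems.append("COM_PORT is missing; serial instrument steps will fail")
    return problems
```

The sketch can only confirm that COM_PORT is present; whether it names the correct serial port still has to be verified against the station hardware.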
Symptom Quick Reference
| User says… | Start with… |
|---|---|
| "runner is unavailable" / "no runners registered" | get_system_version → get_service_logs → get_system_events |
| "test fails immediately on step 1" | get_execution_logs → get_service_logs python-runner |
| "pip package won't install" | get_service_logs python-runner → get_merged_config (check AllowOnlinePip) |
| "station shows offline in dashboard" | get_system_health → get_system_events |
| "test was passing yesterday, failing today" | get_system_events (crash/restart?) → get_system_version (update?) → get_execution_logs |
| "UI is not updating / stuck" | get_system_health (Redis?) → get_service_logs api |
| "update seems stuck" | get_update_status → get_system_events |
| "wrong package version running" | list_packages → get_system_version |
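The quick-reference table above can also be encoded as a lookup so that an assistant picks a starting sequence consistently. This is an illustrative sketch: the keyword matching is deliberately naive, the names PLAYBOOK and tools_for are hypothetical, and falling back to get_system_health for unknown symptoms is an assumption based on Step 1 of the decision tree.

```python
# Symptom keyword -> ordered tool sequence, copied from the table above.
PLAYBOOK = {
    "runner is unavailable": ["get_system_version", "get_service_logs", "get_system_events"],
    "no runners registered": ["get_system_version", "get_service_logs", "get_system_events"],
    "fails immediately on step 1": ["get_execution_logs", "get_service_logs python-runner"],
    "pip package won't install": ["get_service_logs python-runner", "get_merged_config"],
    "shows offline": ["get_system_health", "get_system_events"],
    "passing yesterday": ["get_system_events", "get_system_version", "get_execution_logs"],
    "ui is not updating": ["get_system_health", "get_service_logs api"],
    "update seems stuck": ["get_update_status", "get_system_events"],
    "wrong package version": ["list_packages", "get_system_version"],
}


def tools_for(symptom):
    """Return the tool sequence for the first keyword found in the symptom text."""
    text = symptom.lower()
    for keyword, tools in PLAYBOOK.items():
        if keyword in text:
            return tools
    # Unknown symptom: start at Step 1 of the decision tree.
    return ["get_system_health"]
```

For example, tools_for("The station shows offline in the dashboard") yields the get_system_health then get_system_events sequence from the table.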
Filing a Bug Report
Once you have diagnosed the problem using the steps above, file it directly from the AI session — no separate issue tracker needed:
maestro_bug_report
title = "<concise one-line summary>"
reportedBy = "<your name or station ID>"
description = "<paste the relevant tool output here>"
severity = "high" | "medium" | "low" | "critical"
Collect this context before calling maestro_bug_report so the description is complete:
- get_station_info: station ID and versions
- get_system_health: infrastructure state
- get_system_events minutes=120: recent container events
- get_service_logs service="api" tail=100
- get_service_logs service="python-runner" tail=100 (if Python-related)
- get_service_logs service="dotnet-runner" tail=100 (if .NET-related)
- get_execution_logs executionId="<failing run>" (if a specific test failed)
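Assembling the collected context into a single description could be done as in the sketch below. The parameter names title, reportedBy, description, and severity come from the maestro_bug_report call shown earlier; the helper build_bug_report, the section layout of the description, and the assumption that tool outputs are JSON-serialisable are all illustrative.

```python
import json

SEVERITIES = {"critical", "high", "medium", "low"}


def build_bug_report(title, reported_by, severity, context):
    """Build the argument dict for maestro_bug_report from collected tool outputs.

    Assumption: context maps a tool call string to its JSON-serialisable output.
    """
    if severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {sorted(SEVERITIES)}")
    sections = []
    for tool_call, output in context.items():
        # One labelled section per tool so the reader can see where each excerpt came from.
        sections.append(f"[{tool_call}]\n{json.dumps(output, indent=2, sort_keys=True)}")
    return {
        "title": title,
        "reportedBy": reported_by,
        "description": "\n\n".join(sections),
        "severity": severity,
    }
```

Validating severity before building the report catches typos early instead of relying on the tool to reject the call.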
See tools-feedback.md for full parameter reference and severity guidance.