
product: maestro
audience: test-developer, operator, ai-assistant
authority: normative

Troubleshooting Maestro with MCP Diagnostic Tools

When a user reports unexpected Maestro behaviour — a runner that won't respond, a test that fails for no obvious reason, pip packages that won't install, or the UI showing a station as offline — always run the diagnostic tools below before guessing or asking the user to check logs manually.

These tools give you direct, machine-readable visibility into the running stack without requiring SSH access, Docker CLI access, or manual log inspection.


Diagnostic Decision Tree

Step 1 — Is the station reachable at all?

get_system_health
  • database.healthy: false → PostgreSQL is down or unreachable. The station cannot store results and tests will not start.
  • redis.healthy: false → Redis is down. SignalR and background jobs will fail.
  • Both healthy → infrastructure is fine; move to Step 2.
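
The Step 1 checks can be sketched as a small triage function. This is only an illustration: it assumes the get_system_health result has already been parsed into a dict exposing the database.healthy and redis.healthy fields named above. The function name triage_health and the exact response shape are assumptions, not part of the Maestro API.

```python
def triage_health(health: dict) -> list[str]:
    """Return findings for a parsed get_system_health result (assumed shape)."""
    findings = []
    # A missing section is treated as unhealthy so gaps are not silently ignored.
    if not health.get("database", {}).get("healthy", False):
        findings.append(
            "PostgreSQL is down or unreachable: results cannot be stored and tests will not start"
        )
    if not health.get("redis", {}).get("healthy", False):
        findings.append("Redis is down: SignalR and background jobs will fail")
    if not findings:
        findings.append("Infrastructure is healthy: continue with Step 2")
    return findings
```

Treating a missing field as unhealthy is a deliberate choice in this sketch: an incomplete response is more likely to indicate a problem than a healthy stack.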

Step 2 — What versions are running?

get_system_version
  • runners: [] (empty) → no runners have registered. This means the dotnet-runner and/or python-runner containers failed to start or cannot reach the API. Proceed immediately to Step 3.
  • Runner listed as healthy: false → runner is registered but not responding to health checks. Proceed to Step 3 for that runner.
  • Versions look unexpected → the stack may not have been updated. Run get_update_status and trigger_system_update if needed.
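
The runner checks in Step 2 could be expressed roughly as follows. The runners list and the healthy flag come from the description above; the name field on each runner entry and the function name triage_runners are hypothetical and may not match the real get_system_version payload.

```python
def triage_runners(version_info: dict) -> str:
    """Summarise runner state from a parsed get_system_version result (assumed shape)."""
    runners = version_info.get("runners", [])
    if not runners:
        return "no runners registered: check runner container logs (Step 3)"
    # "name" is a hypothetical field used only to label the runner in the summary.
    unhealthy = [r.get("name", "<unnamed>") for r in runners if not r.get("healthy", False)]
    if unhealthy:
        return "unhealthy runners: " + ", ".join(unhealthy) + " (Step 3)"
    return "all runners healthy: compare versions against the expected release"
```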

Step 3 — What did the container log before failing?

get_service_logs  service="python-runner"   tail=50
get_service_logs  service="dotnet-runner"   tail=50
get_service_logs  service="api"             tail=50
get_service_logs  service="orchestra"       tail=50

Read the tail logs for each affected service. Common findings:

| Log pattern | Likely cause |
| --- | --- |
| pipPackages declared … AllowOnlinePip is false | pip dependencies will not install; AllowOnlinePip station config key is missing or false |
| pip install … error / No matching distribution | transitive pip dependency failure; check for incompatible vendored wheels |
| Address already in use | port conflict; another process is using the runner port |
| Connection refused to postgres/redis | infrastructure not yet healthy when runner started |
| OOM / process killed | container ran out of memory |
| ModuleNotFoundError | Python package not installed; dependency installer failed silently |

If get_service_logs returns unavailable: true, the /station-logs volume is not mounted. This is a deployment configuration issue — see the deployment guide for the required docker-compose volume entries.
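
When scanning many log lines, the pattern table above can be applied mechanically. The sketch below assumes the get_service_logs output has been split into plain text lines. The regular expressions are loose approximations of the listed patterns rather than exact log formats, and classify_log_lines is a hypothetical helper name.

```python
import re

# Patterns and causes follow the table above; the regexes are approximations.
LOG_PATTERNS = [
    (re.compile(r"pipPackages declared.*AllowOnlinePip is false"),
     "AllowOnlinePip missing or false"),
    (re.compile(r"No matching distribution|pip install.*error", re.IGNORECASE),
     "transitive pip dependency failure"),
    (re.compile(r"Address already in use"), "port conflict"),
    (re.compile(r"Connection refused"),
     "infrastructure not yet healthy when runner started"),
    (re.compile(r"\bOOM\b|process killed", re.IGNORECASE),
     "container ran out of memory"),
    (re.compile(r"ModuleNotFoundError"), "Python package not installed"),
]


def classify_log_lines(lines):
    """Return (line, cause) pairs for lines that match a known pattern."""
    hits = []
    for line in lines:
        for pattern, cause in LOG_PATTERNS:
            if pattern.search(line):
                hits.append((line, cause))
                break  # the first matching pattern wins
    return hits
```

Lines that match nothing are dropped, so an empty result means only that no known pattern appeared, not that the service is healthy.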


Step 4 — Did any container crash or restart?

get_system_events  minutes=60

Look for die, oom, or rapid start → die cycles on any container. A container that keeps restarting will appear in get_system_version as intermittently healthy/unhealthy.

Pair with get_service_logs to read what the process logged just before the crash.

If get_system_events returns unavailable: true, the Docker socket /var/run/docker.sock is not mounted in the API container. This is a deployment configuration issue.
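
Spotting the die, oom, and restart-loop cases described in this step can be automated along these lines. The sketch assumes each event is a dict with container and action keys. That shape, the restart_threshold parameter, and the function name are all assumptions made for illustration.

```python
from collections import Counter


def find_unstable_containers(events, restart_threshold=3):
    """Return {container: reason} for containers that look unstable.

    `events` is assumed to be a list of dicts with 'container' and 'action' keys.
    """
    dies = Counter()
    reasons = {}
    for event in events:
        name, action = event["container"], event["action"]
        if action == "oom":
            reasons[name] = "out of memory"
        elif action == "die":
            dies[name] += 1
    for name, count in dies.items():
        if name in reasons:
            continue  # an OOM finding is more specific than a plain die count
        reasons[name] = "restart loop" if count >= restart_threshold else "crashed"
    return reasons
```

The threshold of three die events is arbitrary; tune it to the time window passed to get_system_events.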


Step 5 — Is a specific test execution failing?

get_execution_logs  executionId="<id>"

or for structured output with step names:

get_execution_logs  executionId="<id>"
get_step_results    executionId="<id>"  page=1  pageSize=100
  • Look for the first FAIL step — that is where execution stopped or deviated.
  • Log lines beginning with pip dependencies … could not be installed → pip issue; cross-reference with get_service_logs python-runner.
  • Log lines containing TypeError, AttributeError, ImportError → Python code error in the test package itself.
  • Connection refused / timeout in step logs → hardware or instrument not reachable from the runner container.
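
The checks in the list above could be sketched as two small helpers: one that finds the first FAIL step and one that buckets a log line by the error families mentioned. The status and name fields on a step result are assumed, as are both function names. The real get_step_results payload may differ.

```python
PYTHON_ERRORS = ("TypeError", "AttributeError", "ImportError")


def first_failed_step(steps):
    """Return the first step whose (assumed) 'status' field is FAIL, or None."""
    for step in steps:
        if step.get("status") == "FAIL":
            return step
    return None


def categorize_step_log(line):
    """Map one execution log line to the rough category described above."""
    if "pip dependencies" in line and "could not be installed" in line:
        return "pip issue"
    if any(err in line for err in PYTHON_ERRORS):
        return "test package code error"
    if "Connection refused" in line or "timeout" in line.lower():
        return "instrument unreachable"
    return "unclassified"
```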

Use search_execution_logs for keyword search across many executions:

search_execution_logs  query="pip error"
search_execution_logs  query="ImportError"
search_execution_logs  query="timeout"

Step 6 — Is station configuration correct?

get_merged_config  stationId="<station-id>"

Verify that required config keys are present and have the right values. Common keys to check:

| Key | Expected value | Effect if wrong |
| --- | --- | --- |
| AllowOnlinePip | true (dev) / absent (prod) | pip falls back to PyPI only when true |
| AccordionIpAddress | valid IP of the Accordion hub | hardware steps time out |
| COM_PORT | correct serial port | serial instrument steps fail |
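
A basic sanity check over these three keys might look like the sketch below. It assumes get_merged_config returns a flat mapping of key to value and that the caller knows whether the station is a production station. Both are assumptions, and check_station_config is a hypothetical name. The check can only confirm that values are well formed, not that the IP address or serial port is the right one for the attached hardware.

```python
import ipaddress


def check_station_config(config: dict, production: bool) -> list[str]:
    """Return a list of problems found in a merged station config (assumed flat dict)."""
    problems = []
    allow_pip = config.get("AllowOnlinePip")
    if production and allow_pip is not None:
        problems.append("AllowOnlinePip should be absent in production")
    # Accept both a boolean True and the string "true".
    if not production and str(allow_pip).lower() != "true":
        problems.append("AllowOnlinePip should be true on a dev station")
    try:
        ipaddress.ip_address(str(config.get("AccordionIpAddress", "")))
    except ValueError:
        problems.append("AccordionIpAddress is missing or not a valid IP")
    if not config.get("COM_PORT"):
        problems.append("COM_PORT is not set")
    return problems
```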

Symptom Quick Reference

| User says… | Start with… |
| --- | --- |
| "runner is unavailable" / "no runners registered" | get_system_version → get_service_logs → get_system_events |
| "test fails immediately on step 1" | get_execution_logs → get_service_logs python-runner |
| "pip package won't install" | get_service_logs python-runner → get_merged_config (check AllowOnlinePip) |
| "station shows offline in dashboard" | get_system_health → get_system_events |
| "test was passing yesterday, failing today" | get_system_events (crash/restart?) → get_system_version (update?) → get_execution_logs |
| "UI is not updating / stuck" | get_system_health (Redis?) → get_service_logs api |
| "update seems stuck" | get_update_status → get_system_events |
| "wrong package version running" | list_packages → get_system_version |

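The quick reference above can be encoded as a lookup so that a free-text report maps to a starting tool sequence. The substring matching and the fallback to get_system_health (Step 1 of the decision tree) are illustrative design choices, not documented Maestro behaviour, and tools_for is a hypothetical name.

```python
# Ordered so that more specific phrases are tried before generic ones like "stuck".
SYMPTOM_PLAYBOOK = [
    (("runner is unavailable", "no runners registered"),
     ["get_system_version", "get_service_logs", "get_system_events"]),
    (("fails immediately",), ["get_execution_logs", "get_service_logs"]),
    (("pip",), ["get_service_logs", "get_merged_config"]),
    (("offline",), ["get_system_health", "get_system_events"]),
    (("passing yesterday",),
     ["get_system_events", "get_system_version", "get_execution_logs"]),
    (("update seems stuck",), ["get_update_status", "get_system_events"]),
    (("not updating", "stuck"), ["get_system_health", "get_service_logs"]),
    (("wrong package version",), ["list_packages", "get_system_version"]),
]


def tools_for(report: str) -> list[str]:
    """Return the suggested tool sequence for a user report."""
    text = report.lower()
    for phrases, tools in SYMPTOM_PLAYBOOK:
        if any(phrase in text for phrase in phrases):
            return tools
    # No known symptom: start at Step 1 of the decision tree.
    return ["get_system_health"]
```
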
Filing a Bug Report

Once you have diagnosed the problem using the steps above, file it directly from the AI session — no separate issue tracker needed:

maestro_bug_report
  title       = "<concise one-line summary>"
  reportedBy  = "<your name or station ID>"
  description = "<paste the relevant tool output here>"
  severity    = "high" | "medium" | "low" | "critical"

Collect this context before calling maestro_bug_report so the description is complete:

  1. get_station_info — station ID and versions
  2. get_system_health — infrastructure state
  3. get_system_events minutes=120 — recent container events
  4. get_service_logs service="api" tail=100
  5. get_service_logs service="python-runner" tail=100 (if Python-related)
  6. get_service_logs service="dotnet-runner" tail=100 (if .NET-related)
  7. get_execution_logs executionId="<failing run>" (if a specific test failed)
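
Assembling the collected context into a single report payload could look like the sketch below. The four field names follow the maestro_bug_report parameters shown above. The section layout inside description and the helper name build_bug_report are assumptions, and how the payload is actually submitted depends on the MCP client in use.

```python
ALLOWED_SEVERITIES = {"critical", "high", "medium", "low"}


def build_bug_report(title, reported_by, severity, context):
    """Build the arguments for maestro_bug_report.

    `context` maps a tool name to its raw output text, in collection order.
    """
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    # One labelled section per tool keeps the pasted output easy to scan.
    sections = [f"=== {tool} ===\n{output.strip()}" for tool, output in context.items()]
    return {
        "title": title.strip(),
        "reportedBy": reported_by,
        "description": "\n\n".join(sections),
        "severity": severity,
    }
```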

See tools-feedback.md for full parameter reference and severity guidance.
