product: maestro
audience: test-developer, ai-assistant
authority: normative
Maestro for Engineers Who Already Know pytest or Robot Framework
You know how to write tests. You have coverage reports, CI pipelines, and fixture systems you actually understand. So when someone tells you to look at "yet another test framework", you are right to be skeptical.
This document is not for beginners. It is for engineers who have shipped real test suites with pytest or Robot Framework and want a straight answer to: what problem does Maestro actually solve that I don't already have a solution for?
The short answer: pytest and Robot Framework are excellent tools for testing software. Maestro is built for testing hardware on a factory floor — and that turns out to be a fundamentally different problem.
The Core Problem: Software Testing vs. Manufacturing Testing
When you test software, you own the entire environment. You mock external dependencies, parametrize inputs, and the "device under test" is a process you spin up and kill.
Manufacturing test is different in five ways that matter:
| Concern | Software testing | Manufacturing testing |
|---|---|---|
| Device under test | A process you control | Physical hardware, one of thousands |
| Environment | Identical across CI agents | Different instruments, addresses, wiring per station |
| Identity | Doesn't matter | Serial number, batch, routing history must be traceable |
| Operators | Engineers | Factory workers who do not run commands |
| Results | Pass/fail for the next commit | Permanent records with numeric values for SPC/yield analysis |
pytest and Robot Framework solve the first column well. Maestro is designed for the second.
1. Station Configuration: The Problem You're Currently Solving Badly
If you have deployed pytest-based hardware tests across more than one station, you have almost certainly done one of these:
- Hard-coded instrument addresses and then maintained separate branches per station
- Used environment variables and added a provisioning script that is nobody's responsibility
- Written a conftest.py that reads from a config file that lives outside version control
Maestro's approach: A PostgreSQL-backed two-tier key-value store, injected automatically into every test step as cfg.* variables.
# This YAML is identical on every station
parameters:
  address: "{{cfg.DMM_VISA}}"
Station ST-01 has DMM_VISA = TCPIP::192.168.1.50::INSTR. Station ST-02 has DMM_VISA = TCPIP::192.168.1.51::INSTR. The YAML never changes. The test code never changes. The per-station difference lives exactly where it belongs: in station configuration, managed through the UI, version-audited in the database.
Every test execution stores a JSONB snapshot of the merged config. Six months from now, "what settings were active on ST-02 when UNIT-4421 failed?" is a SQL query, not a forensic investigation.
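The merge-and-substitute behaviour described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Maestro's implementation: the tier names (global and station), the DMM_TIMEOUT_MS key, and the helpers merged_config, render, and config_snapshot are hypothetical. Only DMM_VISA, the two addresses, and the {{cfg.*}} placeholder syntax come from the text.

```python
import json
import re

# Hypothetical tier contents. The tier names "global" and "station" are assumptions.
GLOBAL_CFG = {"DMM_TIMEOUT_MS": "2000"}
STATION_CFG = {
    "ST-01": {"DMM_VISA": "TCPIP::192.168.1.50::INSTR"},
    "ST-02": {"DMM_VISA": "TCPIP::192.168.1.51::INSTR", "DMM_TIMEOUT_MS": "5000"},
}

_PLACEHOLDER = re.compile(r"\{\{\s*cfg\.(\w+)\s*\}\}")


def merged_config(station_id: str) -> dict[str, str]:
    """Station-level keys override global keys of the same name."""
    return {**GLOBAL_CFG, **STATION_CFG.get(station_id, {})}


def render(template: str, cfg: dict[str, str]) -> str:
    """Replace each {{cfg.KEY}} placeholder; an unknown key raises KeyError."""
    return _PLACEHOLDER.sub(lambda m: cfg[m.group(1)], template)


def config_snapshot(cfg: dict[str, str]) -> str:
    """Serialise the merged config, as one might before storing it per execution."""
    return json.dumps(cfg, sort_keys=True)


for station in ("ST-01", "ST-02"):
    cfg = merged_config(station)
    print(station, render("{{cfg.DMM_VISA}}", cfg), config_snapshot(cfg))
```

The same template string resolves to a different address on each station, and the snapshot captures exactly what was merged at that moment.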
2. Traceability Is Not an Afterthought
In pytest, traceability is something you bolt on: write serial numbers to a log file, parse the log file, insert into a spreadsheet. Or you write a plugin. Or you use a paid test management tool that ingests JUnit XML.
In Maestro, traceability is structural. Before a test starts, the system knows:
- Who started it (exec.operator_id)
- Which station it is running on (exec.station_id, cfg.station_id)
- Which device is being tested (exec.serial_number)
- Which version of the test package is running (Git commit hash, stored on every execution record)
These are not environment variables you need to remember to set. They arrive as first-class parameters in every step, alongside your cfg.* variables. Your test code just uses them:
def program_eeprom(serial_number: str, product_id: str) -> None:
    # serial_number is injected from exec.serial_number: always present, always correct
    ...  # EEPROM programming logic goes here
The question "which version of the test was running when UNIT-4421 failed?" is answered by looking up TestExecution.TestPackageCommitHash in the database and checking it against the Git repository. No tooling required.
3. Measurements Are Data, Not Strings
This is the one that stings the most when you realise it.
In pytest, a measurement looks like this:
voltage = measure_rail("3V3")
assert 3.135 <= voltage <= 3.465, f"3V3 rail out of range: {voltage}"
That assert produces a pass or a fail and a string. The numeric value voltage does not exist anywhere after the test completes, unless you explicitly write it somewhere.
In Maestro:
measurement:
  name: "RAIL_3V3"
  value: "{{measured_voltage}}"
  low_limit: 3.135
  high_limit: 3.465
  unit: "V"
The engine stores name, value, low_limit, high_limit, unit, verdict, timestamp, station_id, and serial_number as a row in a relational measurements table — not in a log, not in a blob.
What this enables, natively, in SQL:
-- Did RAIL_3V3 drift on 2026-04-10?
SELECT DATE_TRUNC('hour', timestamp) AS hour,
       AVG(value), MIN(value), MAX(value)
FROM measurements
WHERE measurement_name = 'RAIL_3V3'
  AND timestamp >= '2026-04-10' AND timestamp < '2026-04-11'
GROUP BY 1 ORDER BY 1;

-- 3-sigma limits for current production run
SELECT AVG(value) - 3 * STDDEV(value) AS lower_3sigma,
       AVG(value) + 3 * STDDEV(value) AS upper_3sigma
FROM measurements
WHERE measurement_name = 'VCORE' AND timestamp > NOW() - INTERVAL '7 days';
No post-processing. No log parsing. No Excel. The data is there because the engine was designed to put it there.
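To see why row-per-measurement storage matters, here is a minimal, self-contained sketch that uses an in-memory SQLite database as a stand-in for PostgreSQL. The column list follows the fields named above, with measurement_name as used in the queries. The readings, timestamps, and serial numbers are made up for illustration. SQLite has no built-in STDDEV, so the 3-sigma band is computed in Python with the sample standard deviation, which is what PostgreSQL's STDDEV returns.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE measurements (
        measurement_name TEXT, value REAL, low_limit REAL, high_limit REAL,
        unit TEXT, verdict TEXT, timestamp TEXT, station_id TEXT, serial_number TEXT
    )
    """
)

# Made-up readings, purely for illustration.
readings = [3.29, 3.31, 3.30, 3.32, 3.28]
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("RAIL_3V3", v, 3.135, 3.465, "V", "PASS",
         f"2026-04-10T08:0{i}:00", "ST-01", f"UNIT-{4421 + i}")
        for i, v in enumerate(readings)
    ],
)

# Because each measurement is a row, selecting the numeric values is one query.
values = [
    row[0]
    for row in conn.execute(
        "SELECT value FROM measurements WHERE measurement_name = ?", ("RAIL_3V3",)
    )
]

mean = statistics.mean(values)
sigma = statistics.stdev(values)  # sample standard deviation
lower_3sigma, upper_3sigma = mean - 3 * sigma, mean + 3 * sigma
print(f"mean={mean:.4f} V, 3-sigma band=({lower_3sigma:.4f}, {upper_3sigma:.4f})")
```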
4. Limits Do Not Live in Code
In your current setup, if a test limit changes — say the acceptable voltage range tightens from ±5% to ±3% — what does that change involve?
In pytest: edit the Python file, code review, merge, redeploy to all stations, re-run affected tests. The change is buried in an assert statement somewhere.
In Maestro: edit the low_limit and high_limit fields in the YAML file, commit, deploy the package. The limits are in the test definition, version-controlled, auditable, and separate from the measurement logic. Your test code returns a raw value. The engine decides pass or fail.
This separation matters in regulated environments (ISO 9001, IATF 16949) where limit changes need an audit trail. The YAML commit is the audit trail. There is no question about whether the code change and the limit change happened atomically — they are the same file, the same commit.
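The split between "step returns a raw value" and "engine decides pass or fail" can be sketched as follows. This is a hedged illustration, not the engine's code: the Limits class, the evaluate function, the "PASS"/"FAIL" strings, and the choice of inclusive bounds are all assumptions (inclusive bounds mirror the pytest assert shown earlier).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    low: float
    high: float
    unit: str


def evaluate(value: float, limits: Limits) -> str:
    """Return "PASS" when low <= value <= high, otherwise "FAIL"."""
    return "PASS" if limits.low <= value <= limits.high else "FAIL"


# Limits as they would be read from the test definition; the step only returns a number.
nominal = 3.3
five_percent = Limits(low=3.135, high=3.465, unit="V")
three_percent = Limits(
    low=round(nominal * 0.97, 3), high=round(nominal * 1.03, 3), unit="V"
)

reading = 3.42  # hypothetical raw value returned by a step
print(evaluate(reading, five_percent), evaluate(reading, three_percent))
```

Tightening the range from ±5% to ±3% changes only the Limits values. The function that produced the reading is untouched, which is the point of keeping limits out of the measurement code.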
5. The Operator Is Not an Engineer
This is probably the biggest conceptual shift.
When you run a pytest suite, the person running it is a developer who is comfortable with a terminal. On a factory floor, the person running the test is an operator whose job is to plug in boards as fast as possible. They do not run commands. They do not read stack traces. They press buttons.
Maestro's execution model is built around this:
- Real-time Blazor UI streams each step's status, measurement values with limits, and log output to a browser page as the test runs — no polling, no refresh
- Prompt steps pause the sequence and display an instruction to the operator (with an image if needed), with configurable buttons and a timeout
- Required tags keep the Start button disabled until the operator has confirmed the serial number and product revision — so the test cannot start without traceability data
- Abort button triggers a graceful stop that still guarantees teardown steps run (run_on_abort: true), so hardware is never left in a bad state
The output of a test run is not a terminal window. It is a permanent, searchable report page with every step, measurement, and log entry, accessible by anyone with a browser at any time.
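The abort behaviour mentioned in the list above (teardown steps still run after the operator presses Abort) can be modelled with a small sequence runner. This is a sketch under assumptions: the Step class, run_sequence, and the step names are hypothetical, and only the run_on_abort flag comes from the text.

```python
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    run_on_abort: bool = False  # mirrors the run_on_abort flag mentioned above


def run_sequence(steps: list[Step], abort: threading.Event) -> list[str]:
    """Run steps in order; once abort is set, run only steps flagged run_on_abort."""
    executed = []
    for step in steps:
        if abort.is_set() and not step.run_on_abort:
            continue  # skipped because the operator aborted
        step.action()
        executed.append(step.name)
    return executed


abort = threading.Event()
steps = [
    Step("power_on", lambda: None),
    Step("measure_rail", abort.set),  # simulate the operator pressing Abort here
    Step("program_eeprom", lambda: None),  # skipped after the abort
    Step("power_off", lambda: None, run_on_abort=True),  # teardown still runs
]
print(run_sequence(steps, abort))
```

The step in which the abort arrives is allowed to finish, later ordinary steps are skipped, and the flagged teardown step runs regardless.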
6. MES Integration Is a First-Class Interface
If your organisation uses a Manufacturing Execution System (SAP ME, Siemens Opcenter, or anything custom), you know the pain: the MES is the authority on what is allowed to be tested, in what order, by whom. Feeding results back to the MES for routing advancement is manual work or a fragile custom script.
Maestro has a pluggable IMesService interface with three lifecycle hooks:
- Routing: When the operator scans a serial number, the MES returns which tests are allowed and what metadata to attach (work order, batch, product revision)
- Validation: Immediately before a test starts, the MES confirms this unit is allowed to proceed
- Reporting: When the test completes, the result is posted back to the MES for routing advancement
None of this is bolt-on. It is part of the execution model. The default implementation (AllowAllMesService) lets everything through — no MES, no change in behaviour. Drop in a real implementation and manufacturing routing works without touching any test code.
Results that could not be reported (MES down, network issue) are queued to a MesResultQueue table and retried automatically.
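The three hooks and the queued retry can be sketched as a Python analogue. Everything here is illustrative: the method names (get_routing, validate, report), the RoutingInfo shape, the AllowAllMes and UnreachableMes classes, and the in-memory list standing in for the MesResultQueue table are assumptions, not the real IMesService signature.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class RoutingInfo:
    allowed_tests: list[str]
    metadata: dict[str, str] = field(default_factory=dict)


class MesService(Protocol):
    """Python analogue of the three hooks; method names are hypothetical."""

    def get_routing(self, serial_number: str) -> RoutingInfo: ...
    def validate(self, serial_number: str, test_name: str) -> bool: ...
    def report(self, serial_number: str, test_name: str, verdict: str) -> bool: ...


class AllowAllMes:
    """Lets everything through, in the spirit of the default implementation."""

    def __init__(self, known_tests: list[str]) -> None:
        self.known_tests = known_tests

    def get_routing(self, serial_number: str) -> RoutingInfo:
        return RoutingInfo(allowed_tests=list(self.known_tests))

    def validate(self, serial_number: str, test_name: str) -> bool:
        return True

    def report(self, serial_number: str, test_name: str, verdict: str) -> bool:
        return True  # nothing to post; treated as delivered


class UnreachableMes(AllowAllMes):
    """Simulates an MES that is down at reporting time."""

    def report(self, serial_number: str, test_name: str, verdict: str) -> bool:
        return False


def run_with_mes(mes: MesService, serial_number: str, test_name: str,
                 retry_queue: list[tuple[str, str, str]]) -> str:
    """Routing, validation, (test), reporting; queue the result if reporting fails."""
    if test_name not in mes.get_routing(serial_number).allowed_tests:
        return "BLOCKED"
    if not mes.validate(serial_number, test_name):
        return "BLOCKED"
    verdict = "PASS"  # placeholder for the real test execution
    if not mes.report(serial_number, test_name, verdict):
        retry_queue.append((serial_number, test_name, verdict))
    return verdict


queue: list[tuple[str, str, str]] = []
print(run_with_mes(AllowAllMes(["PowerTest"]), "UNIT-4421", "PowerTest", queue), queue)
print(run_with_mes(UnreachableMes(["PowerTest"]), "UNIT-4421", "PowerTest", queue), queue)
```

Swapping AllowAllMes for another implementation changes routing behaviour without any change to run_with_mes, which is the property the pluggable interface is meant to provide.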
7. Multi-Language Orchestration Is Structural, Not Accidental
pytest is Python. Robot Framework is Robot. If you need to call a .NET instrument SDK that has no Python binding, you write a subprocess wrapper, a socket server, or you give up.
Maestro's execution model is language-neutral at the orchestration level. A single test sequence can call:
- A .NET method in step 1 (to talk to an instrument via a Windows-only SDK)
- A Python function in step 2 (to run analysis or talk to a Linux-based device)
- Another .NET method in step 3
Variables produced by step 1 are available to step 2. This is not a workaround — it is the design. The runners communicate with the API via gRPC. The variable store is Redis. Cross-language variable passing is sub-millisecond and requires no user code.
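One way to picture cross-language variable flow is a shared key-value store holding values in a language-neutral encoding. The sketch below uses a Python dict as a stand-in for the store and JSON as the encoding. Both choices, along with the VariableStore class, the key layout, and the exec-001 identifier, are assumptions for illustration; the text does not specify how values are serialised.

```python
import json


class VariableStore:
    """In-memory stand-in for the shared store; the key layout is hypothetical."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def set(self, execution_id: str, name: str, value: object) -> None:
        # JSON keeps the value readable from any runner language.
        self._data[f"{execution_id}:{name}"] = json.dumps(value)

    def get(self, execution_id: str, name: str) -> object:
        return json.loads(self._data[f"{execution_id}:{name}"])


store = VariableStore()

# Step 1: imagine a .NET runner writing the instrument reading.
store.set("exec-001", "measured_voltage", 3.31)

# Step 2: a Python runner reads the same variable by name.
measured_voltage = store.get("exec-001", "measured_voltage")
print(measured_voltage)
```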
8. Package Management at the Scale of a Factory
When you have 50 test packages deployed across 12 stations, the question "what version is running on ST-07 right now?" starts to matter. pytest does not have an answer. A Git repo per test and manual deployment is the typical approach.
Maestro has a Git-backed registry:
- A central catalog repository lists all available test packages with their lifecycle status (NotReleased, Evaluation, Released, Obsolete)
- Each package is its own Git repository
- The Package Manager UI handles pull, install, and activation per station
- Every test execution record stores the Git commit hash of the installed package
"What version of PowerTest was running when UNIT-4421 failed?" is answered by a database lookup, not a conversation with the engineer who last deployed.
9. Zero Central Server: Each Station Is Autonomous
A CI/CD system like Jenkins has a central server. If it goes down, nothing runs. If the network is slow, jobs queue. This is acceptable for software testing.
On a factory floor, a network outage at 2am must not stop production. Maestro's architecture reflects this:
- Each test station runs its own local API instance with its own Redis
- The only shared infrastructure is PostgreSQL (for result storage)
- A station can execute tests with the network fully down; only result reporting is delayed
- Results queue locally and sync when connectivity is restored
Scale is horizontal: add a station, add an API instance. No central bottleneck.
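The "queue locally, sync when connectivity is restored" behaviour in the list above can be sketched with a small outbox. This is an assumption-laden illustration: ResultOutbox, send_to_database, and the link flag are hypothetical names, and the in-memory deque stands in for whatever durable local storage a real station would use.

```python
from collections import deque
from typing import Callable


class ResultOutbox:
    """Holds results locally until the shared database accepts them."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        self._send = send  # expected to raise ConnectionError while the network is down
        self._pending: deque = deque()

    def submit(self, result: dict) -> None:
        self._pending.append(result)
        self.flush()

    def flush(self) -> int:
        """Send queued results in order; stop at the first failure. Returns the count sent."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # keep the result queued and try again later
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)


link = {"up": False}  # simulated network state
stored: list[dict] = []


def send_to_database(result: dict) -> None:
    if not link["up"]:
        raise ConnectionError("database unreachable")
    stored.append(result)


outbox = ResultOutbox(send_to_database)
outbox.submit({"serial_number": "UNIT-4421", "verdict": "PASS"})  # queued: network down
print(len(outbox), len(stored))
link["up"] = True
outbox.flush()  # delivered once connectivity returns
print(len(outbox), len(stored))
```

Results are sent in the order they were produced, and a failed send leaves the queue intact, so an outage delays reporting without losing or reordering records.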
10. The Dashboard Is Designed for a Factory Wall
pytest produces JUnit XML. To get a yield trend, you parse XML into a DataFrame, plot it with matplotlib, and hope someone looks at it.
Maestro's production dashboard is designed to be displayed on a monitor at the end of a production line, readable from five metres:
- Station cards showing current test, recent verdicts (color-coded dots), yield percentage, and idle detection
- Yield timeline with a configurable target line — turns red when below target
- Top failing steps (Pareto chart) showing which step is causing the most failures right now
- Live data: currently running tests detected via EndTime IS NULL query against PostgreSQL
URL-driven filters let supervisors bookmark views for specific product lines or shifts: /dashboard?stations=ST-01,ST-02&tests=PowerTest&hours=8.
No D3.js charts to maintain. No Jenkins plugin to configure. The dashboard exists because the data is already in the database.
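As a rough sketch of how URL-driven filters and a yield figure might be derived, the following parses the example URL with the standard library and computes a pass percentage. The DashboardFilter class, the 24-hour default, the "PASS" verdict string, and the yield formula are assumptions. Only the parameter names stations, tests, and hours come from the example URL.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlsplit


@dataclass(frozen=True)
class DashboardFilter:
    stations: tuple[str, ...]
    tests: tuple[str, ...]
    hours: int


def parse_dashboard_url(url: str, default_hours: int = 24) -> DashboardFilter:
    """Split comma-separated filter values; a missing filter means "no restriction"."""
    query = parse_qs(urlsplit(url).query)

    def csv(key: str) -> tuple[str, ...]:
        raw = query.get(key, [""])[0]
        return tuple(part for part in raw.split(",") if part)

    hours = int(query.get("hours", [str(default_hours)])[0])
    return DashboardFilter(stations=csv("stations"), tests=csv("tests"), hours=hours)


def yield_percent(verdicts: list[str]) -> float:
    """Share of PASS verdicts, in percent; 0.0 for an empty window."""
    return 100.0 * verdicts.count("PASS") / len(verdicts) if verdicts else 0.0


view = parse_dashboard_url("/dashboard?stations=ST-01,ST-02&tests=PowerTest&hours=8")
print(view)
print(yield_percent(["PASS", "PASS", "FAIL", "PASS"]))
```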
What Maestro Does Not Replace
To be precise about scope:
- Unit and integration testing of your test code itself: pytest is still the right tool. Maestro has no opinion on how you test the Python or C# functions that drive your instruments. Use pytest for that.
- CI/CD pipelines: Maestro does not replace your build server. It has a REST API (POST /api/testexecution/run) that lets your CI pipeline trigger tests on real hardware when available, but it does not replace Jenkins or GitHub Actions.
- Simulator-only development: If you are developing test logic without access to hardware, type: mock steps let you stub measurements and iterate on YAML structure, but this is a development aid, not a simulation environment.
Summary
| Problem | Your Current Solution | Maestro |
|---|---|---|
| Different instrument addresses per station | Environment variables, branches, config files | Two-tier DB-backed cfg.* injection |
| Traceability (serial, operator, version) | Custom logging, external tools | Structural — always present on every execution |
| Numeric measurement storage | Log files, parse later | Relational rows, query immediately |
| Limit changes | Edit code, redeploy | Edit YAML, redeploy (no code change) |
| Operator-facing UI | Not your problem | First-class streaming Blazor UI |
| MES integration | Custom scripts, fragile | Pluggable interface with queued retry |
| Multi-language test orchestration | Subprocess wrappers, socket servers | gRPC runners, variable flow between languages |
| Package versioning at scale | "Ask the last person who deployed" | Git commit hash on every execution record |
| Station autonomy | Depends on CI server | Local API per station, PostgreSQL shared |
| Production dashboard | Parse XML, plot in Python | Built-in, SQL-backed, designed for factory walls |
If you are writing tests that run on a developer laptop to verify a Python library, use pytest. If you are writing tests that run on a factory floor to verify a physical board, Maestro is solving the problems you currently work around.