product: maestro
audience: test-developer, ai-assistant
authority: normative

Maestro for Engineers Who Already Know pytest or Robot Framework

You know how to write tests. You have coverage reports, CI pipelines, and fixture systems you actually understand. So when someone tells you to look at "yet another test framework", you are right to be skeptical.

This document is not for beginners. It is for engineers who have shipped real test suites with pytest or Robot Framework and want a straight answer to: what problem does Maestro actually solve that I don't already have a solution for?

The short answer: pytest and Robot Framework are excellent tools for testing software. Maestro is built for testing hardware on a factory floor — and that turns out to be a fundamentally different problem.

The Core Problem: Software Testing vs. Manufacturing Testing

When you test software, you own the entire environment. You mock external dependencies, parametrize inputs, and the "device under test" is a process you spin up and kill.

Manufacturing test is different in five ways that matter:

Concern	Software testing	Manufacturing testing
Device under test	A process you control	Physical hardware, one of thousands
Environment	Identical across CI agents	Different instruments, addresses, wiring per station
Identity	Doesn't matter	Serial number, batch, routing history must be traceable
Operators	Engineers	Factory workers who do not run commands
Results	Pass/fail for the next commit	Permanent records with numeric values for SPC/yield analysis

pytest and Robot Framework handle the software-testing column well. Maestro is designed for the manufacturing-testing column.

1. Station Configuration: The Problem You're Currently Solving Badly

If you have deployed pytest-based hardware tests across more than one station, you have almost certainly done one of these:

Hard-coded instrument addresses and then maintained separate branches per station
Used environment variables and added a provisioning script that is nobody's responsibility
Written a conftest.py that reads from a config file that lives outside version control

Maestro's approach: A PostgreSQL-backed two-tier key-value store, injected automatically into every test step as cfg.* variables.

# This YAML is identical on every station
parameters:
  address: "{{cfg.DMM_VISA}}"

Station ST-01 has DMM_VISA = TCPIP::192.168.1.50::INSTR. Station ST-02 has DMM_VISA = TCPIP::192.168.1.51::INSTR. The YAML never changes. The test code never changes. The per-station difference lives exactly where it belongs: in station configuration, managed through the UI, version-audited in the database.

Every test execution stores a JSONB snapshot of the merged config. Six months from now, "what settings were active on ST-02 when UNIT-4421 failed?" is a SQL query, not a forensic investigation.
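
A minimal sketch of how the per-station resolution could work. The source does not define the two tiers, so this assumes global defaults overridden by station-specific values; `merge_config`, `render`, `snapshot`, and the default values are illustrative names and data, not Maestro's API.

```python
import json
import re

# Assumption: tier 1 holds global defaults, tier 2 holds per-station overrides.
GLOBAL_CFG = {"DMM_VISA": "TCPIP::192.168.1.1::INSTR", "TIMEOUT_S": "5"}
STATION_CFG = {
    "ST-01": {"DMM_VISA": "TCPIP::192.168.1.50::INSTR"},
    "ST-02": {"DMM_VISA": "TCPIP::192.168.1.51::INSTR"},
}

# Matches {{cfg.KEY}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*cfg\.(\w+)\s*\}\}")


def merge_config(station_id: str) -> dict[str, str]:
    """Station values win over global defaults."""
    return {**GLOBAL_CFG, **STATION_CFG.get(station_id, {})}


def render(template: str, cfg: dict[str, str]) -> str:
    """Replace every {{cfg.KEY}} placeholder; an unknown key raises KeyError."""
    return PLACEHOLDER.sub(lambda m: cfg[m.group(1)], template)


def snapshot(cfg: dict[str, str]) -> str:
    """Serialise the merged config, as a stand-in for the stored JSONB snapshot."""
    return json.dumps(cfg, sort_keys=True)


address_st01 = render("{{cfg.DMM_VISA}}", merge_config("ST-01"))
address_st02 = render("{{cfg.DMM_VISA}}", merge_config("ST-02"))
```

The same template string resolves to a different address on each station, and the snapshot captures the merged result rather than either tier on its own.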

2. Traceability Is Not an Afterthought

In pytest, traceability is something you bolt on: write serial numbers to a log file, parse the log file, insert into a spreadsheet. Or you write a plugin. Or you use a paid test management tool that ingests JUnit XML.

In Maestro, traceability is structural. Before a test starts, the system knows:

Who started it (exec.operator_id)
Which station it is running on (exec.station_id, cfg.station_id)
Which device is being tested (exec.serial_number)
Which version of the test package is running (Git commit hash, stored on every execution record)

These are not environment variables you need to remember to set. They arrive as first-class parameters in every step, alongside your cfg.* variables. Your test code just uses them:

def program_eeprom(serial_number: str, product_id: str) -> None:
    # serial_number comes from exec.serial_number: always present, always correct
    ...  # EEPROM programming logic goes here

The question "which version of the test was running when UNIT-4421 failed?" is answered by looking up TestExecution.TestPackageCommitHash in the database and checking it against the Git repository. No tooling required.
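
The source does not describe how step parameters reach a function. One plausible mechanism is binding by parameter name, sketched below with the standard `inspect` module. The `bind_step_arguments` helper and the context values are hypothetical and only illustrate the idea that a step declares what it needs and the runner supplies it.

```python
import inspect
from typing import Any, Callable


def bind_step_arguments(func: Callable[..., Any], context: dict[str, Any]) -> dict[str, Any]:
    """Pick, by parameter name, the values a step function asks for."""
    wanted = inspect.signature(func).parameters
    missing = [
        name
        for name, param in wanted.items()
        if name not in context and param.default is inspect.Parameter.empty
    ]
    if missing:
        raise KeyError(f"no value available for: {', '.join(missing)}")
    return {name: context[name] for name in wanted if name in context}


def program_eeprom(serial_number: str, product_id: str) -> str:
    # Stand-in body: returns the string that would be written to the EEPROM.
    return f"{product_id}:{serial_number}"


# Hypothetical execution context; all values are made up for illustration.
context = {
    "serial_number": "UNIT-4421",
    "operator_id": "op-17",
    "station_id": "ST-02",
    "product_id": "PWR-100",
}
result = program_eeprom(**bind_step_arguments(program_eeprom, context))
```

The step function never reads an environment variable or a global; anything it does not name in its signature is simply not passed.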

3. Measurements Are Data, Not Strings

This is the one that stings the most when you realise it.

In pytest, a measurement looks like this:

voltage = measure_rail("3V3")
assert 3.135 <= voltage <= 3.465, f"RAIL_3V3 out of range: {voltage}"

That assert produces a pass or a fail and a string. The numeric value voltage does not exist anywhere after the test completes, unless you explicitly write it somewhere.

In Maestro:

measurement:
  name: "RAIL_3V3"
  value: "{{measured_voltage}}"
  low_limit: 3.135
  high_limit: 3.465
  unit: "V"

The engine stores name, value, low_limit, high_limit, unit, verdict, timestamp, station_id, and serial_number as a row in a relational measurements table — not in a log, not in a blob.
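
To make the row shape concrete, here is a small sketch of a measurement record with a derived verdict. The field names follow the list above; the class itself, the `"PASS"`/`"FAIL"` strings, and the inclusive limit check are assumptions, not Maestro's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class MeasurementRow:
    name: str
    value: float
    low_limit: float
    high_limit: float
    unit: str
    station_id: str
    serial_number: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @property
    def verdict(self) -> str:
        # Assumption: limits are inclusive, matching the pytest assert shown earlier.
        return "PASS" if self.low_limit <= self.value <= self.high_limit else "FAIL"


good = MeasurementRow("RAIL_3V3", 3.31, 3.135, 3.465, "V", "ST-02", "UNIT-4421")
bad = MeasurementRow("RAIL_3V3", 3.50, 3.135, 3.465, "V", "ST-02", "UNIT-4421")
```

The point of the structure is that the value and both limits survive alongside the verdict, so a later limit change can be re-evaluated against historical data.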

What this enables, natively, in SQL:

-- Did RAIL_3V3 drift on Thursday, 9 April 2026?
SELECT DATE_TRUNC('hour', timestamp) AS hour,
       AVG(value), MIN(value), MAX(value)
FROM measurements
WHERE measurement_name = 'RAIL_3V3'
  AND timestamp >= '2026-04-09' AND timestamp < '2026-04-10'
GROUP BY 1 ORDER BY 1;

-- 3-sigma limits for current production run
SELECT AVG(value) - 3 * STDDEV(value) AS lower_3sigma,
       AVG(value) + 3 * STDDEV(value) AS upper_3sigma
FROM measurements
WHERE measurement_name = 'RAIL_3V3' AND timestamp > NOW() - INTERVAL '7 days';

No post-processing. No log parsing. No Excel. The data is there because the engine was designed to put it there.
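
For checking such a query against a handful of exported values, the same 3-sigma arithmetic can be reproduced in a few lines of Python. This is a sketch with made-up readings. It uses the sample standard deviation, which is what PostgreSQL's `STDDEV` computes.

```python
import statistics


def three_sigma_limits(values: list[float]) -> tuple[float, float]:
    """Mirror AVG(value) -/+ 3 * STDDEV(value); needs at least two values."""
    mean = statistics.fmean(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    return mean - 3 * sigma, mean + 3 * sigma


readings = [3.30, 3.31, 3.29, 3.30, 3.32, 3.28]  # made-up sample data, in volts
lower, upper = three_sigma_limits(readings)
```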

4. Limits Do Not Live in Code

In your current setup, if a test limit changes — say the acceptable voltage range tightens from ±5% to ±3% — what does that change involve?

In pytest: edit the Python file, code review, merge, redeploy to all stations, re-run affected tests. The change is buried in an assert statement somewhere.

In Maestro: edit the low_limit and high_limit fields in the YAML file, commit, deploy the package. The limits are in the test definition, version-controlled, auditable, and separate from the measurement logic. Your test code returns a raw value. The engine decides pass or fail.

This separation matters in regulated environments (ISO 9001, IATF 16949) where limit changes need an audit trail. The YAML commit is the audit trail: the diff shows exactly which limits changed, who changed them, and when, with no measurement code touched.
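
As a worked example of the tightening mentioned above, the limits for a 3.3 V rail follow directly from the nominal value and the tolerance. The helper below is illustrative only; rounding to millivolts is an assumption made to keep the YAML values tidy.

```python
def limits_from_tolerance(nominal: float, tolerance_pct: float) -> tuple[float, float]:
    """Return (low_limit, high_limit) for nominal +/- tolerance_pct percent."""
    delta = nominal * tolerance_pct / 100.0
    return round(nominal - delta, 3), round(nominal + delta, 3)


old_low, old_high = limits_from_tolerance(3.3, 5.0)  # the +/-5% window
new_low, new_high = limits_from_tolerance(3.3, 3.0)  # the tightened +/-3% window
```

With these inputs the ±5% window is 3.135 V to 3.465 V, matching the limits used earlier, and the ±3% window is 3.201 V to 3.399 V. Only those two numbers change in the YAML.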

5. The Operator Is Not an Engineer

This is probably the biggest conceptual shift.

When you run a pytest suite, the person running it is a developer who is comfortable with a terminal. On a factory floor, the person running the test is an operator whose job is to plug in boards as fast as possible. They do not run commands. They do not read stack traces. They press buttons.

Maestro's execution model is built around this:

Real-time Blazor UI streams each step's status, measurement values with limits, and log output to a browser page as the test runs — no polling, no refresh
Prompt steps pause the sequence and display an instruction to the operator (with an image if needed), with configurable buttons and a timeout
Required tags keep the Start button disabled until the operator has confirmed the serial number and product revision — so the test cannot start without traceability data
Abort button triggers a graceful stop that still guarantees teardown steps run (run_on_abort: true), so hardware is never left in a bad state
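
The abort guarantee in the last item can be sketched as a loop that, once an abort is seen, skips ordinary steps but still runs any step flagged `run_on_abort`. Only the flag name comes from the text above; the `Step` class, `run_sequence`, and the step names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    run_on_abort: bool = False


def run_sequence(steps: list[Step], abort_requested: Callable[[], bool]) -> list[str]:
    """Run steps in order; after an abort, run only steps flagged run_on_abort."""
    executed: list[str] = []
    aborted = False
    for step in steps:
        # Once aborted, the flag stays set and the callback is not polled again.
        aborted = aborted or abort_requested()
        if aborted and not step.run_on_abort:
            continue  # skipped because the operator aborted
        step.action()
        executed.append(step.name)
    return executed


def noop() -> None:
    """Placeholder for real instrument calls."""


abort_clicks = iter([False, True])  # operator presses Abort before the second step
steps = [
    Step("power_on", noop),
    Step("measure", noop),
    Step("calibrate", noop),
    Step("power_off", noop, run_on_abort=True),
]
executed = run_sequence(steps, lambda: next(abort_clicks))
```

Here `executed` ends up as `["power_on", "power_off"]`: the two middle steps are skipped, and the teardown still runs.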

The output of a test run is not a terminal window. It is a permanent, searchable report page with every step, measurement, and log entry, accessible by anyone with a browser at any time.

6. MES Integration Is a First-Class Interface

If your organisation uses a Manufacturing Execution System (SAP ME, Siemens Opcenter, or anything custom), you know the pain: the MES is the authority on what is allowed to be tested, in what order, by whom. Feeding results back to the MES for routing advancement is manual work or a fragile custom script.

Maestro has a pluggable IMesService interface with three lifecycle hooks:

Routing: When the operator scans a serial number, the MES returns which tests are allowed and what metadata to attach (work order, batch, product revision)
Validation: Immediately before a test starts, the MES confirms this unit is allowed to proceed
Reporting: When the test completes, the result is posted back to the MES for routing advancement

None of this is bolt-on. It is part of the execution model. The default implementation (AllowAllMesService) lets everything through — no MES, no change in behaviour. Drop in a real implementation and manufacturing routing works without touching any test code.

Results that could not be reported (MES down, network issue) are queued to a MesResultQueue table and retried automatically.
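
The shape of the three hooks and the queued retry can be sketched as follows. This is a Python rendering for illustration only. The method names, return shapes, and the in-memory queue are assumptions, and the in-memory queue stands in for the `MesResultQueue` table.

```python
from collections import deque
from typing import Protocol


class MesService(Protocol):
    """Illustrative rendering of the three lifecycle hooks."""

    def route(self, serial_number: str) -> dict: ...
    def validate(self, serial_number: str, test_name: str) -> bool: ...
    def report(self, serial_number: str, test_name: str, verdict: str) -> None: ...


class AllowAllMes:
    """Lets everything through, like the default described above."""

    def route(self, serial_number: str) -> dict:
        return {"allowed_tests": None, "metadata": {}}  # None = no restriction (assumption)

    def validate(self, serial_number: str, test_name: str) -> bool:
        return True

    def report(self, serial_number: str, test_name: str, verdict: str) -> None:
        return None


class ReportQueue:
    """Keeps undelivered results in order and retries them on flush()."""

    def __init__(self, mes: MesService) -> None:
        self._mes = mes
        self._pending: deque = deque()

    def submit(self, serial_number: str, test_name: str, verdict: str) -> int:
        self._pending.append((serial_number, test_name, verdict))
        return self.flush()

    def flush(self) -> int:
        """Deliver oldest first; stop at the first failure. Returns items still queued."""
        while self._pending:
            try:
                self._mes.report(*self._pending[0])
            except ConnectionError:
                break  # MES unreachable: keep the item for the next retry
            self._pending.popleft()
        return len(self._pending)


class FlakyMes(AllowAllMes):
    """Test double that refuses reports while `online` is False."""

    def __init__(self) -> None:
        self.online = False
        self.received: list = []

    def report(self, serial_number: str, test_name: str, verdict: str) -> None:
        if not self.online:
            raise ConnectionError("MES unreachable")
        self.received.append((serial_number, test_name, verdict))


mes = FlakyMes()
queue = ReportQueue(mes)
queued_while_down = queue.submit("UNIT-4421", "PowerTest", "FAIL")
mes.online = True
queued_after_retry = queue.flush()
```

Delivering strictly oldest-first keeps results in the order they were produced, which matters if the MES advances routing based on the most recent report.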

7. Multi-Language Orchestration Is Structural, Not Accidental

pytest is Python. Robot Framework has its own keyword syntax, but its libraries are almost always written in Python too. If you need to call a .NET instrument SDK that has no Python binding, you write a subprocess wrapper, a socket server, or you give up.

Maestro's execution model is language-neutral at the orchestration level. A single test sequence can call:

A .NET method in step 1 (to talk to an instrument via a Windows-only SDK)
A Python function in step 2 (to run analysis or talk to a Linux-based device)
Another .NET method in step 3

Variables produced by step 1 are available to step 2. This is not a workaround — it is the design. The runners communicate with the API via gRPC. The variable store is Redis. Cross-language variable passing is sub-millisecond and requires no user code.

8. Package Management at the Scale of a Factory

When you have 50 test packages deployed across 12 stations, the question "what version is running on ST-07 right now?" starts to matter. pytest does not have an answer. A Git repo per test and manual deployment is the typical approach.

Maestro has a Git-backed registry:

A central catalog repository lists all available test packages with their lifecycle status (NotReleased, Evaluation, Released, Obsolete)
Each package is its own Git repository
The Package Manager UI handles pull, install, and activation per station
Every test execution record stores the Git commit hash of the installed package

"What version of PowerTest was running when UNIT-4421 failed?" is answered by a database lookup, not a conversation with the engineer who last deployed.

9. No Central Execution Server: Each Station Is Autonomous

A CI/CD system like Jenkins has a central server. If it goes down, nothing runs. If the network is slow, jobs queue. This is acceptable for software testing.

On a factory floor, a network outage at 2am must not stop production. Maestro's architecture reflects this:

Each test station runs its own local API instance with its own Redis
The only shared infrastructure is PostgreSQL (for result storage)
A station can execute tests with the network fully down; only result reporting is delayed
Results queue locally and sync when connectivity is restored

Scale is horizontal: add a station, add an API instance. No central bottleneck.

10. The Dashboard Is Designed for a Factory Wall

pytest produces JUnit XML. To get a yield trend, you parse XML into a DataFrame, plot it with matplotlib, and hope someone looks at it.

Maestro's production dashboard is designed to be displayed on a monitor at the end of a production line, readable from five metres:

Station cards showing current test, recent verdicts (color-coded dots), yield percentage, and idle detection
Yield timeline with a configurable target line — turns red when below target
Top failing steps (Pareto chart) showing which step is causing the most failures right now
Live data: currently running tests are detected by querying PostgreSQL for executions where EndTime IS NULL

URL-driven filters let supervisors bookmark views for specific product lines or shifts: /dashboard?stations=ST-01,ST-02&tests=PowerTest&hours=8.

No D3.js charts to maintain. No Jenkins plugin to configure. The dashboard exists because the data is already in the database.
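
The filter URL and the two headline numbers are simple enough to sketch with the standard library. The parameter names come from the example URL above. The helper names, the 24-hour default, and the `"PASS"` verdict string are assumptions.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def _csv(query: dict, key: str) -> list[str]:
    """Split a comma-separated query value, dropping empty entries."""
    return [item for item in query.get(key, [""])[0].split(",") if item]


def parse_dashboard_filters(url: str) -> dict:
    query = parse_qs(urlsplit(url).query)
    return {
        "stations": _csv(query, "stations"),
        "tests": _csv(query, "tests"),
        "hours": int(query.get("hours", ["24"])[0]),  # default is an assumption
    }


def yield_percent(verdicts: list[str]) -> float:
    """Share of passing executions, in percent; 0.0 when there is no data."""
    if not verdicts:
        return 0.0
    return 100.0 * sum(v == "PASS" for v in verdicts) / len(verdicts)


def top_failing_steps(failed_steps: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Pareto ordering: most frequent failing step first."""
    return Counter(failed_steps).most_common(n)


filters = parse_dashboard_filters("/dashboard?stations=ST-01,ST-02&tests=PowerTest&hours=8")
```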

What Maestro Does Not Replace

To be precise about scope:

Unit and integration testing of your test code itself: pytest is still the right tool. Maestro has no opinion on how you test the Python or C# functions that drive your instruments. Use pytest for that.
CI/CD pipelines: Maestro does not replace your build server. It has a REST API (POST /api/testexecution/run) that lets your CI pipeline trigger tests on real hardware when available, but it does not replace Jenkins or GitHub Actions.
Simulator-only development: If you are developing test logic without access to hardware, type: mock steps let you stub measurements and iterate on YAML structure — but this is a development aid, not a simulation environment.
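
For the CI trigger mentioned above, a pipeline job could build the request like this. Only the endpoint path comes from this document. The host name, port, and JSON field names are placeholders, so check them against the actual API before use.

```python
import json
import urllib.request


def build_run_request(base_url: str, test_name: str, serial_number: str) -> urllib.request.Request:
    # Assumption: these body field names are illustrative, not the documented schema.
    body = json.dumps({"testName": test_name, "serialNumber": serial_number}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/testexecution/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical station address; nothing is sent until urlopen() is called.
request = build_run_request("http://st-01.example:5000", "PowerTest", "UNIT-4421")
# urllib.request.urlopen(request) would submit it; omitted so the sketch has no side effects.
```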

Summary

Problem	Your Current Solution	Maestro
Different instrument addresses per station	Environment variables, branches, config files	Two-tier DB-backed `cfg.*` injection
Traceability (serial, operator, version)	Custom logging, external tools	Structural — always present on every execution
Numeric measurement storage	Log files, parse later	Relational rows, query immediately
Limit changes	Edit code, redeploy	Edit YAML, redeploy (no code change)
Operator-facing UI	Not your problem	First-class streaming Blazor UI
MES integration	Custom scripts, fragile	Pluggable interface with queued retry
Multi-language test orchestration	Subprocess wrappers, socket servers	gRPC runners, variable flow between languages
Package versioning at scale	"Ask the last person who deployed"	Git commit hash on every execution record
Station autonomy	Depends on CI server	Local API per station, PostgreSQL shared
Production dashboard	Parse XML, plot in Python	Built-in, SQL-backed, designed for factory walls

If you are writing tests that run on a developer laptop to verify a Python library, use pytest. If you are writing tests that run on a factory floor to verify a physical board, Maestro is solving the problems you currently work around.