Live benchmark replay
Omar Benchmark Replay
Run live baseline vs. Omar/RCC benchmark checks from the public harness. Start with the free BBEH smoke lane, or use your own API key for deeper runs.
Recent public live checks
Live runs from the public server. Small-n runs are smoke signals, not board-depth proof.
Smoke result: small samples can swing widely from run to run.
Smoke / live path check
Smoke / reconstructed
Smoke / reconstructed
Directional / gate missing
Smoke / reconstructed
Smoke / reconstructed
Smoke / reconstructed
Smoke / reconstructed
Drift-watch / weak positive
Artifact-backed board context
Historical/core benchmark results used as context. Public live runs may differ because live model outputs, reconstructed lanes, and gate availability vary by benchmark.
Historical context only, not current public proof
Free BBEH smoke
Run the Free BBEH 20-Sample Live Smoke without an API key.
Small sample, live output, not board-depth proof.
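To see why a 20-sample run is a smoke signal rather than proof, here is a minimal sketch of the uncertainty around a small-sample accuracy. It uses a standard Wilson score interval. The function name `wilson_interval` and the counts passed to it are illustrative assumptions, not part of the public harness and not results from it.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a pass rate of successes/n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts, chosen only to show the effect of sample size.
small = wilson_interval(10, 20)        # a 20-sample smoke run
large = wilson_interval(1000, 2000)    # a much deeper run at the same pass rate

print(f"n=20:   {small[0]:.2f} to {small[1]:.2f}")
print(f"n=2000: {large[0]:.2f} to {large[1]:.2f}")
```

At the same observed pass rate, the 20-sample interval spans roughly 40 percentage points, while the 2000-sample interval spans only a few. A gap between baseline and Omar/RCC that falls inside the smaller run's interval cannot be told apart from sampling noise.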
BYOK live runs
Run current public benchmark lanes with your own API key.
Your key is used only for the selected run and is not stored.
Private Technical Pilot
For technical teams building agents, evals, and reasoning-heavy workflows. Omar/RCC is tested against your workflow, and the results are returned as a technical report with raw outputs, route logs, and reproducibility artifacts.