Two implementations that have to agree to the byte
Maintaining a cryptographic provenance protocol with a Rust canonical core and a Python reference, where bit-identical conformance is not a nice-to-have. It is the spec.
Data leaks usually get detected long after the fact, and attribution is rarely possible. That is the gap Oversight Protocol exists to close. It seals a document to named recipients, logs a transparency attestation, and embeds multi-layer watermarks so a leaked copy can be traced back to the specific recipient who held it, without the document ever phoning home. I am the lead maintainer, and most of what I have learned maintaining it comes from a single hard constraint I imposed early.
There are two reference implementations: a canonical Rust workspace and a Python implementation. They have to produce bit-identical output, proven by cross-language conformance tests. A Python-sealed envelope must open in Rust, and a Rust-sealed one must open in Python. That requirement governs every decision, and it is the most disciplined I have ever had to be about an interface.
The rule is absolute: RustCrypto and vetted library primitives only, no custom constructions in the stack. The classical operations run on the standard library bindings; the post-quantum hooks use audited implementations of ML-KEM-768 and ML-DSA-65. I am not smart enough to invent safe cryptography, and almost nobody is. The few cryptographers who are do not do it in a side project's commit history. Every primitive in Oversight is one that people far more careful than me have already broken and rebuilt.
The protocol is hybrid post-quantum from day one. Classical X25519 and Ed25519 run side by side with the post-quantum KEM and signature schemes. If one family falls, the other still holds. Building that in from the start was cheaper than retrofitting it later, and the threat it guards against is patient by nature: an adversary can record sealed envelopes today and wait for a quantum computer able to break the classical half.
The hardest engineering in the project is not the cryptography. It is making two languages agree exactly. When the Rust core grew its half of the hybrid wrap for the data encryption key (DEK), the test was not "does it work?" The test was "does a Python-wrapped envelope open in Rust, and vice versa?" That means the same key derivation inputs in the same order, the same HKDF salt and info, the same AEAD additional data, and the same hex-encoded JSON shape. Any difference, even in field ordering or a canonicalization edge case, and the two implementations silently diverge into incompatible dialects of the same protocol.
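A minimal sketch of why input order matters in a hybrid wrap, in Python with only the standard library. HKDF is written out from RFC 5869 here solely so the example runs without dependencies; in a real stack it would come from a vetted library, per the rule above. The name `derive_wrap_key`, the salt and info strings, and the classical-then-post-quantum concatenation order are all assumptions for illustration, not Oversight's actual parameters.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    if length > 255 * 32:
        raise ValueError("requested output too long for HKDF-SHA256")
    # RFC 5869: an empty salt is treated as HashLen zero bytes.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Hypothetical labels; both implementations must feed identical bytes here.
SALT = b"example-salt-v1"
INFO = b"example/dek-wrap"


def derive_wrap_key(classical_ss: bytes, pq_ss: bytes) -> bytes:
    """Derive the key-wrapping key from both shared secrets, in a fixed order."""
    # Assumed order: classical (X25519) secret first, then the ML-KEM secret.
    return hkdf_sha256(classical_ss + pq_ss, SALT, INFO)


# Placeholder bytes standing in for real X25519 and ML-KEM-768 shared secrets.
x25519_ss = bytes(range(32))
mlkem_ss = bytes(range(32, 64))

key_a = derive_wrap_key(x25519_ss, mlkem_ss)
key_b = derive_wrap_key(mlkem_ss, x25519_ss)  # same secrets, swapped order
print(key_a == key_b)  # False: the two sides would derive different keys
```

Both derivations are valid on their own terms, which is what makes this class of bug quiet: nothing fails until an envelope crosses the language boundary and the AEAD tag refuses to verify.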
This is where most of the bugs hid. Not in the math, in the seams: a canonicalization that ordered keys differently, an inline policy check on one open path that the other path skipped. The conformance harness is what surfaces them, and I have come to treat a failing parity test as more valuable than a passing feature test. A feature test tells me my code works. A parity test tells me the spec is real.
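The key-ordering seam can be illustrated with a few lines of Python. This is a sketch under assumptions: `canonical_bytes` and the header field names are hypothetical, and the real wire format may make different choices. The point is only that every serialization decision is made explicitly in one place, so the other language has something exact to match.

```python
import hashlib
import json


def canonical_bytes(obj: dict) -> bytes:
    """Serialize to one byte form: sorted keys, no whitespace, UTF-8."""
    return json.dumps(
        obj,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")


# The same hypothetical envelope header, built in two different field orders.
header_a = {"version": 1, "recipient": "alice", "wrapped_dek": "00ff"}
header_b = {"wrapped_dek": "00ff", "recipient": "alice", "version": 1}

# Default serialization preserves insertion order, so the bytes differ.
naive_a = json.dumps(header_a).encode("utf-8")
naive_b = json.dumps(header_b).encode("utf-8")
print(naive_a == naive_b)  # False

# Canonical serialization agrees, so a digest or signature over it does too.
digest_a = hashlib.sha256(canonical_bytes(header_a)).hexdigest()
digest_b = hashlib.sha256(canonical_bytes(header_b)).hexdigest()
print(digest_a == digest_b)  # True
```

Sorting keys does not settle everything. Number formatting and Unicode handling are the usual remaining edge cases, and they are what a full scheme such as RFC 8785 (JSON Canonicalization Scheme) specifies; this sketch does not implement it.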
One decision I will not move on: the watermarking is passive. No beacons, no callbacks, no remote access. A sealed file is inert. It does not reach out, it does not check in, it does not open a channel anyone could abuse. The leak is detected by inspecting a recovered copy against the transparency log, not by the file calling home.
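To make "passive" concrete, here is a deliberately simplified Python sketch of attribution as an offline lookup. Everything in it is an assumption for illustration: the HMAC-derived mark, the names `recipient_mark` and `attribute`, and the shape of the log records are hypothetical and are not Oversight's actual watermarking or log format. What it shows is only the direction of the data flow: the investigator reads a recovered copy and compares it with records that already exist, and the file itself never runs or connects to anything.

```python
import hashlib
import hmac
from typing import Optional


def recipient_mark(mark_key: bytes, document_id: str, recipient_id: str) -> str:
    """Derive a per-recipient mark (illustrative, not the real watermark)."""
    msg = document_id.encode("utf-8") + b"\x00" + recipient_id.encode("utf-8")
    return hmac.new(mark_key, msg, hashlib.sha256).hexdigest()


def attribute(recovered_mark: str, log_entries: list) -> Optional[str]:
    """Match a mark read from a leaked copy against existing log records."""
    for entry in log_entries:
        if hmac.compare_digest(entry["mark"], recovered_mark):
            return entry["recipient"]
    return None


# A local list standing in for records written at sealing time.
key = b"demo-key-not-for-real-use"
log = [
    {"recipient": name, "mark": recipient_mark(key, "doc-42", name)}
    for name in ("alice", "bob", "carol")
]

# Later: a copy surfaces, its mark is extracted, and the lookup runs offline.
leaked = recipient_mark(key, "doc-42", "bob")
print(attribute(leaked, log))  # bob
```

Note what is absent: no socket, no URL, no callback. The only inputs are the recovered copy and the log.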
That constraint costs me features people occasionally ask for. It is also the difference between a provenance tool and a surveillance tool, and I would rather ship the smaller honest thing. Transparency comes from Sigstore Rekor v2, with RFC 3161 qualified timestamps as a fallback proof path, so the attestation is publicly auditable without trusting me or any single server. The whole design is built so that nobody has to take my word for anything.