Commit a3efc8d

Flesh out docs.rs documentation for the Rust crate
The crate-level page (lib.rs) is now a proper landing page: overview, threat model, architecture table pointing into each module, failure-mode table, quick start, and links to the wire-format spec, the companion preprint, and the Python and TypeScript ports.

Each module gets a real intro plus runnable doctest examples:

    attestation.rs  to_json/from_json round-trip
    hash.rs         NFC normalization for hash_text; canonical f32 endianness for canonical_vector_bytes; sha256:<hex> shape for hash_vector
    signer.rs       basic generate+pin; deterministic pin_with_options using PinOptions for explicit dtype + timestamp
    verifier.rs     full verify, signature-only, vector tamper caught as VerifyError::VectorTampered, key rotation

Doctests grew from 1 → 12, all green. Strict doc build is also green:

    RUSTDOCFLAGS="-D missing_docs \
      -D rustdoc::broken_intra_doc_links \
      -D rustdoc::missing_crate_level_docs"

Cargo.toml gains [package.metadata.docs.rs] with all-features = true so docs.rs always builds the full surface. lib.rs adds the matching warn lints so future doc rot is caught locally.

The crates.io README (rust/vectorpin/README.md) is rewritten to match the docs.rs landing-page tone: badges, what gets pinned, failure-mode table, threat model summary, and pointers into docs.rs and the spec.
1 parent 850e6e4 commit a3efc8d

7 files changed

Lines changed: 422 additions & 67 deletions

rust/vectorpin/Cargo.toml

Lines changed: 4 additions & 0 deletions
@@ -33,3 +33,7 @@ default = []
3333
[[bench]]
3434
name = "perf"
3535
harness = false
36+
37+
# docs.rs builds the documentation with these settings.
38+
[package.metadata.docs.rs]
39+
all-features = true

rust/vectorpin/README.md

Lines changed: 75 additions & 36 deletions
@@ -1,58 +1,97 @@
11
# vectorpin
22

3-
Rust port of [VectorPin](https://vectorpin.org) — verifiable integrity for AI embedding stores.
3+
[![Crates.io](https://img.shields.io/crates/v/vectorpin.svg)](https://crates.io/crates/vectorpin)
4+
[![Docs.rs](https://docs.rs/vectorpin/badge.svg)](https://docs.rs/vectorpin)
5+
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0)
46

5-
Part of the [ThirdKey](https://thirdkey.ai) Trust Stack.
7+
Verifiable integrity for AI embedding stores. Rust reference implementation of the [VectorPin](https://github.com/ThirdKeyAI/VectorPin) attestation protocol.
68

7-
## Why a Rust crate
9+
VectorPin pins each embedding to its source content and the producing model via an Ed25519 signature over a canonical byte representation. Any post-pinning modification of the vector or source text — including covert [steganographic exfiltration attacks](https://doi.org/10.5281/zenodo.20058256) that current vector databases ingest without complaint — breaks signature verification on read.
810

9-
Symbiont — the Rust-native policy-governed agent runtime in the Trust Stack — needs to verify VectorPin attestations in-process, without a Python sidecar. This crate is the canonical Rust implementation and is bit-for-bit compatible with the Python reference: the same protocol v1 wire format, the same canonical bytes, the same Ed25519 signatures.
11+
This crate is **byte-for-byte compatible** with the Python reference (`pip install vectorpin`) and the TypeScript reference (`npm install vectorpin`). A pin produced by any of the three implementations verifies on the other two; the contract is enforced by shared test vectors that every port consumes in CI.
1012

11-
Cross-language compatibility is enforced by the test vectors in `../../testvectors/`, which both the Python and Rust test suites consume.
13+
Part of the [ThirdKey](https://thirdkey.ai) Trust Stack, alongside [Symbiont](https://github.com/ThirdKeyAI/Symbiont), the Rust-native policy-governed agent runtime that consumes these attestations in-process without a Python sidecar.
1214

1315
## Quick start
1416

15-
Add to your `Cargo.toml`:
16-
1717
```toml
1818
[dependencies]
1919
vectorpin = "0.1"
2020
```
2121

2222
```rust
23-
use vectorpin::{Signer, Verifier, VerifyError};
24-
25-
fn main() -> anyhow::Result<()> {
26-
let signer = Signer::generate("prod-2026-05".to_string());
27-
28-
// Some embedding produced by a model
29-
let embedding: Vec<f32> = my_model_embed("The quick brown fox.");
30-
31-
let pin = signer.pin(
32-
"The quick brown fox.",
33-
"text-embedding-3-large",
34-
&embedding,
35-
)?;
36-
37-
// Store pin.to_json() alongside the embedding in your vector DB.
38-
39-
let mut verifier = Verifier::new();
40-
verifier.add_key(signer.key_id(), signer.public_key_bytes());
41-
42-
let result = verifier.verify_full(
43-
&pin,
44-
Some("The quick brown fox."),
45-
Some(&embedding),
46-
None,
47-
);
48-
assert!(result.is_ok());
49-
Ok(())
50-
}
23+
use vectorpin::{Signer, Verifier};
24+
25+
// Ingestion: produce an embedding, sign a pin for it.
26+
let signer = Signer::generate("prod-2026-05".to_string());
27+
let embedding: Vec<f32> = my_model_embed("The quick brown fox.");
28+
29+
let pin = signer.pin(
30+
"The quick brown fox.",
31+
"text-embedding-3-large",
32+
embedding.as_slice(),
33+
)?;
34+
35+
// Persist `pin.to_json()` alongside the embedding in your vector DB.
36+
let stored: String = pin.to_json();
37+
38+
// Read/audit: parse the stored JSON and verify against ground truth.
39+
let parsed = vectorpin::Pin::from_json(&stored)?;
40+
let mut verifier = Verifier::new();
41+
verifier.add_key(signer.key_id(), signer.public_key_bytes());
42+
43+
let result = verifier.verify_full(
44+
&parsed,
45+
Some("The quick brown fox."),
46+
Some(embedding.as_slice()),
47+
None,
48+
);
49+
assert!(result.is_ok());
5150
```
5251

52+
## What gets pinned
53+
54+
Each `Pin` commits to:
55+
56+
- **The source text**, by SHA-256 of UTF-8 NFC-normalized bytes.
57+
- **The model**, by identifier (and optionally by content hash).
58+
- **The vector itself**, by SHA-256 of canonical little-endian f32/f64 bytes.
59+
- **The producer**, by Ed25519 signing-key identifier (`kid`).
60+
- **The time**, by RFC 3339 timestamp.
61+
62+
Verification distinguishes failure modes via the `VerifyError` enum so callers can route them differently:
63+
64+
| Variant | Meaning |
65+
|---|---|
66+
| `SignatureInvalid` | Pin was forged or re-signed by an attacker |
67+
| `VectorTampered` | Embedding modified after pinning — the steganography kill shot |
68+
| `SourceMismatch` | Source text differs from what was pinned |
69+
| `ModelMismatch` | Pin produced by a different embedding model than expected |
70+
| `UnknownKey` | Pin signed by a key not in the verifier's registry |
71+
| `UnsupportedVersion` | Protocol version mismatch |
72+
| `ShapeMismatch` | Supplied vector's dim disagrees with the pin header |
73+
74+
## Threat model
75+
76+
VectorPin is designed against an attacker who can:
77+
78+
- Modify vectors after they are produced — via a poisoned ingestion pipeline, a compromised vector DB, or backup-level access.
79+
- See the public verification key but not the private signing key.
80+
- Replay or selectively delete pins.
81+
82+
It does **not** defend against:
83+
84+
- An attacker with the private signing key (key custody is the user's responsibility).
85+
- An attacker who modifies the source documents *before* embedding (use upstream content integrity controls).
86+
- An attacker who uses a legitimate signing key to attest a malicious vector at ingestion time (use upstream input validation).
87+
88+
For the empirical evaluation of the attack class VectorPin is built to defeat, see the companion preprint at <https://doi.org/10.5281/zenodo.20058256>.
89+
5390
## Status
5491

55-
Alpha. Protocol v1 stable; covered by the cross-language test vectors.
92+
Alpha (`v0.1`). Protocol v1 stable; covered by the cross-language test vectors. The wire format will not break compatibility without a major-version bump.
93+
94+
See the full [protocol specification](https://github.com/ThirdKeyAI/VectorPin/blob/main/docs/spec.md) and [docs.rs](https://docs.rs/vectorpin) for the complete API reference.
5695

5796
## License
5897
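
The README's failure-mode table says callers can route `VerifyError` variants differently. A minimal sketch of what such routing might look like follows. It assumes nothing about the real crate beyond the variant names listed in the table: the `VerifyError` enum here is a local stand-in (the real one may carry payloads or additional variants), and `Action` and `route` are hypothetical names introduced only for illustration.

```rust
/// Local stand-in that mirrors the variant names in the README table.
/// Assumption: the real `vectorpin::VerifyError` may have payloads or
/// further variants; this copy exists only so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VerifyError {
    SignatureInvalid,
    VectorTampered,
    SourceMismatch,
    ModelMismatch,
    UnknownKey,
    UnsupportedVersion,
    ShapeMismatch,
}

/// Hypothetical response policy; not part of the crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    /// Possible attack: isolate the record and alert.
    Quarantine,
    /// Record disagrees with the expected inputs: re-embed and re-pin.
    Reembed,
    /// Deployment problem: fix the key registry or upgrade the verifier.
    FixConfig,
}

/// Map each failure mode to a response. The match is exhaustive, so a
/// new variant added to the local enum forces this policy to be revisited.
pub fn route(err: VerifyError) -> Action {
    match err {
        VerifyError::SignatureInvalid | VerifyError::VectorTampered => Action::Quarantine,
        VerifyError::SourceMismatch
        | VerifyError::ModelMismatch
        | VerifyError::ShapeMismatch => Action::Reembed,
        VerifyError::UnknownKey | VerifyError::UnsupportedVersion => Action::FixConfig,
    }
}
```

Under this policy `route(VerifyError::VectorTampered)` yields `Action::Quarantine`, while `route(VerifyError::UnknownKey)` yields `Action::FixConfig`. The grouping itself is a judgment call and would differ between deployments.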

rust/vectorpin/src/attestation.rs

Lines changed: 61 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,13 +1,41 @@
11
// Copyright 2025 Jascha Wanger / Tarnover, LLC
22
// SPDX-License-Identifier: Apache-2.0
33

4-
//! Pin attestation format and canonicalization.
4+
//! Pin attestation data structures and canonical serialization.
55
//!
6-
//! See `docs/spec.md` in the repo root for the protocol specification.
7-
//! In summary: a [`Pin`] is a JSON object with a header (the signed
8-
//! payload) plus a key id and signature. The header canonicalizes to a
9-
//! deterministic byte sequence — sorted keys, no whitespace — that
10-
//! both Python and Rust implementations agree on byte-for-byte.
6+
//! A [`Pin`] is a JSON object with a header (the signed payload) plus a
7+
//! key id and a signature. The header canonicalizes to a deterministic
8+
//! byte sequence — sorted keys, no whitespace, raw UTF-8 (non-ASCII
9+
//! is *not* escaped to `\uXXXX`) — that the Python, Rust, and
10+
//! TypeScript reference implementations agree on byte-for-byte.
11+
//!
12+
//! That deterministic byte sequence is what gets signed by Ed25519, not
13+
//! the JSON wire form. Re-serializing a pin (different whitespace,
14+
//! different key order) therefore does *not* invalidate the signature
15+
//! as long as the canonical form is recoverable.
16+
//!
17+
//! For the full wire-format specification — every field, every supported
18+
//! dtype, the exact canonicalization algorithm — see
19+
//! [`docs/spec.md`](https://github.com/ThirdKeyAI/VectorPin/blob/main/docs/spec.md).
20+
//!
21+
//! # Example
22+
//!
23+
//! ```
24+
//! use vectorpin::{Pin, Signer};
25+
//!
26+
//! let signer = Signer::generate("demo".to_string());
27+
//! let v: Vec<f32> = vec![1.0, 2.0, 3.0];
28+
//! let pin = signer.pin("hello", "test-model", v.as_slice()).unwrap();
29+
//!
30+
//! // Compact JSON for storage in your vector DB metadata.
31+
//! let json: String = pin.to_json();
32+
//! assert!(!json.contains(": "));
33+
//! assert!(!json.contains(", "));
34+
//!
35+
//! // Round-trip through wire form preserves the pin exactly.
36+
//! let parsed = Pin::from_json(&json).unwrap();
37+
//! assert_eq!(pin, parsed);
38+
//! ```
1139
1240
use std::collections::BTreeMap;
1341

@@ -21,8 +49,14 @@ pub const PROTOCOL_VERSION: u32 = 1;
2149
/// The signed portion of a [`Pin`].
2250
///
2351
/// Two pins are considered equivalent iff their headers canonicalize to
24-
/// identical bytes. Optional fields (`model_hash`, `extra`) are omitted
25-
/// from the canonical form when unset, never written as `null`.
52+
/// identical bytes. Optional fields ([`model_hash`](Self::model_hash),
53+
/// [`extra`](Self::extra)) are omitted from the canonical form when
54+
/// unset, never written as `null` — this matters because adding a
55+
/// `null` would change the byte sequence the signature commits to.
56+
///
57+
/// You normally do not construct `PinHeader` directly; obtain one from
58+
/// [`Signer::pin`](crate::Signer::pin) or
59+
/// [`Signer::pin_with_options`](crate::signer::Signer::pin_with_options).
2660
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
2761
pub struct PinHeader {
2862
/// Protocol version. Must equal [`PROTOCOL_VERSION`].
@@ -99,8 +133,25 @@ impl PinHeader {
99133
}
100134
}
101135

102-
/// A signed VectorPin attestation. Serialize with [`Pin::to_json`] and
103-
/// store alongside the embedding in vector store metadata.
136+
/// A signed VectorPin attestation.
137+
///
138+
/// Serialize with [`Pin::to_json`] and store the resulting string
139+
/// alongside the embedding in vector-store metadata. On read, parse
140+
/// with [`Pin::from_json`] and hand to [`Verifier::verify_full`](crate::Verifier::verify_full).
141+
///
142+
/// # Example
143+
///
144+
/// ```
145+
/// use vectorpin::{Pin, Signer, Verifier};
146+
///
147+
/// let signer = Signer::generate("k1".to_string());
148+
/// let v: Vec<f32> = vec![1.0, 2.0, 3.0];
149+
/// let pin = signer.pin("hello", "m", v.as_slice()).unwrap();
150+
///
151+
/// let mut verifier = Verifier::new();
152+
/// verifier.add_key(signer.key_id(), signer.public_key_bytes());
153+
/// assert!(verifier.verify_signature(&Pin::from_json(&pin.to_json()).unwrap()).is_ok());
154+
/// ```
104155
#[derive(Debug, Clone, PartialEq, Eq)]
105156
pub struct Pin {
106157
/// The signed payload.
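
The module docs above describe the canonical header form as sorted keys, no whitespace, and raw UTF-8. The sketch below illustrates those three properties for a flat map of string fields, using only the standard library. It is not the crate's canonicalizer: `canonical_json` and `escape_json_string` are hypothetical helpers, real headers also contain non-string values, and the control-character escaping shown is an assumption rather than something taken from `docs/spec.md`.

```rust
use std::collections::BTreeMap;

/// Quote a string for JSON, leaving non-ASCII characters as raw UTF-8.
/// Assumption: only `"`, `\`, and control characters below U+0020 are escaped.
fn escape_json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Serialize a flat string map deterministically: `BTreeMap` iterates in
/// sorted key order, and no whitespace is written between tokens.
pub fn canonical_json(fields: &BTreeMap<String, String>) -> String {
    let mut parts: Vec<String> = Vec::with_capacity(fields.len());
    for (key, value) in fields {
        parts.push(format!(
            "{}:{}",
            escape_json_string(key),
            escape_json_string(value)
        ));
    }
    format!("{{{}}}", parts.join(","))
}
```

Because the map iterates in sorted order, inserting `"model"` before `"kid"` or after it produces the same output string, which is the property that lets a signature survive re-serialization of the wire form.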

rust/vectorpin/src/hash.rs

Lines changed: 47 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -5,12 +5,43 @@
55
//!
66
//! These three operations are the only places in the protocol where
77
//! semantic content gets turned into bytes. Any disagreement between
8-
//! Python and Rust here breaks cross-language verification, so the
9-
//! semantics are pinned down explicitly:
8+
//! the Python, Rust, and TypeScript ports here breaks cross-language
9+
//! verification, so the semantics are pinned down explicitly:
1010
//!
1111
//! * Vectors: little-endian, 1-D, packed `f32` or `f64` bytes.
1212
//! * Text: UTF-8 of the NFC-normalized string.
1313
//! * Output digests: prefixed with `"sha256:"` and lowercase hex.
14+
//!
15+
//! Cross-language byte-for-byte parity for the functions in this module
16+
//! is asserted by `tests/cross_lang.rs` against the shared fixtures in
17+
//! [`testvectors/`](https://github.com/ThirdKeyAI/VectorPin/tree/main/testvectors).
18+
//!
19+
//! # Examples
20+
//!
21+
//! Hashing source text is NFC-normalized so that visually identical
22+
//! strings stored in different Unicode forms hash equal:
23+
//!
24+
//! ```
25+
//! use vectorpin::hash_text;
26+
//!
27+
//! let composed = "caf\u{00e9}"; // 'é' as one codepoint (NFC)
28+
//! let decomposed = "cafe\u{0301}"; // 'e' + combining acute (NFD)
29+
//! assert_eq!(hash_text(composed), hash_text(decomposed));
30+
//! assert!(hash_text("hello").starts_with("sha256:"));
31+
//! ```
32+
//!
33+
//! Hashing a vector requires the dtype the caller wants to commit to.
34+
//! The same numeric values hashed under f32 and f64 produce different
35+
//! digests, by design — the dtype is part of the signed contract:
36+
//!
37+
//! ```
38+
//! use vectorpin::{hash_vector, hash::VectorRef, VecDtype};
39+
//!
40+
//! let v: Vec<f32> = vec![0.1, 0.2, 0.3];
41+
//! let h32 = hash_vector(VectorRef::F32(&v), VecDtype::F32);
42+
//! assert!(h32.starts_with("sha256:"));
43+
//! assert_eq!(h32.len(), "sha256:".len() + 64);
44+
//! ```
1445
1546
use sha2::{Digest, Sha256};
1647
use unicode_normalization::UnicodeNormalization;
@@ -117,8 +148,20 @@ impl<'a> From<&'a [f64]> for VectorRef<'a> {
117148
/// Reproducible byte form of an embedding vector.
118149
///
119150
/// Always little-endian, always packed, always under the dtype
120-
/// requested by the caller. Two implementations must agree on these
121-
/// bytes byte-for-byte for cross-language verification to work.
151+
/// requested by the caller. The Python, Rust, and TypeScript ports
152+
/// must agree on these bytes byte-for-byte for cross-language
153+
/// verification to work.
154+
///
155+
/// # Example
156+
///
157+
/// ```
158+
/// use vectorpin::{canonical_vector_bytes, hash::VectorRef, VecDtype};
159+
///
160+
/// let v = [1.0_f32];
161+
/// let bytes = canonical_vector_bytes(VectorRef::F32(&v), VecDtype::F32);
162+
/// // 1.0_f32 in IEEE-754 little-endian.
163+
/// assert_eq!(bytes, [0x00, 0x00, 0x80, 0x3f]);
164+
/// ```
122165
pub fn canonical_vector_bytes(vector: VectorRef<'_>, dtype: VecDtype) -> Vec<u8> {
123166
match (vector, dtype) {
124167
(VectorRef::F32(v), VecDtype::F32) => f32_le_bytes(v),
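
The hash.rs docs state that the same numeric values produce different digests under f32 and f64 because the dtype is part of the signed contract. The standard-library sketch below shows why: the packed little-endian bytes already differ before any hashing happens. The helper names `pack_f32_le` and `pack_f64_le` are hypothetical and are not the crate's internals, and widening an f32 input to f64 is used here only to compare the same value under both dtypes.

```rust
/// Pack a slice of f32 values as contiguous little-endian bytes (4 per value).
pub fn pack_f32_le(values: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 4);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Pack a slice of f64 values as contiguous little-endian bytes (8 per value).
pub fn pack_f64_le(values: &[f64]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 8);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

For the single value `1.0`, `pack_f32_le(&[1.0])` is `[0x00, 0x00, 0x80, 0x3f]`, matching the doctest above, while `pack_f64_le(&[1.0])` is eight bytes ending in `0xf0, 0x3f`. Any digest computed over those two byte strings will therefore differ.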
