docs: align taxonomy and report model with detector output

2026-07-01 14:28:55 -07:00 · 2026-02-27 02:41:05 -03:00
parent ce2476f6ca
commit 374e185ba1
3 changed files with 85 additions and 48 deletions
@@ -1,48 +1,40 @@
 # Stealth

-A privacy audit tool for Bitcoin wallets. Stealth analyzes the transaction history of a wallet descriptor and surfaces privacy vulnerabilities at the UTXO level.
+A privacy audit tool for Bitcoin wallets. Stealth analyzes the transaction history of a wallet descriptor and surfaces privacy findings derived from on-chain heuristics.

 ## What it does

-Paste a Bitcoin wallet descriptor into the input screen and click **Analyze**. Stealth fetches the on-chain history for all addresses derived from that descriptor, then produces a report listing every UTXO in the wallet and the privacy flaws associated with each one.
+Paste a Bitcoin wallet descriptor into the input screen and click **Analyze**. Stealth derives addresses from the descriptor, scans the chain history involving those addresses, and returns a report with structured `findings` and `warnings`.

-## Vulnerabilities detected
+## Detection taxonomy (ground truth)

-### Address Reuse
-Detects when the same address has received more than one payment. Address reuse links multiple transactions to a single entity and permanently exposes the full balance history of that address to anyone inspecting the chain.
+Stealth's source-of-truth detector is [`backend/script/detect.py`](backend/script/detect.py). The frontend renders the `type` values emitted by that script.

-### CIOH (Common Input Ownership Heuristic)
-Detects transactions where inputs from multiple of your addresses were co-signed. This is the foundational clustering heuristic used by chain-analysis firms: it proves all co-signed inputs belong to the same wallet.
+### Finding types

-### Dust Attack
-Identifies UTXOs that originated from dust — tiny amounts sent by a third party to track a wallet. When the user later spends that dust alongside their own coins, the inputs are merged and previously unconnected addresses are linked.
+| Type | Meaning |
+|---|---|
+| `ADDRESS_REUSE` | Address received funds in multiple transactions, linking history and balances. |
+| `CIOH` | Multi-input linkage (Common Input Ownership Heuristic) across co-spent inputs. |
+| `DUST` | Dust output detection (current or historical). |
+| `DUST_SPENDING` | Dust input spent with normal inputs, actively linking clusters. |
+| `CHANGE_DETECTION` | Change output appears trivially identifiable through heuristics. |
+| `CONSOLIDATION` | UTXO created by a many-input consolidation transaction. |
+| `SCRIPT_TYPE_MIXING` | Mixed input script families in one spend (strong fingerprint). |
+| `CLUSTER_MERGE` | Inputs from previously separate funding chains merged in one tx. |
+| `UTXO_AGE_SPREAD` | Large age spread across UTXOs reveals dormancy/lookback patterns. |
+| `EXCHANGE_ORIGIN` | Probable exchange batch-withdrawal origin. |
+| `TAINTED_UTXO_MERGE` | Tainted and clean inputs merged, propagating taint. |
+| `BEHAVIORAL_FINGERPRINT` | Consistent transaction behavior reveals wallet/user fingerprint. |

-### Dust Spending
-Flags transactions that spend a dust UTXO together with normal-sized inputs, actively triggering the dust-tracking link.
+### Warning-only types

-### Change Output Detection
-Detects transactions where the change output is trivially identifiable through heuristics such as round-number payments, mismatched script types between change and payment, or use of the internal (BIP-44 `/1/*`) derivation path.
+| Type | Meaning |
+|---|---|
+| `DORMANT_UTXOS` | Dormant/aged UTXO pattern warning. |
+| `DIRECT_TAINT` | Direct receipt from a known risky source. |

-### UTXO Consolidation
-Flags UTXOs born from a consolidation transaction (many inputs, few outputs). Consolidation merges the histories of all input addresses into one UTXO, amplifying the privacy damage of every prior vulnerability.
-
-### Script Type Mixing
-Detects transactions that mix different input script types (e.g. P2PKH alongside P2WPKH). This is rare and highly identifying, shrinking the anonymity set significantly.
-
-### Cluster Merge
-Identifies transactions that merge UTXOs from different funding chains, linking independent coin histories and allowing chain-analysis firms to associate previously separate clusters.
-
-### UTXO Age Spread
-Flags wallets where unspent outputs have significantly different ages. A wide age spread can reveal hoarding patterns and help correlate activity across time periods.
-
-### Exchange Origin
-Detects UTXOs received from likely exchange batch withdrawals, identified by high output counts, many unique recipients, and a large input-to-output ratio.
-
-### Tainted UTXO Merge
-Flags transactions that spend tainted inputs (from known risky sources) alongside clean inputs, spreading taint to the clean coin history.
-
-### Behavioral Fingerprint
-Analyses patterns across all send transactions — fee rates, output counts, RBF signalling, locktime usage, round amounts, and script type consistency — to detect wallet software fingerprints that chain-analysis firms use for clustering.
+`severity` values are emitted as uppercase strings (for example `LOW`, `MEDIUM`, `HIGH`, and `CRITICAL`).

 ## How to use

@@ -51,8 +43,8 @@ Analyses patterns across all send transactions — fee rates, output counts, RBF
   - Supported formats: `wpkh(...)`, `pkh(...)`, `sh(wpkh(...))`, `tr(...)`, and multisig variants.
 3. Click **Analyze**.
 4. Review the results:
-   - A list of all UTXOs currently held by the wallet.
-   - For each UTXO, the privacy vulnerabilities detected in its history are highlighted.
+   - Summary counters for findings, warnings, and transactions analyzed.
+   - Collapsible finding/warning cards with type, severity, description, and structured evidence.

 ## Installation

@@ -87,7 +79,7 @@ Pass `--fresh` to wipe the chain and start from genesis.
 python3 reproduce.py
 ```

-This script sends transactions between the test wallets to reproduce all 12 privacy vulnerability types — address reuse, dust attacks, CIOH, consolidation, script-type mixing, and more. **The application will return no findings without this step**, since a freshly mined chain has no transaction history to analyse.
+This script sends transactions between the test wallets to reproduce all 12 detector finding types. **The application will return no findings without this step**, since a freshly mined chain has no transaction history to analyze.

 After it runs, get a descriptor to paste into the app:

@@ -121,7 +113,9 @@ Open `http://localhost:5173` in your browser.

 1. Paste a wallet descriptor into the input field (e.g. `wpkh([fp/84h/0h/0h]xpub.../0/*)`).
 2. Click **Analyze** — the frontend calls `GET /api/wallet/scan?descriptor=…` on the backend, which runs `detect.py` against your local regtest node.
-3. Review the findings: each entry shows the vulnerability type, severity, and a collapsible details panel.
+3. Review the report:
+   - `findings[]` and `warnings[]` entries each include `type`, `severity`, `description`, and optional `details`.
+   - The summary panel shows `findings`, `warnings`, and whether the scan is `clean`.
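
As a rough illustration of the report shape described above, the following Python sketch tallies a report the way the summary panel might. Only the field names `findings`, `warnings`, `type`, `severity`, `description`, and `details` come from the README; the sample values, the `input_count` key, the `summarize` helper, and the rule that `clean` means no findings and no warnings are assumptions made for this example.

```python
import json
from collections import Counter

# Hypothetical sample report: the field names follow the README text,
# but every value (and the "input_count" key) is invented for illustration.
SAMPLE_REPORT = json.loads("""
{
  "findings": [
    {"type": "ADDRESS_REUSE", "severity": "HIGH",
     "description": "Address received funds in multiple transactions."},
    {"type": "CIOH", "severity": "MEDIUM",
     "description": "Multiple inputs were co-spent.",
     "details": {"input_count": 3}}
  ],
  "warnings": [
    {"type": "DORMANT_UTXOS", "severity": "LOW",
     "description": "Dormant UTXO pattern."}
  ]
}
""")


def summarize(report):
    """Count findings and warnings and tally findings by severity."""
    findings = report.get("findings", [])
    warnings = report.get("warnings", [])
    return {
        "findings": len(findings),
        "warnings": len(warnings),
        # Assumption: a scan is "clean" when nothing at all was reported.
        "clean": not findings and not warnings,
        "by_severity": dict(Counter(entry["severity"] for entry in findings)),
    }


summary = summarize(SAMPLE_REPORT)
print(summary)
```

The `details` field is read nowhere in this sketch because the README describes it as optional; a consumer should use `entry.get("details")` rather than index it directly.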

 ## Project structure

@@ -4,6 +4,42 @@ This project uses Quarkus, the Supersonic Subatomic Java Framework.

 If you want to learn more about Quarkus, please visit its website: <https://quarkus.io/>.

+## Stealth-specific notes
+
+### Scan endpoint
+
+The frontend calls:
+
+```http
+GET /api/wallet/scan?descriptor=<output-descriptor>
+```
+
+This endpoint executes the Python detector configured by `stealth.detect.script` (default: `../../script/detect.py`) and returns its JSON report verbatim.
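
Descriptors contain characters such as `(`, `[`, `/`, and `*`, so the `descriptor` value should be percent-encoded before it goes into the query string. A minimal sketch, assuming a hypothetical helper `build_scan_url` and a backend at `http://localhost:8080` (the host and port are not stated in this README):

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder descriptor copied from the README; it is not a valid descriptor.
DESCRIPTOR = "wpkh([fp/84h/0h/0h]xpub.../0/*)"


def build_scan_url(base_url, descriptor):
    """Return the scan URL with the descriptor percent-encoded."""
    query = urlencode({"descriptor": descriptor})
    return f"{base_url.rstrip('/')}/api/wallet/scan?{query}"


# Assumption: the backend listens on localhost:8080.
url = build_scan_url("http://localhost:8080", DESCRIPTOR)
print(url)

# Decoding the query string recovers the original descriptor unchanged.
decoded = parse_qs(urlsplit(url).query)["descriptor"][0]
print(decoded == DESCRIPTOR)
```

The resulting URL can be passed to any HTTP client; the sketch stops short of sending the request so that it runs without a backend.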
+
+### Detector type taxonomy
+
+`detect.py` can emit the following finding `type` values:
+
+- `ADDRESS_REUSE`
+- `CIOH`
+- `DUST`
+- `DUST_SPENDING`
+- `CHANGE_DETECTION`
+- `CONSOLIDATION`
+- `SCRIPT_TYPE_MIXING`
+- `CLUSTER_MERGE`
+- `UTXO_AGE_SPREAD`
+- `EXCHANGE_ORIGIN`
+- `TAINTED_UTXO_MERGE`
+- `BEHAVIORAL_FINGERPRINT`
+
+Warning-only types:
+
+- `DORMANT_UTXOS`
+- `DIRECT_TAINT`
+
+Severity values are uppercase strings (for example: `LOW`, `MEDIUM`, `HIGH`, `CRITICAL`).
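
A client that consumes the report could mirror this taxonomy as follows. This is a sketch, not code from the repository: `classify`, `sort_by_severity`, and the ranking LOW < MEDIUM < HIGH < CRITICAL are assumptions, and because the severities above are given only as examples, unknown values are sorted last rather than rejected.

```python
# Type names copied from the lists above.
FINDING_TYPES = {
    "ADDRESS_REUSE", "CIOH", "DUST", "DUST_SPENDING", "CHANGE_DETECTION",
    "CONSOLIDATION", "SCRIPT_TYPE_MIXING", "CLUSTER_MERGE", "UTXO_AGE_SPREAD",
    "EXCHANGE_ORIGIN", "TAINTED_UTXO_MERGE", "BEHAVIORAL_FINGERPRINT",
}
WARNING_ONLY_TYPES = {"DORMANT_UTXOS", "DIRECT_TAINT"}

# Assumed ordering; the README only lists these values as examples.
SEVERITY_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}


def classify(type_name):
    """Return 'finding', 'warning', or 'unknown' for a detector type."""
    if type_name in FINDING_TYPES:
        return "finding"
    if type_name in WARNING_ONLY_TYPES:
        return "warning"
    return "unknown"


def sort_by_severity(entries):
    """Sort report entries from most to least severe; unknown severities go last."""
    return sorted(
        entries,
        key=lambda entry: SEVERITY_RANK.get(entry["severity"], -1),
        reverse=True,
    )


# Hypothetical entries used only to exercise the helpers.
entries = [
    {"type": "DUST", "severity": "LOW"},
    {"type": "CIOH", "severity": "CRITICAL"},
    {"type": "CONSOLIDATION", "severity": "MEDIUM"},
]
ordered = [entry["type"] for entry in sort_by_severity(entries)]
print(ordered)
print(classify("DIRECT_TAINT"))
```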
+
 ## Running the application in dev mode

 You can run your application in dev mode that enables live coding using:
@@ -61,9 +61,9 @@ A privacy audit tool that surfaces vulnerabilities at the UTXO level.
 - Supports `wpkh`, `pkh`, `sh(wpkh)`, `tr`, multisig

 **Output**
-- Every UTXO listed
-- Privacy flaws per UTXO
-- Severity badges (high / medium / low)
+- Structured findings + warnings
+- Type/severity/description + evidence details
+- Severity badges mapped from detector output

 </div>

@@ -84,12 +84,19 @@ wpkh([xpub...]/0/*) → Analyze

 # Vulnerabilities Detected

-| Vulnerability | What it means |
-|---------------|---------------|
-| **Address Reuse** | Same address received >1 payment — links tx history, exposes balance |
-| **Dust Spend** | UTXO from dust attack — when spent, links previously unconnected addresses |
-| **UTXO Consolidation** | Multiple inputs merged — strong signal all belong to same wallet |
-| **CIOH** | Common Input Ownership Heuristic — chain analysis firms use this to cluster addresses |
+| Detector Type | Meaning |
+|---------------|---------|
+| `ADDRESS_REUSE` | Same address received multiple payments, linking history |
+| `CIOH` | Multi-input ownership clustering signal |
+| `DUST` / `DUST_SPENDING` | Dust detection and dust+normal co-spend linkage |
+| `CHANGE_DETECTION` | Payment/change outputs become easy to distinguish |
+| `CONSOLIDATION` / `CLUSTER_MERGE` | Input histories merged into one cluster |
+| `SCRIPT_TYPE_MIXING` | Mixed input script families create a fingerprint |
+| `UTXO_AGE_SPREAD` | Old/new UTXO spread leaks dormancy patterns |
+| `EXCHANGE_ORIGIN` | Probable exchange batch-withdrawal origin |
+| `TAINTED_UTXO_MERGE` | Tainted + clean input merge propagates taint |
+| `BEHAVIORAL_FINGERPRINT` | Transaction style consistency re-identifies wallet |
+| Warnings: `DORMANT_UTXOS`, `DIRECT_TAINT` | Non-finding risk signals shown separately |

 ---

@@ -142,8 +149,8 @@ stealth/

 1. **Input screen** — paste descriptor, click Analyze
 2. **Loading** — fetches and analyzes
-3. **Report** — summary bar (total / vulnerable / clean) + UTXO cards
-4. Each card: address, amount, badges, expandable details
+3. **Report** — summary bar (findings / warnings / tx analyzed)
+4. Expandable finding cards: type, severity, description, structured evidence

 ---