mirror of https://github.com/bitcoinresearchkit/brk.git synced 2026-04-24 06:39:58 -07:00

brk_oracle

Pure on-chain BTC/USD price oracle. No exchange feeds, no external APIs. Derives the bitcoin price from transaction data alone. Tracks block by block from height 575,000 (May 2019) onward with 0.1% median error.

Inspired by UTXOracle by @SteveSimple, which proved the concept. brk_oracle takes the same core insight and redesigns the algorithm for per-block resolution and rolling operation. See comparison below.

The signal

People buy bitcoin in round dollar amounts. Each purchase creates a transaction output whose satoshi value depends on the current price:

  $100 at  $50,000/BTC  →  200,000 sats
  $100 at $100,000/BTC  →  100,000 sats

Thousands of these round-dollar purchases happen every day: $10, $20, $50, $100, $200, $500. Plot every transaction output in a block on a log-scale histogram and clear spikes emerge at each round-dollar amount:

       $5  $10 $20 $50 $100 $200 $500  $1k       $5k $10k
        ↓   ↓   ↓   ↓    ↓    ↓    ↓    ↓         ↓   ↓
   │            ▌        ▌         ▌                  ▌
   │    ▌   █   █   ▌    █    ▌    █    █         ▌   █
   │   ▐█▌ ▐█▌  █▌  █   ▐█▌  ▐█   ▐█▌  ▐█▌  █    ▐█  ▐█▌
   │▄▄████▄███████▄▐█▌▄█████▄██▌▄█████▄███▄▐█▌▄▄▄███▄████▄▄
   └─────────────────────────────────────────────────────────→
                          log₁₀(satoshis)

On a log scale, when the price changes all spikes shift together by the same number of bins. A 2x price move always shifts the pattern by ~60 bins, whether bitcoin moves from $1k to $2k or from $50k to $100k:

  price × 2  →  sats ÷ 2  →  shift left by log₁₀(2) × 200 ≈ 60 bins

  $50k:  ···· █ ···· █ ···· █ ···· █ ····
  $100k: ·· █ ···· █ ···· █ ···· █ ······
              ◄── 60 bins ──►

The spacing between spikes is constant (set by the ratios between dollar amounts). Only the position changes. The oracle detects this pattern and reads the price from where it lands.
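The arithmetic behind the signal can be sketched in a few lines. This is an illustration only: the function names `usd_to_sats` and `bin_shift` are made up for this example and are not part of the crate.

```rust
/// Bins per decade on the log10 axis (200, as stated in the text).
const BINS_PER_DECADE: f64 = 200.0;
const SATS_PER_BTC: f64 = 100_000_000.0;

/// Satoshis received for `usd` dollars at `price` dollars per BTC.
/// Hypothetical helper, for illustration only.
fn usd_to_sats(usd: f64, price: f64) -> f64 {
    usd / price * SATS_PER_BTC
}

/// Signed number of bins the whole spike pattern moves when the price is
/// multiplied by `ratio`. Negative means a shift to the left (fewer sats).
fn bin_shift(ratio: f64) -> f64 {
    -ratio.log10() * BINS_PER_DECADE
}
```

Because the shift depends only on the price ratio, `bin_shift(2.0)` gives about -60.2 bins regardless of the starting price.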

How it works

For each new block:

1. Filter outputs

Skip the coinbase transaction, then exclude noisy outputs: script types dominated by protocol activity (P2TR, P2WSH by default), dust below 1,000 sats, and round BTC amounts (0.01, 0.1, 1.0 BTC, etc.) that create false spikes unrelated to dollar purchases.
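A minimal sketch of such a filter, under stated assumptions: the `OutputType` enum and both function names are hypothetical stand-ins rather than the crate's types, the round-value rule follows the d × 10ⁿ set listed under Configuration, and only exact matches are treated as round.

```rust
/// Hypothetical script-type tag; the crate's real type may differ.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum OutputType { P2PKH, P2SH, P2WPKH, P2WSH, P2TR, Other }

/// True if `sats` is exactly d × 10^n with d in {1, 2, 3, 5, 6}.
/// Assumption: exact matches only, no tolerance band.
fn is_round_btc_amount(sats: u64) -> bool {
    if sats == 0 {
        return false;
    }
    let mut v = sats;
    // Strip trailing decimal zeros to isolate the leading factor d.
    while v % 10 == 0 {
        v /= 10;
    }
    matches!(v, 1 | 2 | 3 | 5 | 6)
}

/// Applies the three default filters: script type, dust, round amounts.
fn keep_output(sats: u64, ty: OutputType) -> bool {
    const MIN_SATS: u64 = 1_000; // default dust threshold
    let excluded_type = matches!(ty, OutputType::P2TR | OutputType::P2WSH);
    !excluded_type && sats >= MIN_SATS && !is_round_btc_amount(sats)
}
```

Coinbase skipping happens one level up, per transaction, so it is not shown here.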

2. Map to log-scale bins

Each remaining output becomes a bin index in a 2,400-bin histogram:

  bin = round(log₁₀(sats) × 200)       200 bins per decade
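The mapping can be sketched as follows. The names `sats_to_bin` and `build_histogram` are illustrative, and dropping values that fall outside the 2,400 bins is an assumption rather than documented behavior.

```rust
const BINS_PER_DECADE: f64 = 200.0;
const NUM_BINS: usize = 2_400;

/// Bin index for an output value, or None for zero or out-of-range values.
/// Assumption: out-of-range outputs are dropped rather than clamped.
fn sats_to_bin(sats: u64) -> Option<usize> {
    if sats == 0 {
        return None; // log10(0) is undefined
    }
    let bin = ((sats as f64).log10() * BINS_PER_DECADE).round() as usize;
    (bin < NUM_BINS).then_some(bin)
}

/// Counts already-filtered output values into a log-scale histogram.
fn build_histogram(outputs: &[u64]) -> Vec<u32> {
    let mut hist = vec![0u32; NUM_BINS];
    for &sats in outputs {
        if let Some(bin) = sats_to_bin(sats) {
            hist[bin] += 1;
        }
    }
    hist
}
```

For example, 100,000 sats lands in bin 1000 and 200,000 sats in bin 1060, which is the ~60-bin spacing described earlier.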

3. Accumulate in ring buffer

A single block is too sparse for a clean signal. The histogram goes into a ring buffer (default depth: 12 blocks) so the pattern accumulates over recent blocks.

4. Compute EMA

The buffered histograms combine into an exponential moving average, weighting recent blocks more heavily:

  weight = α × (1 − α)^age             default α = 2/7 (~6-block span)

Fully recomputed from the ring buffer each block.
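A sketch of that recomputation, assuming the buffered histograms are passed newest first (age 0) and that the truncated weights are not renormalized. The function name `ema_histogram` and the slice-of-vectors layout are choices made for this example.

```rust
/// Weighted sum of per-block histograms.
/// `blocks[0]` is the newest block (age 0), `blocks[1]` the one before, etc.
/// Assumption: weights are used as-is, so they sum to slightly less than 1.
fn ema_histogram(blocks: &[Vec<u32>], alpha: f64) -> Vec<f64> {
    let bins = blocks.first().map_or(0, |h| h.len());
    let mut ema = vec![0.0; bins];
    for (age, hist) in blocks.iter().enumerate() {
        // weight = alpha * (1 - alpha)^age
        let weight = alpha * (1.0 - alpha).powi(age as i32);
        for (acc, &count) in ema.iter_mut().zip(hist) {
            *acc += weight * count as f64;
        }
    }
    ema
}
```

With α = 2/7, the newest block gets weight 2/7 ≈ 0.286 and each older block gets 5/7 of the weight of the one after it.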

5. Score with a 19-point stencil

The core detection step. A stencil encodes where spikes from 19 round-dollar amounts ($1 through $10,000) should appear relative to each other on the log scale:

   $1       $5     $10          $50  $100  $200        $1k          $10k
    ↓        ↓      ↓            ↓     ↓     ↓          ↓             ↓
    ·────────·──────·────────────·─────·─────·──────────·─────────────·
  -400     -260   -200          -60    0    +60       +200          +400
                      bin offsets from the $100 reference point
                                 (19 offsets total)

The oracle slides this stencil across the EMA histogram within a narrow search window around the previous estimate. At each candidate position it reads the EMA value at all 19 expected spike locations, divides each by that offset's peak in the window (so rare amounts like $3 get equal voting weight to common amounts like $100) and sums the normalized values into a score.
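The scoring step can be sketched as below. This is an approximation of the idea, not the crate's code: `stencil_offset` and `best_position` are hypothetical names, the offsets are passed in rather than hard-coded because the full list of 19 is not given here, and reads outside the histogram are treated as zero.

```rust
/// Bin offset of a round-dollar amount relative to the $100 reference.
fn stencil_offset(usd: f64) -> i64 {
    ((usd / 100.0).log10() * 200.0).round() as i64
}

/// Scores every candidate position in `lo..=hi` and returns the best
/// (bin, score) pair, or None if the window is empty.
fn best_position(ema: &[f64], offsets: &[i64], lo: usize, hi: usize) -> Option<(usize, f64)> {
    // EMA value at `center + off`, or 0.0 when that falls outside the histogram.
    let at = |center: usize, off: i64| -> f64 {
        let idx = center as i64 + off;
        if idx >= 0 && (idx as usize) < ema.len() { ema[idx as usize] } else { 0.0 }
    };

    // Peak of each offset across the whole search window, used to normalize
    // so that every dollar amount contributes at most 1.0 to the score.
    let peaks: Vec<f64> = offsets
        .iter()
        .map(|&off| (lo..=hi).map(|c| at(c, off)).fold(0.0_f64, f64::max))
        .collect();

    let mut best: Option<(usize, f64)> = None;
    for center in lo..=hi {
        let mut score = 0.0;
        for (i, &off) in offsets.iter().enumerate() {
            if peaks[i] > 0.0 {
                score += at(center, off) / peaks[i];
            }
        }
        if best.map_or(true, |(_, s)| score > s) {
            best = Some((center, score));
        }
    }
    best
}
```

The normalization is what lets a position where several modest spikes line up beat a position with one tall but isolated spike.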

6. Pick the best position

The position with the highest score is the new price estimate. Parabolic interpolation between the best bin and its neighbors refines it to sub-bin precision:

  price = 10^(10 − bin / 200)  dollars

The search window is bounded, so the oracle must track incrementally block by block from a known seed price.
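A sketch of the refinement and the bin-to-price conversion. The standard three-point parabola vertex formula is used here as an assumption about how the interpolation is done, and the function names are illustrative.

```rust
/// Sub-bin offset of the vertex of a parabola through three equally spaced
/// scores. Lies in [-0.5, 0.5] when `best` is the largest of the three.
fn parabolic_offset(left: f64, best: f64, right: f64) -> f64 {
    let denom = left - 2.0 * best + right;
    if denom == 0.0 {
        0.0 // flat neighborhood: keep the integer bin
    } else {
        0.5 * (left - right) / denom
    }
}

/// price = 10^(10 - bin / 200), where `bin` is the position of the $100 spike.
fn bin_to_price(bin: f64) -> f64 {
    10f64.powf(10.0 - bin / 200.0)
}

/// Combines both steps: integer best bin plus neighbor scores to dollars.
fn refined_price(best_bin: usize, left: f64, best: f64, right: f64) -> f64 {
    bin_to_price(best_bin as f64 + parabolic_offset(left, best, right))
}
```

The exponent follows from the $100 reference: $100 at price P is 10¹⁰ / P sats, so bin = 200 × (10 − log₁₀ P). Bin 1000 therefore corresponds to $100,000 and bin 1200 to $10,000.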

Pipeline

  block ──→ filter ──→ histogram ──→ ring ──→ EMA ──→ stencil ──→ best bin ──→ $
             outputs     2,400 bins    buffer           19-point     parabolic
                          log-scale     ×12              scoring    interpolation

Input formats

The oracle accepts three input formats:

Raw block: process_block(&block) — filters and bins internally
Output pairs: process_outputs(iter) — (sats, output_type) pairs, still applies configured filters
Histogram: process_histogram(&hist) — pre-built [u32; 2400] array

The initial seed must be close to the real price at the starting height. The crate includes a PRICES constant with exchange prices for every height up to 630,000 to derive a seed from.

Configuration

All parameters are set through Config, which ships with sensible defaults:

Parameter	Default	Purpose
`alpha`	2/7	EMA decay rate (~6-block span)
`window_size`	12	Ring buffer depth in blocks
`search_below` / `search_above`	9 / 11	Search window around previous estimate (bins)
`min_sats`	1,000	Dust threshold
`exclude_common_round_values`	true	Filter d × 10ⁿ (d ∈ {1,2,3,5,6}) to prevent false stencil matches
`excluded_output_types`	P2TR, P2WSH	Script types dominated by protocol activity
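For reference, the defaults in the table can be written out as a struct. `SketchConfig` is a hypothetical mirror built only from the table above; the field types, the string representation of script types, and the `ema_span` helper are assumptions and may not match the crate's actual Config.

```rust
/// Hypothetical mirror of the documented defaults, not the crate's Config.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
struct SketchConfig {
    alpha: f64,
    window_size: usize,
    search_below: usize,
    search_above: usize,
    min_sats: u64,
    exclude_common_round_values: bool,
    excluded_output_types: Vec<&'static str>, // assumed representation
}

impl Default for SketchConfig {
    fn default() -> Self {
        Self {
            alpha: 2.0 / 7.0,
            window_size: 12,
            search_below: 9,
            search_above: 11,
            min_sats: 1_000,
            exclude_common_round_values: true,
            excluded_output_types: vec!["P2TR", "P2WSH"],
        }
    }
}

impl SketchConfig {
    /// Equivalent EMA span N from alpha = 2 / (N + 1).
    fn ema_span(&self) -> f64 {
        2.0 / self.alpha - 1.0
    }
}
```

The span helper shows where the "~6-block span" figure comes from: α = 2/7 corresponds to N = 6.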

Comparison with UTXOracle

UTXOracle by @SteveSimple proved that BTC/USD can be derived purely from on-chain data. Both projects share the same core insight (round-dollar detection via log-scale histogram) but make different engineering choices:

	brk_oracle	UTXOracle
Resolution	Per-block (~10 min) + daily candles	Per-run consensus price + per-output intraday scatter
Operation	Rolling: EMA over ring buffer, updates each block	Batch: processes a full day from scratch, stateless
Algorithm	Single-pass stencil scoring with per-offset normalization	Multi-step: dual stencil → rough estimate → output-to-USD mapping → iterative convergence
Stencil	19 round-USD offsets ($1 to $10k), each normalized to its own peak	803-point Gaussian + weighted spike template targeting 17 round-USD amounts
Round BTC handling	Excluded from histogram entirely	Histogram bins smoothed by averaging neighbors
Output filtering	Per-output: script type, dust threshold, round BTC	Per-tx: exactly 2 outputs, ≤5 inputs, no same-day inputs, ≤500-byte witness
Validated from	Height 575,000 (May 2019)	December 2023
Language	Rust	Python
Dependencies	None (pure computation, caller provides block data)	Bitcoin Core RPC
Bins per decade	200	200

Accuracy

Tested over 361,245 blocks (heights 575,000 to 936,244) against exchange OHLC data. Error is measured per block as distance from the oracle estimate to the exchange high/low range at that height. If the oracle falls within the range, the error is zero.
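The error metric described above can be sketched like this. Expressing the distance as a percentage of the nearest range bound is an assumption, since the denominator is not stated, and `range_error_pct` is an illustrative name.

```rust
/// Percentage distance from `estimate` to the [low, high] exchange range.
/// Returns 0.0 when the estimate falls inside the range.
/// Assumption: the distance is taken relative to the nearest bound.
fn range_error_pct(estimate: f64, low: f64, high: f64) -> f64 {
    if estimate < low {
        (low - estimate) / low * 100.0
    } else if estimate > high {
        (estimate - high) / high * 100.0
    } else {
        0.0
    }
}
```

Under this definition an estimate of 98.01 against a 99 to 101 range scores 1% error, while anything between 99 and 101 scores zero.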

Per-block

Metric	Value
Median error	0.10%
95th percentile	0.55%
99th percentile	1.4%
99.9th percentile	4.4%
RMSE	0.38%
Max error	18.1%
Bias	+0.04 bins (essentially zero)
Blocks > 5% error	237 (0.07%)
Blocks > 10% error	22 (0.006%)
Blocks > 20% error	0

Daily candles

Oracle daily OHLC built from per-block prices vs exchange daily OHLC:

	Median	RMSE	Max
Open	0.20%	0.49%	5.9%
High	0.54%	0.87%	9.1%
Low	0.48%	1.31%	19.7%
Close	0.23%	0.58%	6.9%
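Building a daily candle from per-block prices could look like the sketch below. The `Candle` struct and `daily_candle` function are hypothetical, and the caller is assumed to have already grouped one day's block prices in height order (how blocks are assigned to days is not specified here).

```rust
/// Hypothetical OHLC record for one day.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Candle {
    open: f64,
    high: f64,
    low: f64,
    close: f64,
}

/// Collapses one day's per-block prices (oldest first) into a candle.
/// Returns None for a day with no blocks.
fn daily_candle(block_prices: &[f64]) -> Option<Candle> {
    let (&open, &close) = (block_prices.first()?, block_prices.last()?);
    let high = block_prices.iter().copied().fold(f64::MIN, f64::max);
    let low = block_prices.iter().copied().fold(f64::MAX, f64::min);
    Some(Candle { open, high, low, close })
}
```

High and low are extremes over every block in the day, so a single outlier block can move them. That would be consistent with High and Low showing larger errors than Open and Close in the table above.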

By year

Year	Blocks	Median	RMSE	Max	>5%	>10%	Price range
2019	35,764	0.10%	0.61%	17.2%	103	16	$5,656–$13,868
2020	53,102	0.10%	0.48%	18.2%	85	15	$3,858–$29,322
2021	52,733	0.07%	0.47%	14.4%	38	9	$27,678–$69,000
2022	53,230	0.07%	0.32%	6.8%	10	0	$15,460–$48,240
2023	54,032	0.10%	0.25%	6.7%	5	0	$16,490–$44,700
2024	53,367	0.11%	0.31%	9.7%	16	0	$38,555–$108,298
2025	53,113	0.11%	0.25%	5.8%	4	0	$74,409–$126,198
2026	5,904	0.11%	0.27%	3.3%	0	0	$60,000–$97,900

Tail accuracy improves over time as on-chain transaction volume grows: the median stays near 0.1%, while yearly RMSE falls from 0.61% in 2019 to 0.25% in 2025. Since 2022, no block exceeds 10% error. The largest errors all occur during the fastest intraday price moves of 2019 to 2021.
