mirror of
https://github.com/bitcoinresearchkit/brk.git
synced 2026-04-27 08:09:58 -07:00
75 lines
2.9 KiB
Markdown
75 lines
2.9 KiB
Markdown
# brk_indexer
|
|
|
|
Full Bitcoin blockchain indexer for fast analytics queries.
|
|
|
|
## What It Enables
|
|
|
|
Transform raw Bitcoin blockchain data into indexed vectors and key-value stores optimized for analytics. Query any block, transaction, address, or UTXO without scanning the chain.
|
|
|
|
## Key Features
|
|
|
|
- **Multi-phase block processing**: Parallel TXID computation, input/output processing, sequential finalization
|
|
- **Address indexing**: Maps addresses to their transaction history and UTXOs per address type
|
|
- **UTXO tracking**: Live outpoint→value lookups, address→unspent outputs
|
|
- **Reorg handling**: Automatic rollback to valid chain state on reorganization
|
|
- **Collision detection**: Validates rapidhash-based prefix lookups against known duplicate TXIDs
|
|
- **Incremental snapshots**: Periodic checkpoints for crash recovery
|
|
|
|
## Core API
|
|
|
|
```rust,ignore
|
|
let mut indexer = Indexer::forced_import(&outputs_dir)?;
|
|
|
|
// Index new blocks
|
|
let starting_indexes = indexer.index(&blocks, &client, &exit)?;
|
|
|
|
// Access indexed data
|
|
let txindex = indexer.stores.txidprefix_to_txindex.get(&txid_prefix)?;
|
|
let blockhash = indexer.vecs.blocks.blockhash.get(height)?;
|
|
```
|
|
|
|
## Data Structures
|
|
|
|
**Vecs** (append-only vectors):
|
|
- `blocks`: `blockhash`, `timestamp`, `difficulty`, `total_size`, `weight`
|
|
- `transactions`: `txid`, `first_txinindex`, `first_txoutindex`
|
|
- `inputs`: `outpoint`, `txindex`
|
|
- `outputs`: `value`, `outputtype`, `typeindex`, `txindex`
|
|
- `addresses`: Per-type `p2pkhbytes`, `p2shbytes`, `p2wpkhbytes`, etc.
|
|
|
|
**Stores** (key-value lookups):
|
|
- `txidprefix_to_txindex` - TXID lookup via 10-byte prefix
|
|
- `blockhashprefix_to_height` - Block lookup via 4-byte prefix
|
|
- `addresstype_to_addresshash_to_addressindex` - Address lookup per type
|
|
- `addresstype_to_addressindex_and_unspentoutpoint` - Live UTXO set per address
|
|
|
|
## Processing Pipeline
|
|
|
|
1. **Block metadata**: Store blockhash, difficulty, timestamp
|
|
2. **Compute TXIDs**: Parallel SHA256d across transactions
|
|
3. **Process inputs**: Lookup spent outpoints, resolve address info
|
|
4. **Process outputs**: Extract addresses, assign type indexes
|
|
5. **Finalize**: Sequential store updates, UTXO set mutations
|
|
6. **Commit**: Periodic flush to disk
|
|
|
|
## Performance
|
|
|
|
| Machine | Time | Disk | Peak Disk | Memory | Peak Memory |
|
|
|---------|------|------|-----------|--------|-------------|
|
|
| MBP M3 Pro (36GB, internal SSD) | 3h | 247 GB | 314 GB | 5.2 GB | 11 GB |
|
|
| Mac Mini M4 (16GB, external SSD) | 4.9h | 233 GB | 303 GB | 5.4 GB | 11 GB |
|
|
|
|
Full benchmark data: [bitcoinresearchkit/benches](https://github.com/bitcoinresearchkit/benches/tree/main/brk_indexer)
|
|
|
|
## Recommended: mimalloc v3
|
|
|
|
Use [mimalloc v3](https://crates.io/crates/mimalloc) as the global allocator to reduce memory usage.
|
|
|
|
## Built On
|
|
|
|
- `vecdb` for append-only vectors
|
|
- `brk_cohort` for address type handling
|
|
- `brk_iterator` for block iteration
|
|
- `brk_store` for key-value storage
|
|
- `brk_types` for domain types
|