# brk_indexer
High-performance Bitcoin blockchain indexer with parallel processing and dual storage architecture.
[![Crates.io](https://img.shields.io/crates/v/brk_indexer.svg)](https://crates.io/crates/brk_indexer)
[![Documentation](https://docs.rs/brk_indexer/badge.svg)](https://docs.rs/brk_indexer)
## Overview
This crate provides a comprehensive Bitcoin blockchain indexer built on top of `brk_parser`. It processes raw Bitcoin blocks in parallel, extracting and indexing transactions, addresses, inputs, outputs, and metadata into optimized storage structures. The indexer maintains two complementary storage systems: columnar vectors for analytics and key-value stores for fast lookups.
**Key Features:**
- Parallel block processing with multi-threaded transaction analysis
- Dual storage architecture: columnar vectors + key-value stores
- Address type classification and indexing for all Bitcoin script types
- Collision detection and validation for address hashes and transaction IDs
- Incremental processing with automatic rollback and recovery
- Height-based synchronization with Bitcoin Core RPC validation
- Optimized batch operations with configurable snapshot intervals
**Target Use Cases:**
- Bitcoin blockchain analysis requiring full transaction history
- Address clustering and UTXO set analysis
- Blockchain explorers needing fast address/transaction lookups
- Research applications requiring structured access to blockchain data
## Installation
```bash
cargo add brk_indexer
```
## Quick Start
```rust
use brk_indexer::Indexer;
use brk_parser::Parser;
use bitcoincore_rpc::{Client, Auth};
use vecdb::Exit;
use std::path::Path;
// Initialize Bitcoin Core RPC client
let rpc = Client::new("http://localhost:8332", Auth::None)?;
let rpc = Box::leak(Box::new(rpc));
// Create parser for raw block data
let blocks_dir = Path::new("/path/to/bitcoin/blocks");
let parser = Parser::new(blocks_dir, None, rpc);
// Initialize indexer with output directory
let outputs_dir = Path::new("./indexed_data");
let mut indexer = Indexer::forced_import(outputs_dir)?;
// Index blockchain data
let exit = Exit::default();
let starting_indexes = indexer.index(&parser, rpc, &exit, true)?;
println!("Indexed up to height: {}", starting_indexes.height);
```
## API Overview
### Core Types
- **`Indexer`**: Main coordinator managing vectors and stores
- **`Vecs`**: Columnar storage for blockchain data analytics
- **`Stores`**: Key-value storage for fast hash-based lookups
- **`Indexes`**: Current indexing state tracking progress across data types
### Key Methods
**`Indexer::forced_import(outputs_dir: &Path) -> Result<Self>`**

Creates or opens an indexer instance with automatic version management.

**`index(&mut self, parser: &Parser, rpc: &'static Client, exit: &Exit, check_collisions: bool) -> Result<Indexes>`**

Main indexing function. Processes blocks from the parser, with collision detection when `check_collisions` is `true`.
### Storage Architecture
**Columnar Vectors (Vecs):**
- `height_to_*`: Block-level data (hash, timestamp, difficulty, size, weight)
- `txindex_to_*`: Transaction data (ID, version, locktime, size, RBF flag)
- `outputindex_to_*`: Output data (value, type, address mapping)
- `inputindex_to_outputindex`: Input-to-output relationship mapping
**Key-Value Stores:**
- `addressbyteshash_to_typeindex`: Address hash to internal index mapping
- `blockhashprefix_to_height`: Block hash prefix to height lookup
- `txidprefix_to_txindex`: Transaction ID prefix to internal index
- `addresstype_to_typeindex_with_outputindex`: Address type to output mappings
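To illustrate how the two storage systems can complement each other, here is a minimal, dependency-free sketch: a dense column indexed by height plus a prefix-keyed map for reverse lookups, with a collision check against the full hash. All names (`DualIndex`, `HashPrefix`, `InsertError`) are hypothetical, and the 8-byte prefix width is an assumption. The crate itself uses vecdb and Fjall rather than `Vec` and `BTreeMap`.

```rust
use std::collections::BTreeMap;

/// Hypothetical prefix type: the first 8 bytes of a 32-byte hash (width assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct HashPrefix(u64);

impl HashPrefix {
    pub fn from_hash(hash: &[u8; 32]) -> Self {
        let mut first = [0u8; 8];
        first.copy_from_slice(&hash[..8]);
        HashPrefix(u64::from_be_bytes(first))
    }
}

#[derive(Debug, PartialEq)]
pub enum InsertError {
    /// Two different hashes share the same prefix.
    PrefixCollision { existing_height: usize },
}

/// Columnar side: full hashes stored densely, indexed by height.
/// Key-value side: prefix -> height, for reverse lookups.
#[derive(Default)]
pub struct DualIndex {
    height_to_hash: Vec<[u8; 32]>,
    prefix_to_height: BTreeMap<HashPrefix, usize>,
}

impl DualIndex {
    /// Appends a hash and returns its height, or reports a prefix collision.
    /// Re-inserting a known hash returns its existing height.
    pub fn push(&mut self, hash: [u8; 32]) -> Result<usize, InsertError> {
        let prefix = HashPrefix::from_hash(&hash);
        if let Some(&existing_height) = self.prefix_to_height.get(&prefix) {
            if self.height_to_hash[existing_height] != hash {
                return Err(InsertError::PrefixCollision { existing_height });
            }
            return Ok(existing_height);
        }
        let height = self.height_to_hash.len();
        self.height_to_hash.push(hash);
        self.prefix_to_height.insert(prefix, height);
        Ok(height)
    }

    /// Reverse lookup: prefix map first, then confirm against the full hash.
    pub fn height_of(&self, hash: &[u8; 32]) -> Option<usize> {
        let height = *self.prefix_to_height.get(&HashPrefix::from_hash(hash))?;
        (self.height_to_hash[height] == *hash).then_some(height)
    }

    /// Forward lookup straight from the column.
    pub fn hash_at(&self, height: usize) -> Option<&[u8; 32]> {
        self.height_to_hash.get(height)
    }
}
```

Keying the map by a short prefix keeps the key-value side small, and the full hash in the column is what makes a collision detectable at all.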
### Address Type Support
Complete coverage of Bitcoin script types:
- **P2PK**: Pay-to-Public-Key (33-byte and 65-byte variants)
- **P2PKH**: Pay-to-Public-Key-Hash
- **P2SH**: Pay-to-Script-Hash
- **P2WPKH**: Pay-to-Witness-Public-Key-Hash
- **P2WSH**: Pay-to-Witness-Script-Hash
- **P2TR**: Pay-to-Taproot
- **P2MS**: Pay-to-Multisig
- **P2A**: Pay-to-Anchor
- **OpReturn**: OP_RETURN data outputs
- **Empty/Unknown**: Non-standard script types
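As an illustration of what classifying these types involves, the following sketch matches a raw `scriptPubKey` against the standard byte templates. It is not the crate's implementation: `ScriptKind` and `classify` are hypothetical names, and bare multisig (P2MS) is deliberately left out because recognizing it requires parsing a variable number of keys, so such scripts fall through to `Unknown` here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScriptKind {
    P2PK,
    P2PKH,
    P2SH,
    P2WPKH,
    P2WSH,
    P2TR,
    P2A,
    OpReturn,
    Empty,
    Unknown,
}

/// Classifies a scriptPubKey by its standard byte template.
pub fn classify(script: &[u8]) -> ScriptKind {
    match script {
        [] => ScriptKind::Empty,
        // OP_RETURN <data...>
        [0x6a, ..] => ScriptKind::OpReturn,
        // <33-byte compressed key> OP_CHECKSIG
        [0x21, .., 0xac] if script.len() == 35 => ScriptKind::P2PK,
        // <65-byte uncompressed key> OP_CHECKSIG
        [0x41, .., 0xac] if script.len() == 67 => ScriptKind::P2PK,
        // OP_DUP OP_HASH160 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG
        [0x76, 0xa9, 0x14, .., 0x88, 0xac] if script.len() == 25 => ScriptKind::P2PKH,
        // OP_HASH160 <20 bytes> OP_EQUAL
        [0xa9, 0x14, .., 0x87] if script.len() == 23 => ScriptKind::P2SH,
        // OP_0 <20 bytes>
        [0x00, 0x14, ..] if script.len() == 22 => ScriptKind::P2WPKH,
        // OP_0 <32 bytes>
        [0x00, 0x20, ..] if script.len() == 34 => ScriptKind::P2WSH,
        // OP_1 <32 bytes>
        [0x51, 0x20, ..] if script.len() == 34 => ScriptKind::P2TR,
        // OP_1 <0x4e73>
        [0x51, 0x02, 0x4e, 0x73] => ScriptKind::P2A,
        _ => ScriptKind::Unknown,
    }
}
```

Every template above is fixed-length apart from `OP_RETURN`, which is why a length guard plus a few leading and trailing opcodes is enough to tell them apart.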
## Examples
### Basic Indexing Operation
```rust
use bitcoincore_rpc::{Auth, Client};
use brk_indexer::Indexer;
use brk_parser::Parser;
use std::path::Path;

// RPC client, leaked to obtain the `'static` reference that `index` expects
let rpc = Box::leak(Box::new(Client::new("http://localhost:8332", Auth::None)?));

// Initialize components
let outputs_dir = Path::new("./blockchain_index");
let mut indexer = Indexer::forced_import(outputs_dir)?;
let blocks_dir = Path::new("/Users/satoshi/.bitcoin/blocks");
let parser = Parser::new(blocks_dir, None, rpc);

// Index with collision checking enabled
let exit = vecdb::Exit::default();
let final_indexes = indexer.index(&parser, rpc, &exit, true)?;
println!("Final height: {}", final_indexes.height);
println!("Total transactions: {}", final_indexes.txindex);
println!("Total addresses: {}", final_indexes.total_address_count());
```
### Querying Indexed Data
```rust
use brk_indexer::Indexer;
use brk_structs::{AddressBytesHash, Height, TxidPrefix};
use std::path::Path;
use std::str::FromStr; // needed if `TxidPrefix::from_str` comes from the `FromStr` trait

// `forced_import` takes a `&Path`, not a `&str`
let indexer = Indexer::forced_import(Path::new("./blockchain_index"))?;

// Look up block hash by height
let height = Height::new(750000);
if let Some(block_hash) = indexer.vecs.height_to_blockhash.get(height)? {
    println!("Block 750000 hash: {}", block_hash);
}

// Look up transaction by ID prefix
let txid_prefix = TxidPrefix::from_str("abcdef123456")?;
if let Some(tx_index) = indexer.stores.txidprefix_to_txindex.get(&txid_prefix)? {
    println!("Transaction index: {}", tx_index);
}

// Query address information
let address_hash = AddressBytesHash::from(/* address bytes */);
if let Some(type_index) = indexer.stores.addressbyteshash_to_typeindex.get(&address_hash)? {
    println!("Address type index: {}", type_index);
}
```
### Incremental Processing
```rust
use brk_indexer::Indexer;
use std::path::Path;

// `rpc` and `parser` are assumed to be set up as in Quick Start

// Indexer automatically resumes from last processed height
let mut indexer = Indexer::forced_import(Path::new("./blockchain_index"))?;
let current_indexes = indexer.vecs.current_indexes(&indexer.stores, rpc)?;
println!("Resuming from height: {}", current_indexes.height);

// Process new blocks incrementally
let exit = vecdb::Exit::default();
let updated_indexes = indexer.index(&parser, rpc, &exit, true)?;
println!(
    "Processed {} new blocks",
    updated_indexes.height.as_u32() - current_indexes.height.as_u32()
);
```
### Address Type Analysis
```rust
use brk_indexer::Indexer;
use brk_structs::OutputType;
use std::path::Path;

// `forced_import` takes a `&Path`, not a `&str`
let indexer = Indexer::forced_import(Path::new("./blockchain_index"))?;

// Analyze address distribution by type
// (one full scan of the output-type column per type)
for output_type in OutputType::as_vec() {
    let count = indexer.vecs.outputindex_to_outputtype
        .iter()
        .filter(|&ot| ot == output_type)
        .count();
    println!("{:?}: {} outputs", output_type, count);
}

// Query specific address type data
let p2pkh_store = &indexer.stores.addresstype_to_typeindex_with_outputindex
    .p2pkh;
println!("P2PKH addresses: {}", p2pkh_store.len());
```
## Architecture
### Parallel Processing
The indexer parallelizes work at several levels:
- **Block-Level Parallelism**: Concurrent processing of transactions within blocks
- **Transaction Analysis**: Parallel input/output processing with `rayon`
- **Address Resolution**: Multi-threaded address type classification and indexing
- **Collision Detection**: Parallel validation of hash collisions across address types
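The shape of per-block parallelism described above can be sketched as a parallel "analyze" step followed by an ordered merge. To stay dependency-free, this sketch uses `std::thread::scope` instead of `rayon`; `Tx`, `TxSummary`, and `summarize_block` are hypothetical stand-ins, not types from the crate.

```rust
use std::thread;

/// Hypothetical stand-in for a parsed transaction.
pub struct Tx {
    pub output_values: Vec<u64>,
}

/// Per-transaction result that depends on no other transaction.
#[derive(Debug, PartialEq)]
pub struct TxSummary {
    pub output_count: usize,
    pub total_value: u64,
}

fn summarize(tx: &Tx) -> TxSummary {
    TxSummary {
        output_count: tx.output_values.len(),
        total_value: tx.output_values.iter().sum(),
    }
}

/// Splits a block's transactions into chunks, summarizes each chunk on its
/// own thread, and returns the results in the original transaction order.
pub fn summarize_block(txs: &[Tx], workers: usize) -> Vec<TxSummary> {
    if txs.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so that at most `workers` chunks are created.
    let chunk_size = (txs.len() + workers - 1) / workers;
    thread::scope(|scope| {
        // Spawn every worker before joining any, so the chunks run concurrently.
        let handles: Vec<_> = txs
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(summarize).collect::<Vec<_>>()))
            .collect();
        // Joining in spawn order preserves transaction order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

With `rayon`, the body would typically collapse to `txs.par_iter().map(summarize).collect()`, which also preserves order. Either way, only work that is independent per transaction belongs in the parallel step, while assigning sequential indexes stays in the ordered merge.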
### Storage Optimization
**Columnar Storage (vecdb):**
- Compressed vectors for space-efficient analytics queries
- Raw vectors for frequently accessed data (heights, hashes)
- Page-aligned storage for memory mapping efficiency
**Key-Value Storage (Fjall):**
- LSM-tree architecture for write-heavy indexing workloads
- Bloom filters for fast negative lookups
- Transactional consistency with rollback support
### Memory Management
- **Batch Processing**: 1000-block snapshots to balance memory and I/O
- **Reader Management**: Static readers for consistent data access during processing
- **Collision Tracking**: BTreeMap-based collision detection with memory cleanup
- **Exit Handling**: Graceful shutdown with consistent state preservation
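A rough sketch of how periodic snapshots and graceful exit can interact is shown below. It assumes the 1000-block interval mentioned above and models the exit signal as a plain `AtomicBool`; the crate uses `vecdb::Exit`, whose API is not reproduced here. `Snapshotter` and `run` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Snapshot interval, taken from the "1000-block snapshots" described above.
pub const SNAPSHOT_INTERVAL: u32 = 1_000;

/// Hypothetical sink that buffers per-block work and persists it on flush.
#[derive(Default)]
pub struct Snapshotter {
    pub pending_blocks: u32,
    /// Heights (number of blocks processed so far) at which a flush happened.
    pub flushed_heights: Vec<u32>,
}

impl Snapshotter {
    fn flush(&mut self, height: u32) {
        self.flushed_heights.push(height);
        self.pending_blocks = 0;
    }
}

/// Processes heights in `start..end`, flushing at every snapshot boundary and
/// once more before returning, so an exit request never drops buffered work.
/// Returns the next height to process.
pub fn run(start: u32, end: u32, exit: &AtomicBool, sink: &mut Snapshotter) -> u32 {
    let mut height = start;
    while height < end {
        // Check for shutdown between blocks, never in the middle of one.
        if exit.load(Ordering::Relaxed) {
            break;
        }
        sink.pending_blocks += 1; // stand-in for real block processing
        height += 1;
        if height % SNAPSHOT_INTERVAL == 0 {
            sink.flush(height);
        }
    }
    if sink.pending_blocks > 0 {
        sink.flush(height);
    }
    height
}
```

Checking the flag only between blocks means every flush covers whole blocks, which keeps the persisted state resumable from the returned height.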
### Version Management
- **Schema Versioning**: Automatic migration on version changes (currently v21)
- **Rollback Support**: Automatic recovery from incomplete processing
- **State Tracking**: Height-based synchronization across all storage components
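One way to picture height-based rollback: after an interrupted run, the columns may disagree on how far they got, so everything is truncated back to the last height that all of them fully cover. The sketch below is a simplified model under that assumption. `Columns`, its fields, and `rollback` are hypothetical and do not mirror the crate's actual recovery code.

```rust
/// Hypothetical columns that must agree on how many blocks they cover.
pub struct Columns {
    /// One entry per block.
    pub height_to_timestamp: Vec<u32>,
    /// One entry per block: exclusive end index of that block's transactions.
    pub height_to_tx_end: Vec<usize>,
    /// One entry per transaction.
    pub txindex_to_version: Vec<i32>,
}

impl Columns {
    /// Truncates every column to the last fully written block and returns
    /// the number of blocks kept (the height to resume from).
    pub fn rollback(&mut self) -> usize {
        let tx_len = self.txindex_to_version.len();
        // Start from the shortest height-indexed column.
        let mut height = self
            .height_to_timestamp
            .len()
            .min(self.height_to_tx_end.len());
        // Walk back past blocks whose transactions were only partially written.
        while height > 0 && self.height_to_tx_end[height - 1] > tx_len {
            height -= 1;
        }
        let tx_keep = if height == 0 {
            0
        } else {
            self.height_to_tx_end[height - 1]
        };
        self.height_to_timestamp.truncate(height);
        self.height_to_tx_end.truncate(height);
        self.txindex_to_version.truncate(tx_keep);
        height
    }
}
```

Calling `rollback` on already consistent columns changes nothing, so it is safe to run on every startup.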
## Performance Characteristics
### Processing Speed
- **Parallel Transaction Processing**: Multi-core utilization for CPU-intensive operations
- **Optimized I/O**: Batch operations reduce disk overhead
- **Memory Efficiency**: Streaming processing without loading entire blockchain
### Storage Requirements
- **Columnar Compression**: Significant space savings for repetitive blockchain data
- **Index Optimization**: Bloom filters reduce lookup overhead
- **Incremental Growth**: Storage scales linearly with blockchain size
### Scalability
- **Height-Based Partitioning**: Enables distributed processing strategies
- **Modular Architecture**: Separate vector and store systems for flexible deployment
- **Resource Configuration**: Configurable batch sizes and memory limits
## Code Analysis Summary
**Main Structure**: `Indexer` coordinating `Vecs` (columnar analytics) and `Stores` (key-value lookups) \
**Processing Pipeline**: Multi-threaded block analysis with parallel transaction/address processing \
**Storage Architecture**: Dual system using vecdb for analytics and Fjall for lookups \
**Address Indexing**: Complete Bitcoin script type coverage with collision detection \
**Synchronization**: Height-based coordination with Bitcoin Core RPC validation \
**Parallel Processing**: rayon-based parallelism for transaction analysis and address resolution \
**Architecture**: High-performance blockchain indexer with ACID guarantees and incremental processing
---
_This README was generated by Claude Code_