Version 0.1.0 - Transaction Normalization Stage
A comprehensive cryptocurrency bookkeeping application that normalizes transaction data from exchanges and on-chain sources into a unified format using DuckDB. Currently focused on data normalization with plans for full bookkeeping features.
CryptoBookKeeper is designed to be a comprehensive cryptocurrency bookkeeping solution. Currently in its initial phase, it focuses on normalizing transaction data from exchanges and on-chain sources into a unified format using DuckDB.
Current Stage (v0.1.0): Data normalization and unified transaction model Future Plans: Full bookkeeping features, tax reporting, portfolio tracking, and more.
Raw Data Sources β Export Scripts β DuckDB β Unified Schema
β β β β
- Exchanges - CCXT API - Staging - transactions_unified
- On-chain - DeBank API - Parquet - Standardized format
- Scam Filter
# Clone and setup
git clone https://github.com/crazysoldier/CryptoBookKeeper.git
cd CryptoBookKeeper
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt# Copy and edit environment template
cp .env.template .env
# Edit .env with your API keys and settings
nano .env# Full refresh (first time or complete re-sync)
make all
# Incremental sync (recommended for daily updates - saves API costs!)
make sync
# Or run individual steps
make setup # Setup environment
make export-exchanges # Export exchange data
make export-debank # Export on-chain data (all transactions)
make export-debank-incremental # Export only new on-chain data
make stage # Stage data in DuckDBβββ project-docs/ # Documentation
β βββ overview.md
β βββ requirements.md
β βββ tech-specs.md
β βββ data-dictionary.md
βββ scripts/ # Export scripts
β βββ export_exchanges.py
β βββ export_onchain_eth.py
β βββ stage_duckdb.py
βββ sql/ # Database initialization
β βββ duckdb_init_simple.sql
βββ data/ # Data storage
β βββ raw/ # Raw CSV exports
β βββ curated/ # Processed Parquet files
βββ Makefile # Orchestration
# Time range for data collection
START_TS=2024-01-01T00:00:00Z
# Exchanges to export (comma-separated)
EXCHANGES=coinbase,binance,kraken
# Exchange API credentials
COINBASE_API_KEY=your_key_here
COINBASE_SECRET=your_secret_here
COINBASE_PASSPHRASE=your_passphrase_here
# DeBank Cloud API (for on-chain data)
DEBANK_API_KEY=your_debank_api_key
# Chains to monitor (comma-separated)
# See DEBANK_CHAINS.md for all 123 supported chains
DEBANK_CHAINS=eth,bsc,matic,arb,op,base,avax
# Scam token filtering (recommended: true)
# Filters out spam/airdrop tokens marked by DeBank
DEBANK_FILTER_SCAMS=true
# Ethereum addresses to track (comma-separated, lowercase)
EVM_ADDRESSES=0x1234...,0x5678...All transactions are normalized into a unified schema:
| Column | Type | Description |
|---|---|---|
domain |
VARCHAR | Data source (exchanges, onchain) |
source |
VARCHAR | Specific source (coinbase, ethereum) |
ts_utc |
TIMESTAMP | UTC timestamp |
txid |
VARCHAR | Transaction ID |
base |
VARCHAR | Base currency |
quote |
VARCHAR | Quote currency |
side |
VARCHAR | buy/sell/transfer |
amount |
DECIMAL | Transaction amount |
price |
DECIMAL | Price per unit |
fee_ccy |
VARCHAR | Fee currency |
fee_amt |
DECIMAL | Fee amount |
addr_from |
VARCHAR | From address |
addr_to |
VARCHAR | To address |
chain |
VARCHAR | Blockchain |
token_symbol |
VARCHAR | Token symbol |
token_decimal |
INTEGER | Token decimals |
raw_json |
JSON | Original raw data |
- Python 3.11+ - Core language
- CCXT - Exchange API integration
- DeBank Cloud API - On-chain data across 123 chains
- DuckDB - In-process OLAP database with upsert support
- Polars - Fast DataFrame processing
- Parquet - Columnar storage format
make setup # Setup virtual environment and dependencies
make export-exchanges # Export data from exchanges
make export-debank # Export on-chain data (full refresh)
make export-debank-incremental # Export only new on-chain data (saves API costs)
make stage # Stage data in DuckDB with upsert
make excel # Export transactions to formatted Excel workbook
make all # Full pipeline (full refresh)
make sync # Incremental pipeline (recommended for daily use)
make clean # Clean generated files
make test # Run testsThe pipeline includes comprehensive data quality checks:
- Scam Token Filtering: Automatically filters out spam/airdrop tokens marked by DeBank (60%+ of transactions on some chains!)
- Timestamp normalization to UTC
- Type casting and validation
- Duplicate detection via DuckDB upsert (INSERT OR REPLACE)
- Missing value handling
- Schema validation
- Incremental loading to save API costs and improve performance
- Rate Limits: Exchange APIs have rate limits
- Historical Data: Limited by exchange API availability
- DeBank API Costs: DeBank Cloud API charges per API unit (use incremental mode!)
- Data Volume: Large datasets require significant storage
- Scam Filtering: Only works for chains supported by DeBank (123 chains)
- Pandas Compatibility: May need
pip install pandas==2.3.3 numpy>=1.24.0,<2.0.0
See TROUBLESHOOTING.md for detailed solutions.
We use DeBank Cloud API for comprehensive on-chain data across multiple EVM chains.
Setup:
- Sign up at: https://cloud.debank.com/
- Purchase units (pay with USDC)
- Copy your Access Key
- Add to
.env:DEBANK_API_KEY=your_access_key - Add chains:
DEBANK_CHAINS=eth,matic,arb,op,base,avax - Enable scam filtering:
DEBANK_FILTER_SCAMS=true(recommended)
Scam Token Filtering:
CryptoBookKeeper automatically filters out spam and scam tokens using DeBank's is_scam flag:
# Enable scam filtering (default: true)
DEBANK_FILTER_SCAMS=true
# Disable to keep all transactions (not recommended)
DEBANK_FILTER_SCAMS=falseWhy filter scams?
- 60%+ of transactions on some chains (esp. Polygon) are spam tokens
- 100% of Polygon transactions in test data were scam airdrops
- Cleaner data = better insights and reports
- Saves time analyzing only legitimate transactions
Example Results:
Chain Total Fetched Scams Filtered Clean Data
---------------------------------------------------
ETH 20 1 (5%) 19
MATIC 20 20 (100%) 0
ARB 20 10 (50%) 10
OP 20 17 (85%) 3
BASE 20 13 (65%) 7
AVAX 1 0 (0%) 1
---------------------------------------------------
TOTAL 101 61 (60%) 40
Important:
- API endpoint:
https://pro-openapi.debank.com/v1 - Authentication: Use
AccessKeyheader (notBearer) - Rate limit: 100 requests/second
See DEBANK_SETUP.md for detailed setup instructions.
CryptoBookKeeper can export your unified transaction data to a beautifully formatted Excel workbook.
make excelThis creates a timestamped Excel file in data/exports/ with the following sheets:
- Key Metrics: Total transactions, exchange/on-chain breakdown, unique tokens, date range
- Breakdowns: Transactions by source, transactions by chain
- Clean, professional layout with formatted numbers
- Complete unified transaction view (all 118+ transactions)
- Columns: Date, Source, Type, Token, Amount, Price, Fee, Chain, From, To
- Features: Auto-filters, sortable columns, formatted dates and numbers
- Perfect for quick analysis and filtering
- All CEX trades, deposits, and withdrawals
- Columns: Date, Exchange, Type, Token, Amount, Price, Total Value, Fee
- Calculated total value (amount Γ price)
- Currency formatting for easy reading
- All blockchain transfers across multiple chains
- Columns: Date, Chain, Type, Token, Amount, From Address, To Address
- Grouped by chain for easy filtering
- Full address visibility for verification
- Year-by-year breakdown (2024, 2025, etc.)
- Metrics: Total transactions, exchange vs on-chain, unique tokens per year
- Perfect for annual reviews and tax preparation
- Auto-filters on all data sheets
- Professional formatting: Headers, currency, dates
- Sortable columns: Click any header to sort
- Timestamped filenames: Never overwrite your exports
- Lightweight: ~20KB for 100+ transactions
- π Portfolio analysis: See all your transactions in one place
- π° Tax preparation: Export and share with your accountant
- π Data validation: Review transactions in familiar Excel format
- π Reporting: Create custom charts and pivot tables
- ποΈ Archival: Keep offline backups of your transaction history
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is open source. See LICENSE file for details.
For issues and questions:
- Check the documentation in
project-docs/ - Review the Makefile for available commands
- Check logs in
logs/directory
- β Data normalization from exchanges and on-chain
- β Unified transaction schema
- β DuckDB pipeline with upsert support
- β Basic data quality checks
- π Portfolio tracking and analytics
- π Performance metrics and reporting
- π Automated data refresh
- π± Basic web interface
- π Tax reporting and compliance
- π¦ Multi-exchange portfolio aggregation
- π Advanced analytics and insights
- π Enhanced security features
- π― Complete bookkeeping solution
- π Professional reporting suite
- π Real-time data synchronization
- π± Mobile and web applications
Built with β€οΈ for the crypto community