Crypto Forem

Cover image for Building Infrastructure for Handling Millions of Bitcoin UTXOs at Scale
0xkniraj
0xkniraj

Posted on

Building Infrastructure for Handling Millions of Bitcoin UTXOs at Scale

The Challenge: Account Model vs UTXO Model

If you've ever designed a custodian solution for Ethereum, you might have found it surprisingly straightforward. Creating wallets from mnemonics, managing deposits and withdrawals, and even implementing multisig solutions using Safe (onchain) or MPC/TSS (offchain) are relatively simple tasks. This ease stems from Ethereum's account model, which treats balances as fungible assets in individual accounts.

Bitcoin, however, operates fundamentally differently due to its UTXO (Unspent Transaction Output) architecture. This creates unique challenges for custodians managing Bitcoin at scale.

Understanding the UTXO Model

In Bitcoin's UTXO model:

  • Every deposit requires a unique address (locking script)
  • Money behaves like physical currency notes rather than a simple account balance
  • UTXOs are atomic units - when you spend one, you must spend it entirely
  • Partial spending requires change - similar to receiving change when paying with cash

While Bitcoin is certainly fungible, the mechanics of how it works differ significantly from Ethereum. Think of UTXOs as individual checks: if you want to spend a check, you must spend it in full. To spend a partial amount, you write a new check back to yourself for the change.

This creates a unique challenge for custodians: efficiently managing potentially millions of "checks" of varying amounts.

The Scale Problem: Managing Millions of UTXOs

Imagine managing millions of UTXOs across thousands of user deposits, processing thousands of withdrawals daily. When a user requests a withdrawal of 0.1 BTC, how do you efficiently select which UTXOs to spend?

This isn't just a theoretical problem—it's the core challenge for exchanges, bridges, and payment processors handling Bitcoin at scale.

Why Standard Algorithms Fail at Scale

Approach 1: Branch and Bound (Bitcoin Core's Method)

Bitcoin Core uses the Branch and Bound algorithm for optimal UTXO selection:

def branch_and_bound(utxos, target_amount, fee_rate):
    # Recursively search for exact match combinations
    # Time complexity: O(2^n) with pruning
    # Works well for wallet-sized UTXO sets (hundreds to thousands)
    pass
Enter fullscreen mode Exit fullscreen mode

Why this fails at scale:

  • Bitcoin Core wallets: 100-1,000 UTXOs
  • Custodian platform: 10,000,000+ UTXOs
  • At millions of UTXOs: O(2^n) becomes computationally impossible
  • Even with aggressive pruning and timeouts, selection would take seconds or minutes
  • Would create a denial-of-service vulnerability under load

The fundamental insight: We need O(log n) or O(1) selection, not O(2^n).

Approach 2: The Greedy Algorithm

The simplest approach is to find the largest UTXO and use it:

withdrawal_amount = 0.1 BTC
selected_utxo = max(utxos)  // e.g., 0.5 BTC
change = 0.4 BTC
Enter fullscreen mode Exit fullscreen mode

Problems:

  • Creates severe UTXO fragmentation over time
  • Eventually, you'll have many small UTXOs
  • Large withdrawals become difficult as UTXOs get smaller

Approach 3: Binary Search on Sorted UTXOs

Sort UTXOs and use binary search to find an optimal match:

sorted_utxos = sort(utxos, ascending)
selected_utxo = binary_search(sorted_utxos, withdrawal_amount)
Enter fullscreen mode Exit fullscreen mode

Problems:

  • Works reasonably well for small withdrawals
  • Fails when the requested amount exceeds the largest UTXO
  • Example: User requests 5 BTC, but largest UTXO is 2 BTC

Approach 4: Consolidation (Combining Smallest UTXOs)

Combine multiple small UTXOs to fulfill the withdrawal:

selected_utxos = []
total = 0
for utxo in sorted(utxos, ascending):
    selected_utxos.append(utxo)
    total += utxo.amount
    if total >= withdrawal_amount:
        break
Enter fullscreen mode Exit fullscreen mode

Problems:

  • More inputs = larger transaction size = higher fees
  • Can become prohibitively expensive
  • Doesn't scale practically for regular operations

The Solution: bucket-Based Cascading Selection

None of the approaches above work efficiently at scale. The solution combines the cascading greedy approach with strategic consolidation, mimicking how physical currency denominations work.

How Physical Currency Works

When paying $50 in cash, you might use:

  • 5×$10 notes
  • 1×$50 note
  • 1×$100 note and receive $50 change

The choice depends on what denominations you have available. We can apply this same principle programmatically to Bitcoin UTXOs.

Implementation: Gaussian Distribution Buckets

Step 1: Define UTXO Tiers

TIER_HIERARCHY = [
    {"name": "dust",   "min": 0.0001,  "max": 0.001},   # 10k-100k sats
    {"name": "tiny",   "min": 0.001,   "max": 0.005},   # 100k-500k sats
    {"name": "small",  "min": 0.005,   "max": 0.01},    # 500k-1M sats
    {"name": "medium", "min": 0.01,    "max": 0.02},    # 1M-2M sats
    {"name": "large",  "min": 0.02,    "max": 0.05},     # 2M-5M sats
    {"name": "xlarge", "min": 0.05,     "max": 0.2},     # 5M-20M sats
    {"name": "whale",  "min": 0.2,     "max": None},    # 20M+ sats
]
Enter fullscreen mode Exit fullscreen mode

Tier ranges are not arbitrary - they're derived from:

  • Historical withdrawal distribution analysis
  • Fee optimization modeling
  • Consolidation opportunity windows
  • Minimum relay fee and dust thresholds

Why this works at scale:

  • Reduces search space from millions to constant T tiers
  • Each tier independently manageable
  • Enables efficient database indexing
  • Natural distribution boundaries

Step 2: Tier Cascading Strategy

Here's the critical difference from naive approaches—we don't just pick a tier, we **cascade through tiers intelligently

def find_utxos_for_withdrawal(amount):
    """
    Tier cascading: Start at preferred tier, cascade UP to larger tiers
    Key insight: Cascading UP prevents fragmentation
    """
    # Determine which tier this amount naturally belongs to
    preferred_tier = determine_tier(amount)

    # Start from preferred tier and go UP through larger tiers
    tier_index = TIER_HIERARCHY.index(preferred_tier)

    for i in range(tier_index, len(TIER_HIERARCHY)):
        current_tier = TIER_HIERARCHY[i]

        # Check if tier has sufficient total balance
        tier_balance = get_tier_balance(current_tier)

        if tier_balance >= amount:
            # Try to select UTXOs from this tier
            utxos = select_utxos_in_tier(current_tier, amount)

            if utxos:
                return utxos
            # If selection fails, continue to next tier

    raise InsufficientFundsError("No tier has sufficient balance")
Enter fullscreen mode Exit fullscreen mode

Why cascade UP, not DOWN:

Example: 0.03 BTC withdrawal

Cascading DOWN (naive approach):
medium (0.01-0.05) → small (0.005-0.01) → tiny → dust
Result: Many small UTXOs = fragmentation + high fees

Cascading UP (our approach):
medium (0.01-0.05) → large (0.05-0.1) → xlarge
Result: Fewer larger UTXOs = consolidation + low fees
Enter fullscreen mode Exit fullscreen mode

Step 3: UTXO Selection Within Tier

Once we've selected a tier, we need to pick specific UTXOs. This is where the knapsack problem comes in:

def select_utxos_in_tier(tier, amount, max_inputs=5):
    """
    Within a tier, select the smallest set of UTXOs that covers the amount.
    This is a bounded knapsack approximation.
    """
    # Get available UTXOs in tier, sorted by amount
    available_utxos = get_available_utxos(tier, sort_by="amount_asc", limit_by=max_inputs)

    # Greedy knapsack: Select smallest UTXOs first
    selected = []
    total = 0

    for utxo in available_utxos:    
        selected.append(utxo)
        total += utxo.amount

        if total >= amount:
            # Atomically reserve these UTXOs in database
            reserve_utxos(selected)
            return selected

    # Tier doesn't have enough UTXOs
    return None
Enter fullscreen mode Exit fullscreen mode

Why smallest-first selection?

  • Minimizes change output size
  • Naturally consolidates small UTXOs
  • Cleans up tier fragmentation over time

Note on implementation: In production, this is implemented entirely within PostgreSQL using:

  • SELECT FOR UPDATE SKIP LOCKED for concurrency-safe selection
  • Atomic transactions for reservation
  • Database-level constraints for consistency
  • Sub-millisecond selection performance with proper indexing

Step 4: Fee-Aware Tier Selection

At scale, transaction fees become a significant cost center. The tier selection should factor in current fee rates:

def find_utxos_with_fee_awareness(amount, fee_rate_sats_per_vb):
    """
    Adjust tier selection based on current fee market conditions
    """
    preferred_tier = determine_tier(amount)
    tier_index = TIER_HIERARCHY.index(preferred_tier)

    # During high-fee periods, start from a higher tier
    # This reduces input count = lower fees
    if fee_rate_sats_per_vb > 50:  # High fee period
        # Skip to next larger tier to minimize inputs
        tier_index = min(tier_index + 1, len(TIER_HIERARCHY) - 1)

    # Rest of cascading logic...
Enter fullscreen mode Exit fullscreen mode

Real-world impact:

Scenario: 0.03 BTC withdrawal during 100 sat/vB fee spike

Option A: 3 UTXOs from medium tier
Fee: 3 inputs × 68 vB × 100 sat/vB = 20,400 sats ($20.40)

Option B: 1 UTXO from large tier (fee-aware selection)
Fee: 1 input × 68 vB × 100 sat/vB = 6,800 sats ($6.80)

Savings: $13.60 per withdrawal during peak hours
Enter fullscreen mode Exit fullscreen mode

Why This Approach Works at Scale

Computational Complexity

Approach Time Complexity Scale Limit
Branch and Bound O(2^n) < 1,000 UTXOs
Binary Search O(log n) Fails on large amounts
Tier Cascading O(tiers) + O(k*log(k)) Millions of UTXOs

Where:

  • n = total UTXOs
  • tiers = number of tiers (constant: 7)
  • k = UTXOs selected per transaction (typically 2-3)

Our approach: O(7) + O(3*log(3)) = O(1) effectively

Natural Consolidation Mechanics

The smallest-first selection within tiers creates automatic consolidation:

Example:

Medium tier has 1000 UTXOs
Withdrawal: 0.03 BTC
Selection: Uses 3 smallest UTXOs (0.01, 0.011, 0.012)
Result: 1000 → 997 UTXOs, naturally cleans up small values
Enter fullscreen mode Exit fullscreen mode

This is self-healing - no manual consolidation jobs required during normal operations. Tiers gets replenished by new deposits.

Gaussian Distribution Alignment

Real-world withdrawal patterns follow a log-normal distribution (similar to Gaussian):

Distribution of withdrawals (from production data):
- 10th percentile: 0.002 BTC (tiny tier)
- 50th percentile: 0.02 BTC (medium tier)  ← most common
- 90th percentile: 0.15 BTC (xlarge tier)
- 99th percentile: 0.5 BTC (whale tier)
Enter fullscreen mode Exit fullscreen mode

By sizing tiers around this distribution, we optimize for the common case while handling edge cases efficiently.


Operational Considerations

Tier Health Monitoring

You need visibility into tier distribution to maintain system health:

def monitor_tier_health():
    """
    Track tier metrics for operational alerting
    """
    for tier in TIER_HIERARCHY:
        metrics = {
            "utxo_count": count_utxos(tier),
            "total_balance": sum_balance(tier),
            "avg_utxo_size": average_size(tier),
            "daily_utilization": withdrawals_from_tier(tier, days=1)
        }

        # Alert on depletion risk
        if metrics["utxo_count"] < 10:
            alert(f"Tier {tier['name']} depleting: {metrics['utxo_count']} UTXOs left")

        # Alert on imbalance
        runway_days = metrics["total_balance"] / metrics["daily_utilization"]
        if runway_days < 2:
            alert(f"Tier {tier['name']} has <2 days runway")
Enter fullscreen mode Exit fullscreen mode

Strategic Rebalancing

While the system is self-healing under normal operation, you need proactive rebalancing during quiet periods:

def rebalance_tiers_during_low_fees():
    """
    Run during low-fee periods (typically 2-6 AM UTC, weekends)
    Consolidate dust, split whales, rebalance distribution
    """
    current_fee_rate = get_current_fee_rate()

    # Only rebalance when fees are low
    if current_fee_rate > 10:  # sats/vB
        return

    # Strategy 1: Consolidate dust and tiny tiers
    dust_utxos = get_tier_utxos("dust")
    if len(dust_utxos) > 100:
        consolidate_into_tier(dust_utxos, target_tier="small")

    # Strategy 2: Ensure minimum UTXO count per tier
    for tier in TIER_HIERARCHY:
        if count_utxos(tier) < 20:
            create_utxos_for_tier(tier, target_count=50, source_tier="whale")
Enter fullscreen mode Exit fullscreen mode

When to rebalance:

  • Daily during off-peak hours
  • After major withdrawal spikes
  • When tier depletion alerts trigger
  • During extended low-fee periods (accumulate savings)

Database Implementation Notes

The algorithm is implemented entirely in PostgreSQL for:

  • Atomicity: BEGIN/COMMIT/ROLLBACK blocks ensure consistency
  • Concurrency: SELECT FOR UPDATE SKIP LOCKED handles concurrent withdrawals
  • Performance: Indexed queries on (tier, state, amount) enable sub-millisecond selection
  • Reliability: ACID guarantees prevent double-spending

This enables the database to instantly find the smallest available UTXOs in any tier.


Real-World Performance

In production environments processing thousands of withdrawals daily:

  • Average inputs per transaction: 2.1 UTXOs (down from 4.5 with naive approaches)
  • Fee reduction: 55% vs. simple consolidation approach
  • UTXO selection latency: <5ms (database-level selection)
  • Tier cascade events: <2% of withdrawals (most succeed on preferred tier)
  • Natural consolidation rate: thousands of UTXOs/day consolidated without manual intervention
  • System uptime during rebalancing: 100% (non-blocking background process)

Cost Impact at Scale:

Platform: 10,000 withdrawals/day
Average fee savings: $2.50/withdrawal
Monthly savings: $750,000
Annual savings: $9,000,000
Enter fullscreen mode Exit fullscreen mode

Limitations and Trade-offs

What This Approach Sacrifices

1. Theoretical Optimality

  • Branch and Bound finds the perfect UTXO combination
  • Tier cascading finds a good enough combination
  • Trade-off: ~5% more change outputs for 10,000x faster selection

2. Privacy Considerations

  • Selecting multiple UTXOs from same tier can link addresses
  • Common input ownership heuristic applies
  • Mitigation: Use mixing services or CoinJoin for privacy-critical applications

3. Minimum Withdrawal Threshold

  • Dust tier prevents withdrawals < 50,000 sats (~$50)
  • Necessary to avoid economically unspendable UTXOs
  • Alternative: Lightning Network for micro-withdrawals

When This Approach May Not Fit

Not suitable for:

  • Personal wallets (use Bitcoin Core's Branch and Bound)
  • Privacy-focused applications (use Wasabi/Samourai algorithms)
  • < 10,000 UTXOs (simpler approaches work fine)

Ideal for:

  • Exchanges and payment processors
  • Cross-chain bridges
  • Custodial services at scale
  • Any platform managing 1M+ UTXOs

Conclusion

Building a Bitcoin custodian at scale requires fundamentally different thinking than Ethereum-based solutions. The UTXO model, while more complex, can be efficiently managed using tier-based cascading strategies.

Key takeaways:

  1. Standard algorithms don't scale - Branch and Bound is O(2^n), impossible at millions of UTXOs
  2. Tier cascading UP prevents fragmentation while maintaining availability
  3. Fee-aware selection can save millions annually at scale
  4. Natural consolidation keeps the system healthy through smallest-first selection
  5. Database-level implementation provides atomicity and concurrency control
  6. Operational monitoring is critical - track tier health and rebalance proactively

By treating Bitcoin UTXOs like physical currency denominations and selecting them through intelligent tier traversal, custodians can process millions of transactions efficiently while minimizing fees and maintaining system health.

The real innovation isn't just the algorithm—it's designing a system that scales to millions of UTXOs while remaining operationally manageable.

Top comments (0)