@codeflash-ai codeflash-ai bot commented Nov 13, 2025

📄 18% (0.18x) speedup for BCDataStream.write_uint64 in electrum/transaction.py

⏱️ Runtime : 1.32 milliseconds → 1.11 milliseconds (best of 250 runs)
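As a rough check on how the percentages in this report relate to the runtimes, the sketch below assumes the speedup is computed as (original - optimized) / optimized. That convention is an assumption, not something the report states, but it matches the per-test figures to within rounding of the displayed times. The `speedup` helper is hypothetical.

```python
def speedup(original: float, optimized: float) -> float:
    """Relative speedup under the assumed (original - optimized) / optimized convention."""
    return (original - optimized) / optimized

# Headline runtimes from this report, in milliseconds.
headline = speedup(1.32, 1.11)

# One per-test example from the generated tests: 304μs -> 257μs.
many_writes = speedup(304, 257)

print(f"headline: {headline:.3f}, 1000 writes: {many_writes:.3f}")
```

With the rounded runtimes shown here the headline ratio comes out near 0.19, slightly above the reported 0.18x, which is presumably derived from unrounded measurements.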

📝 Explanation and details

The optimization achieves an 18% speedup by inlining the write_uint64 method to eliminate function call overhead.

Key changes:

  • Inlined implementation: The write_uint64 method now directly contains the struct packing logic instead of calling self._write_num('<Q', val)
  • Eliminated method call overhead: Removes the indirect function call, parameter passing, and stack frame creation for each write_uint64 invocation
  • Preserved _write_num: Keeps the original method intact for backward compatibility with other potential callers
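The shape of the change can be sketched with a stand-in class. `StreamSketch` and its two method names are hypothetical, and the body of `_write_num` is an assumption (pack with `struct`, create the buffer on first write, extend it afterwards) inferred from how the tests use `ds.input`. It is not Electrum's actual code.

```python
import struct


class StreamSketch:
    """Hypothetical stand-in for BCDataStream, for illustration only."""

    def __init__(self):
        self.input = None  # assumed: the buffer is created lazily on first write

    def _write_num(self, fmt, num):
        packed = struct.pack(fmt, num)
        if self.input is None:
            self.input = bytearray(packed)
        else:
            self.input.extend(packed)

    def write_uint64_delegating(self, val):
        # Shape of the original: one extra method call per write.
        return self._write_num('<Q', val)

    def write_uint64_inlined(self, val):
        # Shape of the optimized version: same logic, no inner call.
        packed = struct.pack('<Q', val)
        if self.input is None:
            self.input = bytearray(packed)
        else:
            self.input.extend(packed)


a, b = StreamSketch(), StreamSketch()
for v in (0, 42, 2**64 - 1):
    a.write_uint64_delegating(v)
    b.write_uint64_inlined(v)

# Both variants should produce identical bytes.
same_bytes = a.input == b.input
```

The trade-off is duplicated logic: any later change to `_write_num` would have to be mirrored in the inlined method.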

Why this is faster:
In Python, function calls carry noticeable overhead from argument binding, stack frame creation, and method resolution. Because write_uint64 is a hot-path method (called 4,066 times in the profiler data), removing the indirection through _write_num yields a measurable gain. In the original version, the line profiler attributes 100% of write_uint64's time to the single line that calls _write_num; the optimized version no longer makes that call.
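The call-overhead argument can be probed with a small `timeit` comparison. This is a minimal sketch using hypothetical module-level functions rather than the real class, so absolute numbers will differ from the report and will vary by machine and Python version.

```python
import struct
import timeit

buf = bytearray()


def pack_and_append(fmt, num):
    buf.extend(struct.pack(fmt, num))


def write_delegating(val):
    # Goes through one extra Python-level call, like the original method.
    pack_and_append('<Q', val)


def write_inlined(val):
    # Does the same work directly.
    buf.extend(struct.pack('<Q', val))


runs = 50_000
t_delegating = timeit.timeit(lambda: write_delegating(42), number=runs)
buf.clear()
t_inlined = timeit.timeit(lambda: write_inlined(42), number=runs)

print(f"delegating: {t_delegating:.4f}s  inlined: {t_inlined:.4f}s")
```

Taking the best of several repetitions (for example with `timeit.repeat`) gives steadier numbers than a single run.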

Performance characteristics:

  • Best for frequent writes: Test results show roughly 11-38% improvements across the generated tests, with simple cases like writing zero among the larger gains (30.6% faster)
  • Scales well: Large-scale tests with 1000 writes maintain ~18% improvement, indicating consistent performance gains
  • No regression on edge cases: Error handling and multiple write scenarios also improve (roughly 14-38% faster)

This optimization is particularly effective for Bitcoin transaction processing where uint64 values (timestamps, amounts, etc.) are frequently serialized.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 4117 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import struct

# imports
import pytest  # used for our unit tests
from electrum.transaction import BCDataStream

# unit tests

# --- Basic Test Cases ---

def test_write_uint64_basic_zero():
    # Test writing 0
    ds = BCDataStream()
    ds.write_uint64(0) # 1.52μs -> 1.17μs (30.6% faster)

def test_write_uint64_basic_small_number():
    # Test writing a small number
    ds = BCDataStream()
    ds.write_uint64(42) # 1.39μs -> 1.19μs (17.2% faster)

def test_write_uint64_basic_max_32bit():
    # Test writing maximum 32-bit unsigned int
    ds = BCDataStream()
    ds.write_uint64(0xFFFFFFFF) # 1.51μs -> 1.24μs (21.3% faster)

def test_write_uint64_basic_typical_64bit():
    # Test writing a typical 64-bit value
    ds = BCDataStream()
    ds.write_uint64(0x123456789ABCDEF0) # 1.45μs -> 1.11μs (30.6% faster)

# --- Edge Test Cases ---

def test_write_uint64_edge_max_uint64():
    # Test writing maximum uint64 value
    max_uint64 = 0xFFFFFFFFFFFFFFFF
    ds = BCDataStream()
    ds.write_uint64(max_uint64) # 1.46μs -> 1.24μs (18.4% faster)

def test_write_uint64_edge_min_uint64():
    # Test writing minimum uint64 value (0)
    ds = BCDataStream()
    ds.write_uint64(0) # 1.38μs -> 1.08μs (27.4% faster)

def test_write_uint64_edge_negative_value():
    # Test writing a negative value should raise struct.error
    ds = BCDataStream()
    with pytest.raises(struct.error):
        ds.write_uint64(-1) # 1.74μs -> 1.45μs (20.1% faster)

def test_write_uint64_edge_overflow_value():
    # Test writing a value greater than max uint64 should raise struct.error
    ds = BCDataStream()
    with pytest.raises(struct.error):
        ds.write_uint64(0x1FFFFFFFFFFFFFFFF) # 1.75μs -> 1.46μs (20.1% faster)

def test_write_uint64_edge_non_integer():
    # Test writing a non-integer value should raise struct.error
    ds = BCDataStream()
    with pytest.raises(struct.error):
        ds.write_uint64(3.14159) # 1.55μs -> 1.12μs (37.8% faster)

def test_write_uint64_edge_string_input():
    # Test writing a string value should raise struct.error
    ds = BCDataStream()
    with pytest.raises(struct.error):
        ds.write_uint64("100") # 1.43μs -> 1.08μs (33.0% faster)

def test_write_uint64_edge_none_input():
    # Test writing None should raise struct.error
    ds = BCDataStream()
    with pytest.raises(struct.error):
        ds.write_uint64(None) # 1.29μs -> 1.07μs (21.0% faster)

def test_write_uint64_edge_multiple_writes():
    # Test writing multiple values appends correctly
    ds = BCDataStream()
    ds.write_uint64(1) # 1.66μs -> 1.35μs (22.9% faster)
    ds.write_uint64(2) # 823ns -> 650ns (26.6% faster)
    ds.write_uint64(3) # 390ns -> 329ns (18.5% faster)
    expected = struct.pack('<Q', 1) + struct.pack('<Q', 2) + struct.pack('<Q', 3)
    assert ds.input == expected

def test_write_uint64_edge_write_after_init_with_input():
    # Test writing after manually setting input
    ds = BCDataStream()
    ds.input = bytearray(b'abc')
    ds.write_uint64(7) # 1.23μs -> 911ns (34.5% faster)
    expected = b'abc' + struct.pack('<Q', 7)
    assert ds.input == expected

def test_write_uint64_edge_write_zero_and_max():
    # Test writing 0 and then max uint64
    ds = BCDataStream()
    ds.write_uint64(0) # 1.41μs -> 1.07μs (31.4% faster)
    ds.write_uint64(0xFFFFFFFFFFFFFFFF) # 889ns -> 665ns (33.7% faster)
    expected = struct.pack('<Q', 0) + struct.pack('<Q', 0xFFFFFFFFFFFFFFFF)
    assert ds.input == expected

# --- Large Scale Test Cases ---

def test_write_uint64_large_scale_many_writes():
    # Test writing 1000 uint64 values in sequence
    ds = BCDataStream()
    for i in range(1000):
        ds.write_uint64(i) # 304μs -> 257μs (18.1% faster)

def test_write_uint64_large_scale_pattern():
    # Test writing a repeating pattern
    ds = BCDataStream()
    pattern = [0, 0xFFFFFFFFFFFFFFFF]
    for i in range(500):
        ds.write_uint64(pattern[i % 2]) # 160μs -> 135μs (18.6% faster)

def test_write_uint64_large_scale_random_values():
    # Test writing a set of random uint64 values
    import random
    ds = BCDataStream()
    random.seed(12345)  # Deterministic
    values = [random.randint(0, 0xFFFFFFFFFFFFFFFF) for _ in range(1000)]
    for v in values:
        ds.write_uint64(v) # 320μs -> 271μs (18.1% faster)

def test_write_uint64_large_scale_all_bytes():
    # Test writing values with a single 0x01 byte in each of the eight byte positions
    ds = BCDataStream()
    # Use values that have a single byte set in each position
    for i in range(8):
        v = 1 << (i * 8)
        ds.write_uint64(v) # 5.34μs -> 4.49μs (18.9% faster)
    # Check each 8-byte chunk
    for i in range(8):
        expected = struct.pack('<Q', 1 << (i * 8))
        assert ds.input[i * 8:(i + 1) * 8] == expected

def test_write_uint64_large_scale_extend_existing_input():
    # Test writing to a stream with a large existing input
    ds = BCDataStream()
    ds.input = bytearray(b'x' * 8000)
    ds.write_uint64(123456789) # 1.29μs -> 1.01μs (27.0% faster)

# --- Additional Edge Cases ---

def test_write_uint64_edge_float_int_equivalence():
    # Test that float values which are integer-valued still fail
    ds = BCDataStream()
    with pytest.raises(struct.error):
        ds.write_uint64(10.0) # 1.42μs -> 1.16μs (21.8% faster)

def test_write_uint64_edge_bool_input():
    # Test that bools are accepted as ints (since struct.pack allows this)
    ds = BCDataStream()
    ds.write_uint64(True) # 1.59μs -> 1.20μs (32.8% faster)
    ds = BCDataStream()
    ds.write_uint64(False) # 605ns -> 462ns (31.0% faster)

def test_write_uint64_edge_large_object_input():
    # Test that writing a very large integer fails
    ds = BCDataStream()
    with pytest.raises(struct.error):
        ds.write_uint64(10**100) # 1.72μs -> 1.38μs (24.7% faster)

def test_write_uint64_edge_write_after_none_input():
    # Test writing after input is set to None again
    ds = BCDataStream()
    ds.write_uint64(5) # 1.41μs -> 1.17μs (20.6% faster)
    ds.input = None
    ds.write_uint64(6) # 538ns -> 422ns (27.5% faster)

def test_write_uint64_edge_write_multiple_types():
    # Test writing bool, int, and large int in sequence
    ds = BCDataStream()
    ds.write_uint64(True) # 1.23μs -> 1.01μs (21.6% faster)
    ds.write_uint64(123) # 734ns -> 597ns (22.9% faster)
    ds.write_uint64(0xFFFFFFFFFFFFFFFF) # 574ns -> 474ns (21.1% faster)
    expected = struct.pack('<Q', 1) + struct.pack('<Q', 123) + struct.pack('<Q', 0xFFFFFFFFFFFFFFFF)
    assert ds.input == expected
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import struct

# imports
import pytest
from electrum.transaction import BCDataStream

# unit tests

# -------------------------
# Basic Test Cases
# -------------------------

def test_write_uint64_basic_small_values():
    # Test writing 0
    ds = BCDataStream()
    ds.write_uint64(0) # 1.73μs -> 1.30μs (33.1% faster)

    # Test writing 1
    ds = BCDataStream()
    ds.write_uint64(1) # 514ns -> 463ns (11.0% faster)

    # Test writing 42
    ds = BCDataStream()
    ds.write_uint64(42) # 362ns -> 322ns (12.4% faster)

    # Test writing 255
    ds = BCDataStream()
    ds.write_uint64(255) # 346ns -> 292ns (18.5% faster)

def test_write_uint64_basic_large_values():
    # Test writing a mid-range value
    ds = BCDataStream()
    ds.write_uint64(0x1234567890abcdef) # 1.32μs -> 1.03μs (27.8% faster)

    # Test writing the maximum uint64 value
    ds = BCDataStream()
    ds.write_uint64(0xffffffffffffffff) # 617ns -> 536ns (15.1% faster)

# -------------------------
# Edge Test Cases
# -------------------------

def test_write_uint64_min_value():
    # Test writing the minimum value (0)
    ds = BCDataStream()
    ds.write_uint64(0) # 1.29μs -> 991ns (29.8% faster)

def test_write_uint64_max_value():
    # Test writing the maximum possible uint64 value
    ds = BCDataStream()
    ds.write_uint64(2**64 - 1) # 1.28μs -> 1.15μs (11.6% faster)

def test_write_uint64_overflow():
    # Test writing a value that is too large for uint64
    ds = BCDataStream()
    with pytest.raises(struct.error):
        ds.write_uint64(2**64) # 1.74μs -> 1.44μs (20.9% faster)

def test_write_uint64_negative():
    # Test writing a negative value (should raise struct.error)
    ds = BCDataStream()
    with pytest.raises(struct.error):
        ds.write_uint64(-1) # 1.67μs -> 1.32μs (27.0% faster)

def test_write_uint64_non_integer():
    # Test writing a float (should raise struct.error)
    ds = BCDataStream()
    with pytest.raises(struct.error):
        ds.write_uint64(1.23) # 1.42μs -> 1.16μs (23.0% faster)

    # Test writing a string (should raise struct.error or TypeError)
    ds = BCDataStream()
    with pytest.raises((struct.error, TypeError)):
        ds.write_uint64("100") # 631ns -> 488ns (29.3% faster)

def test_write_uint64_multiple_writes():
    # Test writing multiple uint64 values in sequence
    ds = BCDataStream()
    ds.write_uint64(1) # 1.56μs -> 1.28μs (21.9% faster)
    ds.write_uint64(2) # 757ns -> 665ns (13.8% faster)
    ds.write_uint64(3) # 394ns -> 313ns (25.9% faster)
    # Should be the concatenation of each value's encoding
    expected = struct.pack('<Q', 1) + struct.pack('<Q', 2) + struct.pack('<Q', 3)
    assert ds.input == expected

def test_write_uint64_preserves_existing_data():
    # Test that writing to an existing stream appends, not overwrites
    ds = BCDataStream()
    ds.write_uint64(10) # 1.33μs -> 1.03μs (29.3% faster)
    first = ds.input[:]
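    ds.write_uint64(20) # 643ns -> 508ns (26.6% faster)
    # The first value must still be there, followed by the second
    assert ds.input == first + struct.pack('<Q', 20)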

def test_write_uint64_input_is_bytearray():
    # Test that input is a bytearray after writing
    ds = BCDataStream()
    ds.write_uint64(123) # 1.19μs -> 1.01μs (18.1% faster)
    assert isinstance(ds.input, bytearray)

def test_write_uint64_input_none_then_extend():
    # Test that input is initialized on first write, then extended on subsequent writes
    ds = BCDataStream()
    ds.write_uint64(5) # 1.29μs -> 1.02μs (26.3% faster)
    ds.write_uint64(6) # 672ns -> 547ns (22.9% faster)

# -------------------------
# Large Scale Test Cases
# -------------------------

def test_write_uint64_large_sequence():
    # Test writing a large number of uint64 values (performance and correctness)
    ds = BCDataStream()
    N = 1000  # Keep at most 1000 for performance
    for i in range(N):
        ds.write_uint64(i) # 300μs -> 256μs (17.1% faster)
    # Check a few spot values for correctness
    for idx in [0, 1, 10, 100, 999]:
        start = idx * 8
        end = start + 8
        expected = struct.pack('<Q', idx)
        assert ds.input[start:end] == expected

def test_write_uint64_large_values():
    # Test writing a sequence of large uint64 values
    ds = BCDataStream()
    values = [2**63, 2**63+1, 2**63+2, 2**64-1]
    for v in values:
        ds.write_uint64(v) # 3.77μs -> 3.00μs (25.6% faster)
    expected = b''.join(struct.pack('<Q', v) for v in values)
    assert ds.input == expected

def test_write_uint64_performance_under_load():
    # This test is to ensure the function does not degrade with many writes
    ds = BCDataStream()
    N = 500  # Reasonable size for unit test
    for i in range(N):
        ds.write_uint64(0xffffffffffffffff - i) # 156μs -> 133μs (17.3% faster)
    # Check a few values
    for idx in [0, 10, 100, 499]:
        val = 0xffffffffffffffff - idx
        start = idx * 8
        end = start + 8
        assert ds.input[start:end] == struct.pack('<Q', val)

# -------------------------
# Miscellaneous/Mutation-Resistant
# -------------------------

@pytest.mark.parametrize("val,expected", [
    (0, b'\x00\x00\x00\x00\x00\x00\x00\x00'),
    (1, b'\x01\x00\x00\x00\x00\x00\x00\x00'),
    (256, b'\x00\x01\x00\x00\x00\x00\x00\x00'),
    (65535, b'\xff\xff\x00\x00\x00\x00\x00\x00'),
    (4294967295, b'\xff\xff\xff\xff\x00\x00\x00\x00'),
    (2**63, struct.pack('<Q', 2**63)),
    (2**64-1, b'\xff\xff\xff\xff\xff\xff\xff\xff'),
])
def test_write_uint64_parametrized(val, expected):
    # Parametrized test for a variety of values
    ds = BCDataStream()
    ds.write_uint64(val) # 11.0μs -> 8.86μs (23.9% faster)
    assert ds.input == expected
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
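
The expected byte strings and error cases in the tests above follow from the `struct` format `'<Q'` (little-endian, unsigned, 8 bytes). The sketch below checks that behaviour with the standard library alone, independent of Electrum; the `rejected` helper is hypothetical.

```python
import struct

# '<Q' = little-endian, unsigned, exactly 8 bytes: least significant byte first.
encoded_256 = struct.pack('<Q', 256)
encoded_max = struct.pack('<Q', 2**64 - 1)
encoded_true = struct.pack('<Q', True)  # bool is an int subclass, so it packs as 1


def rejected(value):
    """Return True if struct refuses to pack `value` as '<Q'."""
    try:
        struct.pack('<Q', value)
    except struct.error:
        return True
    return False


# Out-of-range integers and non-integers, as exercised by the generated tests.
bad_inputs = [-1, 2**64, 10**100, 1.23, 10.0, "100", None]
all_rejected = all(rejected(v) for v in bad_inputs)
```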

To edit these changes, run `git checkout codeflash/optimize-BCDataStream.write_uint64-mhxr547l` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 13, 2025 18:18
@codeflash-ai codeflash-ai bot added the labels ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) Nov 13, 2025