WIP: Crossversion Testing #181

2elli · 2025-12-23T21:42:25Z

I improved cross-version testing, and I think it may be ready for development use. It is currently quite strict, so it finds very small discrepancies between dis and xdis. I think with a little more work it could be used to automatically find changes and differences when adding new versions of Python to xdis, and to make sure they still work with older versions of Python.

Here is example output from a test run. It automatically finds a couple of minor differences between xdis and dis.

Running xdis on 3.14, disassembling a 3.14 bytecode, xdis's argval is 0 while dis's argval is <.
Running xdis on 3.14, disassembling a 3.12 bytecode, xdis's argval is 0 while dis's argval is (None, False)
Running xdis on 3.14, disassembling a 3.13 bytecode, there is a difference in the constants table (truncated length)

_____________________________________________________________________________________ test_version[3.14] ______________________________________________________________________________________

version = '3.14'

    @pytest.mark.parametrize("version", get_versions())
    def test_version(version):
        """Test each version in compiled template folder."""
        for case in get_tests_by_version(version):
>           assert case.serialized_dis.splitlines() == case.serialized_xdis.splitlines(), case.fail_message
E           AssertionError: Running version 3.14, failed equivalence; xdis:core_3.14.pyc != dis:core_3.14.txt
E           assert ['BYTECODE <m...None,)]', ...] == ['BYTECODE <m...None,)]', ...]
E
E             At index 1129 diff: '74 IS_OP : 0 0' != '74 IS_OP : 0 <'
E             Use -v to get more diff

test_xdis.py:68: AssertionError
_____________________________________________________________________________________ test_version[3.12] ______________________________________________________________________________________

version = '3.12'

    @pytest.mark.parametrize("version", get_versions())
    def test_version(version):
        """Test each version in compiled template folder."""
        for case in get_tests_by_version(version):
>           assert case.serialized_dis.splitlines() == case.serialized_xdis.splitlines(), case.fail_message
E           AssertionError: Running version 3.14, failed equivalence; xdis:serialize_bytecode_3.12.pyc != dis:serialize_bytecode_3.12.txt
E           assert ['BYTECODE <m...'str')]", ...] == ['BYTECODE <m...'str')]", ...]
E
E             At index 429 diff: '155 FORMAT_VALUE : 0 (None, False)' != '155 FORMAT_VALUE : 0 0'
E             Use -v to get more diff

test_xdis.py:68: AssertionError
_____________________________________________________________________________________ test_version[3.13] ______________________________________________________________________________________

version = '3.13'

    @pytest.mark.parametrize("version", get_versions())
    def test_version(version):
        """Test each version in compiled template folder."""
        for case in get_tests_by_version(version):
>           assert case.serialized_dis.splitlines() == case.serialized_xdis.splitlines(), case.fail_message
E           AssertionError: Running version 3.14, failed equivalence; xdis:_compat_3.13.pyc != dis:_compat_3.13.txt
E           assert ['BYTECODE <m...one]')]", ...] == ['BYTECODE <m...one]')]", ...]
E
E             At index 1356 diff: 'consts : [None, \'b\', \'-\', \'<codeobj <genexpr>\', (\'w\', \'a\', \'x\'), False, (\'encoding\'
E
E             ...Full output truncated (2 lines hidden), use '-vv' to show

test_xdis.py:68: AssertionError
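
The pytest diff above reports only the index of the first mismatch. A minimal sketch of a helper that returns the first differing pair of serialized lines is below; `first_difference` and the sample lists are hypothetical and not part of this PR (only the two `IS_OP` strings are taken from the output above).

```python
from itertools import zip_longest


def first_difference(dis_lines, xdis_lines):
    """Return (index, dis_line, xdis_line) for the first mismatch, or None.

    A missing line on either side (different lengths) shows up as None.
    """
    for index, (left, right) in enumerate(zip_longest(dis_lines, xdis_lines)):
        if left != right:
            return index, left, right
    return None


# Illustrative data: the first line is a placeholder, the second pair is
# copied from the failing 3.14 case above.
dis_lines = ["HEADER (illustrative)", "74 IS_OP : 0 0"]
xdis_lines = ["HEADER (illustrative)", "74 IS_OP : 0 <"]
print(first_difference(dis_lines, xdis_lines))
```

Something like this could feed a shorter failure message than the full list comparison, at the cost of hiding later mismatches.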

@rocky @jdw170000 thoughts?

rocky · 2025-12-24T01:49:05Z

Seems like a reasonable thing. I'll try to look at this when I get a chance. Thanks for the PR.

I am too often finding stuff in this code base that is in great need of improvement, and this may be one of those areas.

(One of the other areas is making sure the marshal and unmarshal routines round trip properly. Or getting opcode classification correct.)

rocky · 2025-12-24T12:30:23Z

I just had a chance to look at this and try it. It's okay to put this in so that the code doesn't become stale as things change in the code base.

As you note, it can't be used automatically as is without more work. Some of the fragility is indeed baked into the nature of what xdis is doing.

Let me explain....

Ideally, Python's dis output would be the same as pydisasm's when --format classic is used, which currently is the default format. But as you may know already, this poses a lot of challenges...

First, dis output has changed over time; presumably, the later dis output is more helpful. But for this to work in an automated fashion, either xdis's formatting routines need more work (for version x we format this way, but for version y we change that because Python's dis module changed in those Python versions), or the test has to allow for the presumably better later dis output.

But I am seeing it gets worse than that. Recently, I've added RustPython disassembly. RustPython decided that it wanted more "friendly" opcode names. So LOAD_NAME is LoadNameAny in RustPython, and SUBSCR is Subscript.

This adds the dilemma of whether we want to match the RustPython names, which are generally less familiar to folks who know Python bytecode (CPython, PyPy, or Graal), or whether we want the dis matching to work automatically.

For automated testing here, RustPython names are better (except you'd still need to figure out the dance to get RustPython to use RustPython's builtin dis as opposed to the standard Python library dis module). A similar thing happens for GraalPython.

But if what you are doing is analysis, using common names like LOAD_NAME and SUBSCR can simplify analysis.

Of course, the RustPython convention can be used only when --format classic is used, and maybe the default will change to some other format. But this basically forces storing two sets of opcode names for RustPython: one that uses SUBSCR and the other that uses Subscript.

The tendency is to do that. But this is more work.
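
The "two sets of opcode names" idea can be sketched as a pair of lookup tables. This is only an illustration, not how xdis stores opcode names: the table and function names are hypothetical, and only the two name pairs mentioned above are listed.

```python
# Hypothetical translation table; only the pairs named in this thread.
RUSTPYTHON_TO_COMMON = {
    "LoadNameAny": "LOAD_NAME",
    "Subscript": "SUBSCR",
}
# Reverse table, built once so both directions stay in sync.
COMMON_TO_RUSTPYTHON = {common: rust for rust, common in RUSTPYTHON_TO_COMMON.items()}


def opname_for(name, style):
    """Translate an opcode name to the requested style.

    style is "common" or "rustpython"; unknown names pass through unchanged.
    """
    if style == "common":
        return RUSTPYTHON_TO_COMMON.get(name, name)
    if style == "rustpython":
        return COMMON_TO_RUSTPYTHON.get(name, name)
    raise ValueError(f"unknown style: {style!r}")


print(opname_for("Subscript", "common"))
print(opname_for("LOAD_NAME", "rustpython"))
```

Keeping one table and deriving the other avoids the two name sets drifting apart, though the table itself still has to be maintained by hand.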

2elli · 2025-12-24T22:56:41Z

@rocky - thanks for this info, these are some important things to consider.

Initially I had started developing crossversion testing when we ran into issues while working on pylingual where "native" disassembly would be different than "non-native" (say xdis on 3.12 disassembling a bytecode of version 3.13, compared to Python 3.13 disassembling a bytecode of version 3.12).
I wanted an automatic way to catch differences, especially with some code object specifics like the line table or exceptions that may be harder to catch.

As I made it, and from my understanding, this test should be "format agnostic", as it separately "serializes" the instructions and code object attrs.
I think my understanding is limited here, though: does using a different format in xdis result in differences in the code object itself?

In terms of RustPython, I really am not too familiar. If maintaining some translation between the "friendly names" and CPython opnames is the best approach, I think that makes sense to me. I think either way, implementing this in this tester won't be too hard considering the serialization process already does some "massaging" of the bytecode. I could see integrating with tox/pyenv/uv being a challenge though.
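
To make the "format agnostic" point concrete, here is a minimal sketch of serializing instructions and code object attributes separately, using only the standard library `dis` module. It is not the serializer in this PR: the function names and the exact line layout are assumptions loosely modeled on the diff lines in the test output.

```python
import dis
import types


def _massage(value):
    """Replace code objects with a stable placeholder (no memory addresses)."""
    if isinstance(value, types.CodeType):
        return f"<codeobj {value.co_name}>"
    return value


def serialize(code):
    """Serialize instructions and code-object attributes as separate lines."""
    lines = [f"BYTECODE {code.co_name}"]
    for instr in dis.get_instructions(code):
        # Raw fields only: no dis-style column formatting is involved.
        argval = _massage(instr.argval)
        lines.append(f"{instr.opcode} {instr.opname} : {instr.arg} {argval}")
    lines.append(f"consts : {[_massage(c) for c in code.co_consts]}")
    lines.append(f"names : {list(code.co_names)}")
    lines.append(f"varnames : {list(code.co_varnames)}")
    # Recurse into nested code objects (functions, comprehensions, ...).
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            lines.extend(serialize(const))
    return lines


source = "def f(x):\n    return x is None\n"
for line in serialize(compile(source, "<example>", "exec")):
    print(line)
```

Because the lines are built from instruction fields rather than from formatted disassembly text, a change in display format alone should not alter them; differences in `argval` itself would still show up.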

rocky · 2025-12-24T23:52:12Z

> @rocky - thanks for this info, these are some important things to consider.
>
> Initially I had started developing crossversion testing when we ran into issues while working on pylingual where "native" disassembly would be different than "non-native" (say xdis on 3.12 disassembling a bytecode of version 3.13, compared to Python 3.13 disassembling a bytecode of version 3.12). I wanted an automatic way to catch differences, especially with some code object specifics like the line table or exceptions that may be harder to catch.
>
> As I made it, and from my understanding, this test should be "format agnostic", as it separately "serializes" the instructions and code object attrs. I think my understanding is limited here, though: does using a different format in xdis result in differences in the code object itself?

There can be differences, although probably not the kind that will bother the LLM too much. In marshalling, dictionaries and sets that have the same content might appear in different orders. The semantics of course are the same, but when printed, the order can be different.

Oddly, this can happen in the same version of Python, and Python considers the fact that sets and dictionaries may appear in arbitrary order depending on whim and the time of day, to be a feature, not a bug.

Since it is useful for marshal/unmarshal to round-trip to get the same results, recently I've added extra fields to capture the specific order in which such kinds of objects appear.
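
For a comparison test that should ignore this ordering, one option is to print sets in a canonical order before comparing. The sketch below is an assumption about how a tester might do that, not something xdis provides; `canonical_repr` is a hypothetical name, and its output is meant for comparison only, not as valid Python source.

```python
def canonical_repr(value):
    """Return a repr-like string in which set elements appear in sorted order."""
    if isinstance(value, (set, frozenset)):
        # Sort by each element's own canonical text so mixed types are fine.
        body = "{" + ", ".join(sorted(canonical_repr(v) for v in value)) + "}"
        return body if isinstance(value, set) else "frozenset(" + body + ")"
    if isinstance(value, tuple):
        # Constants tuples can contain frozensets, so recurse into them.
        return "(" + ", ".join(canonical_repr(v) for v in value) + ")"
    return repr(value)


print(canonical_repr(frozenset({"w", "a", "x"})))
print(canonical_repr((None, frozenset({"b", "a"}))))
```

Note the trade-off: this hides exactly the ordering information that a marshal round-trip test needs to preserve, so it only suits the dis-versus-xdis comparison.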

> In terms of RustPython, I really am not too familiar. If maintaining some translation between the "friendly names" and CPython opnames is the best approach, I think that makes sense to me. I think either way, implementing this in this tester won't be too hard considering the serialization process already does some "massaging" of the bytecode. I could see integrating with tox/pyenv/uv being a challenge though.

I just wanted to point out that the "right" thing to do depends on what you are trying to accomplish. If you are someone who is familiar with Python bytecode, the "user-friendly" CamelCase names get in the way.

Yes, we have that --format option that provides a means for a user to indicate what is desired. It's a bit more work to support it, though. (Oh, I remember reading this paper that mentioned the decompilation process and how much maintenance time was needed to support doing this.)

2elli · 2025-12-25T18:47:56Z

Happy Holidays Rocky :)

> There can be differences, although probably not the kind that will bother the LLM too much. In marshalling, dictionaries and sets that have the same content might appear in different orders. The semantics of course are the same, but when printed, the order can be different.
>
> Oddly, this can happen in the same version of Python, and Python considers the fact that sets and dictionaries may appear in arbitrary order depending on whim and the time of day, to be a feature, not a bug.

That's good to know, I was not aware. I knew sets were ordered randomly, but looking into it, I see that the arbitrary order is decided literally randomly at compile time. I'm surprised I hadn't seen this before. python/cpython#73894.

The way I am "serializing" would definitely miss this, so that might be something to consider.

I could pull this in and then investigate further and make a new PR if that is good with you.

> I just wanted to point out that the "right" thing to do depends on what you are trying to accomplish. If you are someone who is familiar with Python bytecode, the "user-friendly" CamelCase names get in the way.
>
> Yes, we have that --format option that provides a means for a user to indicate what is desired. It's a bit more work to support it, though. (Oh, I remember reading this paper that mentioned the decompilation process and how much maintenance time was needed to support doing this.)

My personal opinion is I wouldn't expect xdis to have the capability to keep a translation like this. I think it could definitely be useful for certain people as you say though.

rocky · 2025-12-25T19:46:35Z

> Happy Holidays Rocky :)

Happy Holidays!

> I could pull this in and then investigate further and make a new PR if that is good with you.

The work you and others do is always appreciated.

Also note that there is some work here done in xdis/test_roundtrip.py and the collection_order field of a VersionIndependentUnmarshaller object.
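
For readers unfamiliar with the round-trip property, here is a minimal illustration using only the standard library `marshal` module. It does not use xdis's unmarshaller or reproduce xdis/test_roundtrip.py; `roundtrips` is a hypothetical helper.

```python
import marshal


def roundtrips(value):
    """True if value survives marshal.dumps followed by marshal.loads."""
    return marshal.loads(marshal.dumps(value)) == value


# A code object whose constants include a set-like literal.
code = compile("x = {1, 2, 3}\n", "<example>", "exec")
print(roundtrips(code))
print(roundtrips((1, "a", frozenset({"x", "y"}))))
```

An equality check like this cannot see the order in which set elements were written, which is presumably why a separate field recording that order is needed for a byte-exact round trip.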

2elli added 5 commits December 23, 2025 14:59

automatically make testing dirs

eafc36f

copy test files with find

686dfdb

add better debug messages

de13b96

update usage.md

d8336dc

improve debug message

7c1dfbd
