Skip to content

Bytecode parity#7475

Open
youknowone wants to merge 7 commits intoRustPython:mainfrom
youknowone:bytecode-parity-phase1
Open

Bytecode parity#7475
youknowone wants to merge 7 commits intoRustPython:mainfrom
youknowone:bytecode-parity-phase1

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Mar 20, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Corrected behavior for container-modifying operations in comprehensions and exception-star handling to prevent incorrect item placement.
    • Fixed loading/storing of module and class attributes and improved method-call attribute lookup for more reliable attribute resolution.
    • Ensured boolean evaluation and conditional bytecode emit consistently for a wider range of expressions.
  • Refactor

    • Improved final-return handling across code paths to make function exit behavior more consistent.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 20, 2026

📝 Walkthrough

Walkthrough

Compiler, IR, and VM changes adjust how container-modifying bytecodes are indexed, move MAKE_CELL/COPY_FREE_VARS emission earlier in scope prologues, add a CFG pass duplicating end-of-function returns, and change several name/attribute and comprehension-related bytecode argument conventions.

Changes

Cohort / File(s) Summary
Compiler codegen
crates/codegen/src/compile.rs
Many bytecode-emission adjustments: streaming/append ops use index i:1 (various List/Set/Map ops and try-star handling); comprehension append/add index uses (generators.len() + 1); enter_scope now emits MAKE_CELL for each cellvar and COPY_FREE_VARS (when non-empty) before RESUME; module/class name/doc/firstlineno emission order and StoreName/LoadName usage changed; class bodies now emit __static_attributes__; compile_jump_if fallback emits ToBool for non-Compare expressions; attribute load for imported module names uses plain LOAD_ATTR + PUSH_NULL instead of method-mode.
IR finalization / CFG pass
crates/codegen/src/ir.rs
Added duplicate_end_returns(blocks: &mut [Block]) and invoked it during CodeInfo::finalize_code after normalize_jumps — clones final LOAD_CONST; RETURN_VALUE epilogue into blocks that fall through to the final block when appropriate.
VM frame stack indexing
crates/vm/src/frame.rs
Adjusted stack-depth selection for container-modifying instructions by decrementing index lookup (i.get(arg) - 1), changing which stack slot is chosen for ListAppend/ListExtend/MapAdd/SetAdd/SetUpdate.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Compiler
participant IR as CodeInfo/CFG
participant VM
Compiler->>IR: emit bytecode (MAKE_CELL / COPY_FREE_VARS before RESUME;\nadjusted List/Set/Map append indices;\nLoadName/StoreName changes)
IR->>IR: normalize_jumps()\nduplicate_end_returns()
IR->>VM: finalized code object
VM->>VM: execution uses adjusted stack-indexing\nfor container-modifying bytecodes

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • ShaharNaveh

Poem

🐰 I hopped through opcodes, small and spry,
Moved MAKE_CELL up where the prologue lies,
Duplicated returns that softly end flows,
Fixed append indexes so the stack knows,
Static tuples gleam — a rabbit's bytecode sigh.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Bytecode parity' is vague and does not clearly convey the specific changes made across three files (compile.rs, ir.rs, and frame.rs) involving stackframe indexing, CFG optimizations, and bytecode instruction handling. Consider a more descriptive title such as 'Fix bytecode stack indexing and CFG optimizations for container operations' to better reflect the primary changes.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone force-pushed the bytecode-parity-phase1 branch from ab3cbd9 to 7ff591f Compare March 20, 2026 11:41
@github-actions
Copy link
Contributor

github-actions bot commented Mar 20, 2026

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[ ] test: cpython/Lib/test/test_descr.py (TODO: 40)
[ ] test: cpython/Lib/test/test_descrtut.py (TODO: 3)

dependencies:

dependent tests: (no tests depend on descr)

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

@youknowone youknowone marked this pull request as ready for review March 20, 2026 12:18
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/codegen/src/compile.rs`:
- Around line 6732-6735: The comment above the ToBool emission is misleading
because ast::Expr::Compare may produce non-bool values; update the comment near
the if !matches!(expression, ast::Expr::Compare) check to state that the Compare
fast path is safe due to the jump opcode’s truthiness handling (not because
Compare always yields a bool), and clarify that everything else needs
Instruction::ToBool before branching; reference the emit!(self,
Instruction::ToBool) call and the ast::Expr::Compare pattern in the comment.
- Around line 4626-4649: The current block always emits and stores an empty
tuple to __static_attributes__ because enter_scope() initializes
static_attributes to Some(empty) but nothing is added before this emit; update
compile.rs to skip emitting/storing __static_attributes__ when the collected
attrs vector is empty (i.e., check the
code_stack.last().unwrap().static_attributes contents and only call
emit_load_const/Instruction::StoreName/name("__static_attributes__") if attrs is
non-empty), or alternatively change enter_scope() to initialize
static_attributes as None and only create/populate it when the collector runs;
target the symbols static_attributes, enter_scope, emit_load_const,
Instruction::StoreName and name("__static_attributes__") when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: df81d4bb-8612-4e9c-a28a-4f556bc8453f

📥 Commits

Reviewing files that changed from the base of the PR and between 2ef77f8 and 7ff591f.

⛔ Files ignored due to path filters (4)
  • Lib/test/test_descr.py is excluded by !Lib/**
  • crates/codegen/src/snapshots/rustpython_codegen__compile__tests__if_ands.snap is excluded by !**/*.snap
  • crates/codegen/src/snapshots/rustpython_codegen__compile__tests__if_mixed.snap is excluded by !**/*.snap
  • crates/codegen/src/snapshots/rustpython_codegen__compile__tests__if_ors.snap is excluded by !**/*.snap
📒 Files selected for processing (1)
  • crates/codegen/src/compile.rs

Comment on lines +4626 to +4649
// Emit __static_attributes__ tuple
{
let attrs: Vec<String> = self
.code_stack
.last()
.unwrap()
.static_attributes
.as_ref()
.map(|s| s.iter().cloned().collect())
.unwrap_or_default();
self.emit_load_const(ConstantData::Tuple {
elements: attrs
.into_iter()
.map(|s| ConstantData::Str { value: s.into() })
.collect(),
});
let static_attrs_name = self.name("__static_attributes__");
emit!(
self,
Instruction::StoreName {
namei: static_attrs_name
}
);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

__static_attributes__ is always serialized as () here.

enter_scope() initializes every class with Some(IndexSet::default()) on Line 1120, but this file never inserts into that set before reaching this block. The new store therefore overwrites every class body’s __static_attributes__ with an empty tuple, including any explicit assignment made in body. Either populate static_attributes before this point or skip the store until the set is non-empty.

Minimal guard to avoid clobbering classes until the collector exists
-        // Emit __static_attributes__ tuple
-        {
-            let attrs: Vec<String> = self
-                .code_stack
-                .last()
-                .unwrap()
-                .static_attributes
-                .as_ref()
-                .map(|s| s.iter().cloned().collect())
-                .unwrap_or_default();
-            self.emit_load_const(ConstantData::Tuple {
-                elements: attrs
-                    .into_iter()
-                    .map(|s| ConstantData::Str { value: s.into() })
-                    .collect(),
-            });
-            let static_attrs_name = self.name("__static_attributes__");
-            emit!(
-                self,
-                Instruction::StoreName {
-                    namei: static_attrs_name
-                }
-            );
-        }
+        if let Some(attrs) = self
+            .code_stack
+            .last()
+            .and_then(|info| info.static_attributes.as_ref())
+            .filter(|attrs| !attrs.is_empty())
+        {
+            self.emit_load_const(ConstantData::Tuple {
+                elements: attrs
+                    .iter()
+                    .cloned()
+                    .map(|s| ConstantData::Str { value: s.into() })
+                    .collect(),
+            });
+            let static_attrs_name = self.name("__static_attributes__");
+            emit!(self, Instruction::StoreName { namei: static_attrs_name });
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/codegen/src/compile.rs` around lines 4626 - 4649, The current block
always emits and stores an empty tuple to __static_attributes__ because
enter_scope() initializes static_attributes to Some(empty) but nothing is added
before this emit; update compile.rs to skip emitting/storing
__static_attributes__ when the collected attrs vector is empty (i.e., check the
code_stack.last().unwrap().static_attributes contents and only call
emit_load_const/Instruction::StoreName/name("__static_attributes__") if attrs is
non-empty), or alternatively change enter_scope() to initialize
static_attributes as None and only create/populate it when the collector runs;
target the symbols static_attributes, enter_scope, emit_load_const,
Instruction::StoreName and name("__static_attributes__") when making the change.

Comment on lines +6732 to +6735
// Compare already produces a bool; everything else needs TO_BOOL
if !matches!(expression, ast::Expr::Compare(_)) {
emit!(self, Instruction::ToBool);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

The Compare rationale is inaccurate.

Rich comparisons can return arbitrary objects; the reason this fast path is safe is the jump opcode’s truthiness handling, not that Compare itself always yields a bool. Please reword the comment so future refactors do not depend on the wrong invariant.

Suggested comment update
-                // Compare already produces a bool; everything else needs TO_BOOL
+                // Rich comparisons may return non-bool objects; skipping TO_BOOL
+                // here is only correct because PopJumpIf* performs truthiness
+                // conversion itself.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Compare already produces a bool; everything else needs TO_BOOL
if !matches!(expression, ast::Expr::Compare(_)) {
emit!(self, Instruction::ToBool);
}
// Rich comparisons may return non-bool objects; skipping TO_BOOL
// here is only correct because PopJumpIf* performs truthiness
// conversion itself.
if !matches!(expression, ast::Expr::Compare(_)) {
emit!(self, Instruction::ToBool);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/codegen/src/compile.rs` around lines 6732 - 6735, The comment above
the ToBool emission is misleading because ast::Expr::Compare may produce
non-bool values; update the comment near the if !matches!(expression,
ast::Expr::Compare) check to state that the Compare fast path is safe due to the
jump opcode’s truthiness handling (not because Compare always yields a bool),
and clarify that everything else needs Instruction::ToBool before branching;
reference the emit!(self, Instruction::ToBool) call and the ast::Expr::Compare
pattern in the comment.

- Emit TO_BOOL before POP_JUMP_IF_TRUE/FALSE in the general case
  of compile_jump_if (Compare expressions excluded since they
  already produce a bool)
- Module-level __doc__: use STORE_NAME instead of STORE_GLOBAL
- Class body __module__: use LOAD_NAME instead of LOAD_GLOBAL
- Class body: store __firstlineno__ before __doc__
Emit MAKE_CELL for each cell variable and COPY_FREE_VARS N for
free variables at the start of each code object, before RESUME.
These instructions are no-ops in the VM but align the bytecode
with CPython 3.14's output.
Store a tuple of attribute names (currently always empty) as
__static_attributes__ in the class namespace, matching CPython
3.14's class body epilogue. Attribute name collection from
self.xxx accesses is a follow-up task.
test_iter_keys, test_iter_values, test_iter_items now pass
because class bodies emit __static_attributes__ and
__firstlineno__, matching the expected dict key set.
Switch LIST_APPEND, LIST_EXTEND, SET_ADD, SET_UPDATE, MAP_ADD
from 0-based to 1-based stack depth argument, matching CPython's
PEEK(oparg) convention. Adjust the VM to subtract 1 before
calling nth_value.
When the call target is an attribute of an imported name (e.g.,
logging.getLogger()), use plain LOAD_ATTR (method_flag=0) with
a separate PUSH_NULL instead of method-mode LOAD_ATTR. This
matches CPython 3.14's behavior which avoids the method call
optimization for module attribute access.
When the last block in a code object is exactly LOAD_CONST None +
RETURN_VALUE (the implicit return), duplicate these instructions
into blocks that would otherwise fall through to it. This matches
CPython 3.14's behavior of giving each code path its own explicit
return instruction.
@youknowone youknowone force-pushed the bytecode-parity-phase1 branch from 7ff591f to 904bdaf Compare March 20, 2026 13:56
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
crates/codegen/src/compile.rs (2)

6732-6735: ⚠️ Potential issue | 🟡 Minor

Reword the Compare fast-path comment.

Rich comparisons can return arbitrary objects. Skipping TO_BOOL here is only safe because PopJumpIf* performs truthiness conversion itself.

Suggested comment update
-                // Compare already produces a bool; everything else needs TO_BOOL
+                // Rich comparisons may return non-bool objects; skipping TO_BOOL
+                // here is only correct because PopJumpIf* performs truthiness
+                // conversion itself.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/codegen/src/compile.rs` around lines 6732 - 6735, Update the comment
above the Compare fast-path to note that rich comparisons may return arbitrary
objects and that we only skip emitting Instruction::ToBool for
ast::Expr::Compare because the subsequent conditional jump opcodes
(PopJumpIfTrue / PopJumpIfFalse / other PopJumpIf* handlers) perform the
truthiness conversion themselves; reference the match on expression
(ast::Expr::Compare), the emit!(self, Instruction::ToBool) call, and the
PopJumpIf* jump handlers to make the reason explicit.

4626-4649: ⚠️ Potential issue | 🟠 Major

__static_attributes__ is still always written as ().

This file only initializes static_attributes; it never populates it before this block runs. As written, every class body overwrites __static_attributes__ with an empty tuple, and it does so after compiling body, so it also clobbers an explicit class-body assignment.

Suggested guard until the collector is wired up
-        // Emit __static_attributes__ tuple
-        {
-            let attrs: Vec<String> = self
-                .code_stack
-                .last()
-                .unwrap()
-                .static_attributes
-                .as_ref()
-                .map(|s| s.iter().cloned().collect())
-                .unwrap_or_default();
-            self.emit_load_const(ConstantData::Tuple {
-                elements: attrs
-                    .into_iter()
-                    .map(|s| ConstantData::Str { value: s.into() })
-                    .collect(),
-            });
-            let static_attrs_name = self.name("__static_attributes__");
-            emit!(
-                self,
-                Instruction::StoreName {
-                    namei: static_attrs_name
-                }
-            );
-        }
+        if let Some(attrs) = self
+            .code_stack
+            .last()
+            .and_then(|info| info.static_attributes.as_ref())
+            .filter(|attrs| !attrs.is_empty())
+        {
+            self.emit_load_const(ConstantData::Tuple {
+                elements: attrs
+                    .iter()
+                    .cloned()
+                    .map(|s| ConstantData::Str { value: s.into() })
+                    .collect(),
+            });
+            let static_attrs_name = self.name("__static_attributes__");
+            emit!(self, Instruction::StoreName { namei: static_attrs_name });
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/codegen/src/compile.rs` around lines 4626 - 4649, The code
unconditionally emits and stores __static_attributes__ as an empty tuple because
static_attributes is never populated; fix by guarding the emit: only call
emit_load_const + Instruction::StoreName for __static_attributes__ when
self.code_stack.last().unwrap().static_attributes is Some(vec) and the vec is
non-empty (i.e., contains attributes), so it doesn't overwrite an explicit
class-body assignment or write an empty tuple; use the existing
name("__static_attributes__"), emit_load_const, and the Instruction::StoreName
path when the Option has data.
🧹 Nitpick comments (1)
crates/vm/src/frame.rs (1)

2512-2512: Encode the new 1-based container depth in one place.

These five handlers now rely on the same subtle rule: the opcode arg is 1-based, nth_value() is 0-based, and the lookup happens after popping the transient operand(s). Repeating raw - 1 here makes that contract easy to miss or accidentally revert later; a tiny helper or short comment would make the parity rule much clearer.

Also applies to: 2522-2522, 2864-2864, 3195-3195, 3205-3205

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/vm/src/frame.rs` at line 2512, Create a small helper to centralize the
1-based→0-based conversion and use it from the handlers instead of repeating `-
1`; for example add a method on the same type (e.g., fn arg_to_index(&self, arg:
u32) -> usize or fn nth_arg_value(&self, arg: u32) -> Value) and replace
occurrences like `let obj = self.nth_value(i.get(arg) - 1);` in the handlers
referenced (the sites at the current diff and the ones you noted: the uses at
~2522, ~2864, ~3195, ~3205) to call that helper so the parity rule is enforced
in one place and clearly named. Ensure the helper documents that opcode args are
1-based and nth_value is 0-based.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/codegen/src/ir.rs`:
- Around line 1652-1660: The return-epilogue predicate (is_return_block) is too
broad — it matches any LOAD_CONST + RETURN_VALUE; narrow it by matching the
actual LoadConst payload to ensure it's specifically loading None: change the
second match clause that currently checks
AnyInstruction::Real(Instruction::LoadConst { .. }) to explicitly match the
constant variant for None (e.g. AnyInstruction::Real(Instruction::LoadConst {
value: Constant::None }) or the equivalent None representation used in this IR),
while keeping the subsequent AnyInstruction::Real(Instruction::ReturnValue)
check; use the existing symbols is_return_block, last_insts,
AnyInstruction::Real, Instruction::LoadConst and Instruction::ReturnValue to
locate and update the condition.

---

Duplicate comments:
In `@crates/codegen/src/compile.rs`:
- Around line 6732-6735: Update the comment above the Compare fast-path to note
that rich comparisons may return arbitrary objects and that we only skip
emitting Instruction::ToBool for ast::Expr::Compare because the subsequent
conditional jump opcodes (PopJumpIfTrue / PopJumpIfFalse / other PopJumpIf*
handlers) perform the truthiness conversion themselves; reference the match on
expression (ast::Expr::Compare), the emit!(self, Instruction::ToBool) call, and
the PopJumpIf* jump handlers to make the reason explicit.
- Around line 4626-4649: The code unconditionally emits and stores
__static_attributes__ as an empty tuple because static_attributes is never
populated; fix by guarding the emit: only call emit_load_const +
Instruction::StoreName for __static_attributes__ when
self.code_stack.last().unwrap().static_attributes is Some(vec) and the vec is
non-empty (i.e., contains attributes), so it doesn't overwrite an explicit
class-body assignment or write an empty tuple; use the existing
name("__static_attributes__"), emit_load_const, and the Instruction::StoreName
path when the Option has data.

---

Nitpick comments:
In `@crates/vm/src/frame.rs`:
- Line 2512: Create a small helper to centralize the 1-based→0-based conversion
and use it from the handlers instead of repeating `- 1`; for example add a
method on the same type (e.g., fn arg_to_index(&self, arg: u32) -> usize or fn
nth_arg_value(&self, arg: u32) -> Value) and replace occurrences like `let obj =
self.nth_value(i.get(arg) - 1);` in the handlers referenced (the sites at the
current diff and the ones you noted: the uses at ~2522, ~2864, ~3195, ~3205) to
call that helper so the parity rule is enforced in one place and clearly named.
Ensure the helper documents that opcode args are 1-based and nth_value is
0-based.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 7134ce0a-4869-417e-9966-1ac435da0e6a

📥 Commits

Reviewing files that changed from the base of the PR and between 7ff591f and 904bdaf.

⛔ Files ignored due to path filters (4)
  • Lib/test/test_descr.py is excluded by !Lib/**
  • crates/codegen/src/snapshots/rustpython_codegen__compile__tests__if_ands.snap is excluded by !**/*.snap
  • crates/codegen/src/snapshots/rustpython_codegen__compile__tests__if_mixed.snap is excluded by !**/*.snap
  • crates/codegen/src/snapshots/rustpython_codegen__compile__tests__if_ors.snap is excluded by !**/*.snap
📒 Files selected for processing (3)
  • crates/codegen/src/compile.rs
  • crates/codegen/src/ir.rs
  • crates/vm/src/frame.rs

Comment on lines +1652 to +1660
let is_return_block = last_insts.len() == 2
&& matches!(
last_insts[0].instr,
AnyInstruction::Real(Instruction::LoadConst { .. })
)
&& matches!(
last_insts[1].instr,
AnyInstruction::Real(Instruction::ReturnValue)
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Return-epilogue detection is too broad for the documented behavior

Line 1652 only checks LOAD_CONST + RETURN_VALUE, but the function is documented to duplicate only LOAD_CONST None + RETURN_VALUE. This can duplicate explicit constant returns as well, which risks bytecode parity/tracing mismatches. Please tighten the predicate to verify the loaded constant is specifically None.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/codegen/src/ir.rs` around lines 1652 - 1660, The return-epilogue
predicate (is_return_block) is too broad — it matches any LOAD_CONST +
RETURN_VALUE; narrow it by matching the actual LoadConst payload to ensure it's
specifically loading None: change the second match clause that currently checks
AnyInstruction::Real(Instruction::LoadConst { .. }) to explicitly match the
constant variant for None (e.g. AnyInstruction::Real(Instruction::LoadConst {
value: Constant::None }) or the equivalent None representation used in this IR),
while keeping the subsequent AnyInstruction::Real(Instruction::ReturnValue)
check; use the existing symbols is_return_block, last_insts,
AnyInstruction::Real, Instruction::LoadConst and Instruction::ReturnValue to
locate and update the condition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant