Skip to content

[DYNAREC] Modify natflags register save logic to now save all scratch registers#3880

Merged
ptitSeb merged 1 commit into
ptitSeb:mainfrom
zengdage:save-restore-temp-register-2
May 22, 2026
Merged

[DYNAREC] Modify natflags register save logic to now save all scratch registers#3880
ptitSeb merged 1 commit into
ptitSeb:mainfrom
zengdage:save-restore-temp-register-2

Conversation

@zengdage
Copy link
Copy Markdown
Contributor

Fix scratch register corruption caused by non-consecutive flags producer and consumer when BOX64_DYNAREC_TRACE is enabled, which introduced by #3876.

For example, the test rax, rax flags producer stores its flag calculation operands in scratch register t3. The next mov r14, rax instruction does not use scratch registers, but its associated trace code can still overwrite t3's value. This means we need to reference the flags consumer that is jz 0x0000003F0000ABC0 to identify which registers require saving.

[BOX64] 0x3f00009f68: 48 85 C0 test rax, rax
[BOX64] 0x3ff7afaef0: 53 emitted opcodes, inst=14, barrier=0 state=3/1(0), set=3F/80, use=0, need=0/80, fuse=1/0, sm=0(0/0), sew@entry=7, sew@exit=7, pred=13
	03f00e13	ADDI            t3, zero, 0x3f(63)
	45ccaa23	SW              t3, emu_s9, 0x454(1108)
	01087e33	AND             t3, rax_a6, rax_a6
	47ccb423	SD              t3, emu_s9, 0x468(1128)
[BOX64] New Instruction x64:0x3f00009f6b, native:0x3ff7afafc4
[BOX64] TRACE ----
[BOX64]  n1:0 n2:0 ----
	01f80b37	LUI             rip_s6, 0x1f80000(33030144)
	005b0b1b	ADDIW           rip_s6, rip_s6, 0x5(5)
	00db1b13	SLLI            rip_s6, rip_s6, 0xd(13)
	f6bb0b13	ADDI            rip_s6, rip_s6, 0xffffff6b(-149)
...............................................................
...............................................................
...............................................................
	048cb783	LD              r9_a5, emu_s9, 0x48(72)
	000cb803	LD              rax_a6, emu_s9, 0x0(0)
	088cbb03	LD              rip_s6, emu_s9, 0x88(136)
	080cbb83	LD              flags_s7, emu_s9, 0x80(128)
	fdfbfb93	ANDI            flags_s7, flags_s7, 0xffffffdf(-33)
	006bdf93	SRLI            t6, flags_s7, 0x6(6)
	020fff93	ANDI            t6, t6, 0x20(32)
	01fbebb3	OR              flags_s7, flags_s7, t6
[BOX64] ----------
[BOX64] 0x3f00009f6b: 49 89 C6 mov r14, rax
[BOX64] 0x3ff7afafc4: 54 emitted opcodes, inst=15, barrier=0 state=0/3(0), set=0/0, use=0, need=80/80, fuse=0/1, sm=0(0/0), sew@entry=7, sew@exit=7, pred=14
	00080a13	MV              r14_s4, rax_a6
[BOX64] New Instruction x64:0x3f00009f6e, native:0x3ff7afb09c
[BOX64] TRACE ----
[BOX64]  n1:28 n2:0 ----
	ff010113	ADDI            sp, sp, 0xfffffff0(-16)
	01c13023	SD              t3, sp, 0x0(0)
	01f80b37	LUI             rip_s6, 0x1f80000(33030144)
	01fbebb3	OR              flags_s7, flags_s7, t6
...............................................................
...............................................................
...............................................................
	00013e03	LD              t3, sp, 0x0(0)
	01010113	ADDI            sp, sp, 0x10(16)
[BOX64] ----------
[BOX64] 0x3f00009f6e: 0F 84 4C 0C 00 00 jz 0x0000003F0000ABC0
[BOX64] 0x3ff7afb09c: 67 emitted opcodes, inst=16, barrier=2 state=0/3(0), set=0/0, use=0, need=80/80, fuse=1/0, sm=0(0/0), sew@entry=7, sew@exit=7, pred=15, jmp=out
	020e1463	BNE             t3, zero, 0x28(40) # +10i(0x3ff7afb1a8)
	00000013	NOP

@zengdage
Copy link
Copy Markdown
Contributor Author

@ptitSeb @ksco Sorry, the scratch register usage detection method I shared last night has a flaw: for non-consecutive flags producer and consumer, the trace code for all intermediate x86-64 instructions between them must also save scratch registers to avoid data corruption. My fix traverses forward to find the target flags consumer, then fetches the exact scratch registers that need to be saved.
Please help to review again, thanks.

@ksco
Copy link
Copy Markdown
Collaborator

ksco commented May 21, 2026

This is getting complicated, and I don't like where it's going. On second thought, I think we should simply disable nativeflags if trace is effective. It does not make much sense to have trace and nativeflags both enabled anyway, right?

@ptitSeb
Copy link
Copy Markdown
Owner

ptitSeb commented May 21, 2026

Ok, disabling NativeFlags if Dynarec Trace is enabled looks fair enough to me.
But that's something to keep in mind when debugging, as the issue being debugged could be a NativeFlag one...

@zengdage zengdage force-pushed the save-restore-temp-register-2 branch from 13b9da5 to 891ddba Compare May 21, 2026 08:00
@ksco
Copy link
Copy Markdown
Collaborator

ksco commented May 21, 2026

But that's something to keep in mind when debugging, as the issue being debugged could be a NativeFlag one...

You will have the wrong flags state if nativeflags is enabled; that's also why we disabled nativeflags in cosim.

@zengdage
Copy link
Copy Markdown
Contributor Author

Can we just simplify this by saving all scratch registers? Trace is for debugging, performance isn't a priority. Adding ~10 extra instructions won't make any difference.

@ksco
Copy link
Copy Markdown
Collaborator

ksco commented May 21, 2026

Can we just simplify this by saving all scratch registers? Trace is for debugging, performance isn't a priority. Adding ~10 extra instructions won't make any difference.

And why do you need to enable nativeflags for trace?

@zengdage
Copy link
Copy Markdown
Contributor Author

Can we just simplify this by saving all scratch registers? Trace is for debugging, performance isn't a priority. Adding ~10 extra instructions won't make any difference.

And why do you need to enable nativeflags for trace?

Disabling nativeflags is also ok for me. I just think that directly preserving all scratch registers is a very straightforward fix for this issue, and it will not cause any noticeable impact in trace mode.

@ptitSeb
Copy link
Copy Markdown
Owner

ptitSeb commented May 21, 2026

Well, disable NativeFlags or Save all the registers, I don't mind any of those, as they both have their pros and cons in my point of view.

@ksco
Copy link
Copy Markdown
Collaborator

ksco commented May 21, 2026

Well, disable NativeFlags or Save all the registers, I don't mind any of those, as they both have their pros and cons in my point of view.

But what are the cons of disabling nativeflags in trace mode? Trace mode is very similar to cosim, but it is for human manual analysis, so it makes sense to disable it IMO.

@ptitSeb
Copy link
Copy Markdown
Owner

ptitSeb commented May 21, 2026

Well, disable NativeFlags or Save all the registers, I don't mind any of those, as they both have their pros and cons in my point of view.

But what are the cons of disabling nativeflags in trace mode? Trace mode is very similar to cosim, but it is for human manual analysis, so it makes sense to disable it IMO.

The cons are, mostly, disabling native flags prevent you from using cosim & trace to debug native flags issues. you will use a different code path in trace than without trace.

@ksco
Copy link
Copy Markdown
Collaborator

ksco commented May 21, 2026

Well, you can't use cosim & trace to debug nativeflags issues, nativeflags is meant to skip those FLAGS computations...

@ptitSeb
Copy link
Copy Markdown
Owner

ptitSeb commented May 21, 2026

Well, you can't use cosim & trace to debug nativeflags issues, nativeflags is meant to skip those FLAGS computations...

The x64 opcode flow is still respected. Trace don't give access to native stuffs, just the x64 part. The flow on jump or conditionnal moved from a trace can be used to debug issue with native flags: you can spot an incoherent branch taken for example.

Again, I'm not against any of the change. If you prefer to disable native flags for trace, I'm fine with that.

@ksco
Copy link
Copy Markdown
Collaborator

ksco commented May 21, 2026

Again, I'm not against any of the change. If you prefer to disable native flags for trace, I'm fine with that.

Bah, me neither. @zengdage If you decide to do the scratch registers spill/restore, please refer to this closed PR for suggested macro names and add changes for other backends too: https://github.com/ptitSeb/box64/pull/3879/changes

@zengdage zengdage force-pushed the save-restore-temp-register-2 branch from 891ddba to 5ca8514 Compare May 22, 2026 02:58
@zengdage
Copy link
Copy Markdown
Contributor Author

Again, I'm not against any of the change. If you prefer to disable native flags for trace, I'm fine with that.

Bah, me neither. @zengdage If you decide to do the scratch registers spill/restore, please refer to this closed PR for suggested macro names and add changes for other backends too: https://github.com/ptitSeb/box64/pull/3879/changes

Done. Modify nat flag register spill logic to now save all scratch registers and add implementation for LA64 and PPC64LE too.

Copy link
Copy Markdown
Collaborator

@ksco ksco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please update your PR and commit title/desc to reflect the changes.

@zengdage zengdage changed the title [RV64_DYNAREC] Fix bug in SAVE_ACTIVE_SCRATCH_REGISTERS [RV64_DYNAREC] Modify nat flag register save logic to now save all scratch registers May 22, 2026
@zengdage zengdage changed the title [RV64_DYNAREC] Modify nat flag register save logic to now save all scratch registers [RV64_DYNAREC] Modify nat flag register save logic to now save all scratch registers and add implementation for LA64 and PPC64LE May 22, 2026
@ksco ksco changed the title [RV64_DYNAREC] Modify nat flag register save logic to now save all scratch registers and add implementation for LA64 and PPC64LE [DYNAREC] Modify natflags register save logic to now save all scratch registers May 22, 2026
@ksco ksco requested a review from ptitSeb May 22, 2026 08:03
Comment thread src/dynarec/dynarec_native_pass.c Outdated
|| ((ip >= trace_start) && (ip < trace_end))) {
MESSAGE(LOG_DUMP, "TRACE ----\n");
if (BOX64ENV(dynarec_nativeflags)) SAVE_ACTIVE_SCRATCH_REGISTERS;
#if defined (RV64) || defined(LA64) || defined(PPC64LE)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#if defined (RV64) || defined(LA64) || defined(PPC64LE)
#if defined (SPILL_NF_REGISTERS)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread src/dynarec/dynarec_native_pass.c Outdated
GO_TRACE(PrintTrace, 1, x5);
fpu_unreflectcache(dyn, ninst, x1, x2, x3);
if (BOX64ENV(dynarec_nativeflags)) LOAD_ACTIVE_SCRATCH_REGISTERS;
#if defined (RV64) || defined(LA64) || defined(PPC64LE)
Copy link
Copy Markdown
Owner

@ptitSeb ptitSeb May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#if defined (RV64) || defined(LA64) || defined(PPC64LE)
#if defined (RESTORE_NF_REGISTERS)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ptitSeb
Copy link
Copy Markdown
Owner

ptitSeb commented May 22, 2026

I suggested some change for simplicity

…ratch registers

1. Rename macro to SPILL_NF_REGISTERS, add implementation for LA64 and PPC64LE.
2. Modify nat flag register spill logic to now save all scratch registers.

Fix scratch register corruption caused by non-consecutive flags producer and
consumer when BOX64_DYNAREC_TRACE is enabled, which introduced by ptitSeb#3876.

For example, the `test rax, rax` flags producer stores its flag calculation
operands in scratch register `t3`. The next `mov r14, rax` instruction does
not use scratch registers, but its associated trace code can still overwrite
t3's value. This means we need to reference the flags consumer that is
`jz 0x0000003F0000ABC0` to identify which registers require saving. But this is
too complicated. So we went with the simpler approach of saving all scratch
registers, this won't add noticeable performance overhead in trace mode.

```
[BOX64] 0x3f00009f68: 48 85 C0 test rax, rax
[BOX64] 0x3ff7afaef0: 53 emitted opcodes, inst=14, barrier=0 state=3/1(0), set=3F/80, use=0, need=0/80, fuse=1/0, sm=0(0/0), sew@entry=7, sew@exit=7, pred=13
	03f00e13	ADDI            t3, zero, 0x3f(63)
	45ccaa23	SW              t3, emu_s9, 0x454(1108)
	01087e33	AND             t3, rax_a6, rax_a6
	47ccb423	SD              t3, emu_s9, 0x468(1128)
[BOX64] New Instruction x64:0x3f00009f6b, native:0x3ff7afafc4
[BOX64] TRACE ----
[BOX64]  n1:0 n2:0 ----
	01f80b37	LUI             rip_s6, 0x1f80000(33030144)
	005b0b1b	ADDIW           rip_s6, rip_s6, 0x5(5)
	00db1b13	SLLI            rip_s6, rip_s6, 0xd(13)
	f6bb0b13	ADDI            rip_s6, rip_s6, 0xffffff6b(-149)
...............................................................
...............................................................
...............................................................
	048cb783	LD              r9_a5, emu_s9, 0x48(72)
	000cb803	LD              rax_a6, emu_s9, 0x0(0)
	088cbb03	LD              rip_s6, emu_s9, 0x88(136)
	080cbb83	LD              flags_s7, emu_s9, 0x80(128)
	fdfbfb93	ANDI            flags_s7, flags_s7, 0xffffffdf(-33)
	006bdf93	SRLI            t6, flags_s7, 0x6(6)
	020fff93	ANDI            t6, t6, 0x20(32)
	01fbebb3	OR              flags_s7, flags_s7, t6
[BOX64] ----------
[BOX64] 0x3f00009f6b: 49 89 C6 mov r14, rax
[BOX64] 0x3ff7afafc4: 54 emitted opcodes, inst=15, barrier=0 state=0/3(0), set=0/0, use=0, need=80/80, fuse=0/1, sm=0(0/0), sew@entry=7, sew@exit=7, pred=14
	00080a13	MV              r14_s4, rax_a6
[BOX64] New Instruction x64:0x3f00009f6e, native:0x3ff7afb09c
[BOX64] TRACE ----
[BOX64]  n1:28 n2:0 ----
	ff010113	ADDI            sp, sp, 0xfffffff0(-16)
	01c13023	SD              t3, sp, 0x0(0)
	01f80b37	LUI             rip_s6, 0x1f80000(33030144)
	01fbebb3	OR              flags_s7, flags_s7, t6
...............................................................
...............................................................
...............................................................
	00013e03	LD              t3, sp, 0x0(0)
	01010113	ADDI            sp, sp, 0x10(16)
[BOX64] ----------
[BOX64] 0x3f00009f6e: 0F 84 4C 0C 00 00 jz 0x0000003F0000ABC0
[BOX64] 0x3ff7afb09c: 67 emitted opcodes, inst=16, barrier=2 state=0/3(0), set=0/0, use=0, need=80/80, fuse=1/0, sm=0(0/0), sew@entry=7, sew@exit=7, pred=15, jmp=out
	020e1463	BNE             t3, zero, 0x28(40) # +10i(0x3ff7afb1a8)
	00000013	NOP

```
@zengdage zengdage force-pushed the save-restore-temp-register-2 branch from 5ca8514 to 319b3ce Compare May 22, 2026 08:23
@ptitSeb ptitSeb merged commit 419a5b2 into ptitSeb:main May 22, 2026
28 checks passed
@zengdage zengdage deleted the save-restore-temp-register-2 branch May 22, 2026 08:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants