@humzak711 humzak711 commented Jan 23, 2026

Summary

This PR optimizes the x86_64 assembly code to reduce code size and to avoid length-changing prefix (LCP) stalls in hot code paths.

Key optimizations include:

  • Reducing the use of length-changing prefixes (LCPs), which can stall the instruction predecoder on modern x86_64 microarchitectures
  • Using xor reg32, reg32 instead of xor reg64, reg64 to zero registers. On x86_64, a write to a 32-bit register is zero-extended into the full 64-bit register, so the result is the same. This saves one byte (the REX.W prefix) per instruction for the eight legacy registers; r8 through r15 still need a REX prefix, so there is no saving for them
  • Replacing cmp $0, reg with test reg, reg, which sets the zero and sign flags identically and is one byte shorter
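The three items above can be illustrated with a small before/after sketch in GNU as (AT&T) syntax. This is not taken from the PR's diff: the symbol name `encoding_demo` and the choice of registers and immediates are assumptions made for illustration, and the byte sequences in the comments are the standard encodings for these forms.

```asm
    .text
    .globl  encoding_demo           /* hypothetical symbol, not from the PR */
encoding_demo:
    /* Zeroing a register */
    xor     %rax, %rax              /* 48 31 c0        3 bytes, carries REX.W */
    xor     %eax, %eax              /* 31 c0           2 bytes, rax is still fully cleared */

    /* Comparing against zero */
    cmp     $0, %rax                /* 48 83 f8 00     4 bytes */
    test    %rax, %rax              /* 48 85 c0        3 bytes, same ZF and SF */

    /* Length-changing prefix: 0x66 shrinks the immediate from 32 to 16 bits */
    mov     $0x10, %ax              /* 66 b8 10 00     LCP candidate */
    mov     $0x10, %eax             /* b8 10 00 00 00  no prefix, one byte longer */
    ret
```

Note that the last pair trades one byte of size for the absence of the prefix, and that the 32-bit form also clears the upper bits of rax, so it is only a drop-in replacement where those bits are not live.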

Impact

  • Minor reduction in code size for the x86_64 kernel
  • Performance optimization in irq_common and up_saveusercontext by reducing predecoder stalls on microarchitectures sensitive to LCPs

Testing

  • Host: Linux x86_64
  • Board: qemu-intel64:nsh
  • ostest passed

@github-actions github-actions bot added the Arch: x86_64 and Size: S labels Jan 23, 2026
@simbit18
Contributor

Hi @humzak711, please fix

../nuttx/tools/checkpatch.sh -c -u -m -g dbbcd7c88ce1360e11773299657097fb39121c0a..HEAD
❌ Missing Signed-off-by
Used config files:
    1: .codespellrc
Some checks failed. For contributing guidelines, see:
  https://github.com/apache/nuttx/blob/master/CONTRIBUTING.md
Error: Process completed with exit code 1.

Signed-off-by: humzak711 <humzak711@gmail.com>

arch/x86_64: optimize assembly instructions for size and performance
@humzak711 humzak711 force-pushed the optimize-x86_64-asm branch from 83e2c23 to 2001d72 Compare January 23, 2026 16:05
@acassis
Contributor

acassis commented Jan 23, 2026

@hujun260 please review
