@ArcaneNibble
Member

Opcodes 0xA0-0xA3 can access a 64-bit absolute address. Before this change, LLVM required this to be written as `movabs`, and writing it as `mov` would silently truncate the address.

After this change, if `mov moffset` is used with a constant expression that evaluates to a value which doesn't fit in 32 bits, the instruction is automatically changed to `movabs`. This should match the behavior of more recent versions of gas.

The one existing test that expected silent truncation and sign extension is removed.

This change does not affect `mov` opcodes that reference an external symbol. Using `mov` will continue to generate a 32-bit address and `reloc_signed_4byte`, and `movabs` is still required to specify a 64-bit address.

Fixes #73481
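
For readers skimming the description, the selection rule can be sketched as below. This is only an illustration of the rule as stated above, not the code in the patch: `MovForm`, `fitsInt32`, and `selectForm` are hypothetical names, and the sketch covers constant addresses only (symbol references are unaffected by this change).

```cpp
#include <cstdint>
#include <limits>

// Hypothetical names for illustration; these are not identifiers from the patch.
enum class MovForm { Mov, Movabs };

// True when Value is representable as a sign-extended 32-bit integer.
constexpr bool fitsInt32(int64_t Value) {
  return Value >= std::numeric_limits<int32_t>::min() &&
         Value <= std::numeric_limits<int32_t>::max();
}

// Picks the form for a `mov` between the accumulator and a constant
// absolute address in 64-bit mode.
constexpr MovForm selectForm(int64_t Address) {
  return fitsInt32(Address) ? MovForm::Mov : MovForm::Movabs;
}

// Values taken from the new tests in this PR.
static_assert(selectForm(0x12345678) == MovForm::Mov, "fits in 32 bits");
static_assert(selectForm(0x7fffffff) == MovForm::Mov, "largest positive int32");
static_assert(selectForm(0x80000000LL) == MovForm::Movabs, "one past int32 max");
static_assert(selectForm(0x1234567890LL) == MovForm::Movabs, "needs 64 bits");
// 0xffffffff80000000 is -0x80000000 in two's complement, so it still fits.
static_assert(selectForm(-0x80000000LL) == MovForm::Mov, "sign-extends from 32 bits");
```

The boundary cases are the interesting ones: `0x80000000` does not fit because the 32-bit displacement is sign-extended, while `0xffffffff80000000` does.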

@llvmbot
Member

llvmbot commented Dec 19, 2025

@llvm/pr-subscribers-backend-x86

Author: R (ArcaneNibble)

Full diff: https://github.com/llvm/llvm-project/pull/172954.diff

6 Files Affected:

  • (modified) llvm/lib/Target/X86/AsmParser/X86Operand.h (+12)
  • (modified) llvm/lib/Target/X86/X86InstrAsmAlias.td (+9)
  • (modified) llvm/lib/Target/X86/X86InstrOperands.td (+7)
  • (added) llvm/test/MC/X86/intel-syntax-movabs-large.s (+69)
  • (added) llvm/test/MC/X86/movabs-large.s (+69)
  • (modified) llvm/test/MC/X86/x86-64.s (-4)
diff --git a/llvm/lib/Target/X86/AsmParser/X86Operand.h b/llvm/lib/Target/X86/AsmParser/X86Operand.h
index a92272573bacd..a31a7c3b4bd0e 100644
--- a/llvm/lib/Target/X86/AsmParser/X86Operand.h
+++ b/llvm/lib/Target/X86/AsmParser/X86Operand.h
@@ -506,6 +506,18 @@ struct X86Operand final : public MCParsedAsmOperand {
     return isMemOffs() && Mem.ModeSize == 64 && (!Mem.Size || Mem.Size == 64);
   }
 
+  // Returns true only for a moffset that requires *more than* 32 bits.
+  bool isMemConstOffs64() const {
+    if (!isMemOffs() || Mem.ModeSize != 64)
+      return false;
+
+    const MCConstantExpr *CE = dyn_cast<MCConstantExpr>(getMemDisp());
+    if (!CE)
+      return false;
+
+    return !isInt<32>(CE->getValue());
+  }
+
   bool isPrefix() const { return Kind == Prefix; }
   bool isReg() const override { return Kind == Register; }
   bool isDXReg() const { return Kind == DXRegister; }
diff --git a/llvm/lib/Target/X86/X86InstrAsmAlias.td b/llvm/lib/Target/X86/X86InstrAsmAlias.td
index 5a4c3f61672b3..1d56512e7a5cd 100644
--- a/llvm/lib/Target/X86/X86InstrAsmAlias.td
+++ b/llvm/lib/Target/X86/X86InstrAsmAlias.td
@@ -667,6 +667,15 @@ def : InstAlias<"jmpl\t$seg, $off",  (FARJMP32i  i32imm:$off, i16imm:$seg)>, Req
 
 // Match 'movq <largeimm>, <reg>' as an alias for movabsq.
 def : InstAlias<"mov{q}\t{$imm, $reg|$reg, $imm}", (MOV64ri GR64:$reg, i64imm:$imm), 0>;
+// Match 'movX <largeimm>, <reg a>' and its reverse as an alias for movabsX.
+def : InstAlias<"mov{b}\t{$src, %al|al, $src}", (MOV8ao64 offset64const:$src), 0>;
+def : InstAlias<"mov{w}\t{$src, %ax|ax, $src}", (MOV16ao64 offset64const:$src), 0>;
+def : InstAlias<"mov{l}\t{$src, %eax|eax, $src}", (MOV32ao64 offset64const:$src), 0>;
+def : InstAlias<"mov{q}\t{$src, %rax|rax, $src}", (MOV64ao64 offset64const:$src), 0>;
+def : InstAlias<"mov{b}\t{%al, $dst|$dst, al}", (MOV8o64a offset64const:$dst), 0>;
+def : InstAlias<"mov{w}\t{%ax, $dst|$dst, ax}", (MOV16o64a offset64const:$dst), 0>;
+def : InstAlias<"mov{l}\t{%eax, $dst|$dst, eax}", (MOV32o64a offset64const:$dst), 0>;
+def : InstAlias<"mov{q}\t{%rax, $dst|$dst, rax}", (MOV64o64a offset64const:$dst), 0>;
 
 // Match 'movd GR64, MMX' as an alias for movq to be compatible with gas,
 // which supports this due to an old AMD documentation bug when 64-bit mode was
diff --git a/llvm/lib/Target/X86/X86InstrOperands.td b/llvm/lib/Target/X86/X86InstrOperands.td
index 6ba07f74d74c5..69ce4f8552609 100644
--- a/llvm/lib/Target/X86/X86InstrOperands.td
+++ b/llvm/lib/Target/X86/X86InstrOperands.td
@@ -280,6 +280,10 @@ let RenderMethod = "addMemOffsOperands" in {
     let Name = "MemOffs64_64";
     let SuperClasses = [X86Mem64AsmOperand];
   }
+  def X86MemConstOffs64_AsmOperand : AsmOperandClass {
+    let Name = "MemConstOffs64";
+    let SuperClasses = [X86Mem8AsmOperand];
+  }
 } // RenderMethod = "addMemOffsOperands"
 
 class X86SrcIdxOperand<string printMethod, AsmOperandClass parserMatchClass>
@@ -330,6 +334,9 @@ def offset64_32 : X86MemOffsOperand<i64imm, "printMemOffs32",
 def offset64_64 : X86MemOffsOperand<i64imm, "printMemOffs64",
                                     X86MemOffs64_64AsmOperand>;
 
+def offset64const  : X86MemOffsOperand<i64imm, "printMemOffset",
+                                       X86MemConstOffs64_AsmOperand>;
+
 def ccode : Operand<i8> {
   let PrintMethod = "printCondCode";
   let OperandNamespace = "X86";
diff --git a/llvm/test/MC/X86/intel-syntax-movabs-large.s b/llvm/test/MC/X86/intel-syntax-movabs-large.s
new file mode 100644
index 0000000000000..eb4353dbaab17
--- /dev/null
+++ b/llvm/test/MC/X86/intel-syntax-movabs-large.s
@@ -0,0 +1,69 @@
+// RUN: llvm-mc -triple x86_64- -x86-asm-syntax=intel --show-encoding %s | FileCheck %s
+
+// These should map mov -> movabs
+
+// CHECK: movabsb %al, 78187493520
+// CHECK: encoding: [0xa2,0x90,0x78,0x56,0x34,0x12,0x00,0x00,0x00]
+	mov [0x1234567890], al
+// CHECK: movabsw %ax, 78187493520
+// CHECK: encoding: [0x66,0xa3,0x90,0x78,0x56,0x34,0x12,0x00,0x00,0x00]
+	mov [0x1234567890], ax
+// CHECK: movabsl %eax, 78187493520
+// CHECK: encoding: [0xa3,0x90,0x78,0x56,0x34,0x12,0x00,0x00,0x00]
+	mov [0x1234567890], eax
+// CHECK: movabsq %rax, 78187493520
+// CHECK: encoding: [0x48,0xa3,0x90,0x78,0x56,0x34,0x12,0x00,0x00,0x00]
+	mov [0x1234567890], rax
+
+// CHECK: movabsb 78187493520, %al
+// CHECK: encoding: [0xa0,0x90,0x78,0x56,0x34,0x12,0x00,0x00,0x00]
+	mov al, [0x1234567890]
+// CHECK: movabsw 78187493520, %ax
+// CHECK: encoding: [0x66,0xa1,0x90,0x78,0x56,0x34,0x12,0x00,0x00,0x00]
+	mov ax, [0x1234567890]
+// CHECK: movabsl 78187493520, %eax
+// CHECK: encoding: [0xa1,0x90,0x78,0x56,0x34,0x12,0x00,0x00,0x00]
+	mov eax, [0x1234567890]
+// CHECK: movabsq 78187493520, %rax
+// CHECK: encoding: [0x48,0xa1,0x90,0x78,0x56,0x34,0x12,0x00,0x00,0x00]
+	mov rax, [0x1234567890]
+
+// These should *NOT* map mov -> movabs
+
+// CHECK: movb %al, 305419896
+// CHECK: encoding: [0x88,0x04,0x25,0x78,0x56,0x34,0x12]
+	mov [0x12345678], al
+// CHECK: movw %ax, 305419896
+// CHECK: encoding: [0x66,0x89,0x04,0x25,0x78,0x56,0x34,0x12]
+	mov [0x12345678], ax
+// CHECK: movl %eax, 305419896
+// CHECK: encoding: [0x89,0x04,0x25,0x78,0x56,0x34,0x12]
+	mov [0x12345678], eax
+// CHECK: movq %rax, 305419896
+// CHECK: encoding: [0x48,0x89,0x04,0x25,0x78,0x56,0x34,0x12]
+	mov [0x12345678], rax
+
+// CHECK: movb 305419896, %al
+// CHECK: encoding: [0x8a,0x04,0x25,0x78,0x56,0x34,0x12]
+	mov al, [0x12345678]
+// CHECK: movw 305419896, %ax
+// CHECK: encoding: [0x66,0x8b,0x04,0x25,0x78,0x56,0x34,0x12]
+	mov ax, [0x12345678]
+// CHECK: movl 305419896, %eax
+// CHECK: encoding: [0x8b,0x04,0x25,0x78,0x56,0x34,0x12]
+	mov eax, [0x12345678]
+// CHECK: movq 305419896, %rax
+// CHECK: encoding: [0x48,0x8b,0x04,0x25,0x78,0x56,0x34,0x12]
+	mov rax, [0x12345678]
+
+// Test sign extension
+
+// CHECK: movb %al, 2147483647
+// CHECK: encoding: [0x88,0x04,0x25,0xff,0xff,0xff,0x7f]
+    mov [0x7fffffff], al
+// CHECK: movabsb %al, 2147483648
+// CHECK: encoding: [0xa2,0x00,0x00,0x00,0x80,0x00,0x00,0x00,0x00]
+    mov [0x80000000], al
+// CHECK: movb %al, -2147483648
+// CHECK: encoding: [0x88,0x04,0x25,0x00,0x00,0x00,0x80]
+    mov [0xffffffff80000000], al
diff --git a/llvm/test/MC/X86/movabs-large.s b/llvm/test/MC/X86/movabs-large.s
new file mode 100644
index 0000000000000..731e66c3516ac
--- /dev/null
+++ b/llvm/test/MC/X86/movabs-large.s
@@ -0,0 +1,69 @@
+// RUN: llvm-mc -triple x86_64- --show-encoding %s | FileCheck %s
+
+// These should map mov -> movabs
+
+// CHECK: movabsb %al, 78187493520
+// CHECK: encoding: [0xa2,0x90,0x78,0x56,0x34,0x12,0x00,0x00,0x00]
+	movb %al, 0x1234567890
+// CHECK: movabsw %ax, 78187493520
+// CHECK: encoding: [0x66,0xa3,0x90,0x78,0x56,0x34,0x12,0x00,0x00,0x00]
+	movw %ax, 0x1234567890
+// CHECK: movabsl %eax, 78187493520
+// CHECK: encoding: [0xa3,0x90,0x78,0x56,0x34,0x12,0x00,0x00,0x00]
+	movl %eax, 0x1234567890
+// CHECK: movabsq %rax, 78187493520
+// CHECK: encoding: [0x48,0xa3,0x90,0x78,0x56,0x34,0x12,0x00,0x00,0x00]
+	movq %rax, 0x1234567890
+
+// CHECK: movabsb 78187493520, %al
+// CHECK: encoding: [0xa0,0x90,0x78,0x56,0x34,0x12,0x00,0x00,0x00]
+	movb 0x1234567890, %al
+// CHECK: movabsw 78187493520, %ax
+// CHECK: encoding: [0x66,0xa1,0x90,0x78,0x56,0x34,0x12,0x00,0x00,0x00]
+	movw 0x1234567890, %ax
+// CHECK: movabsl 78187493520, %eax
+// CHECK: encoding: [0xa1,0x90,0x78,0x56,0x34,0x12,0x00,0x00,0x00]
+	movl 0x1234567890, %eax
+// CHECK: movabsq 78187493520, %rax
+// CHECK: encoding: [0x48,0xa1,0x90,0x78,0x56,0x34,0x12,0x00,0x00,0x00]
+	movq 0x1234567890, %rax
+
+// These should *NOT* map mov -> movabs
+
+// CHECK: movb %al, 305419896
+// CHECK: encoding: [0x88,0x04,0x25,0x78,0x56,0x34,0x12]
+	movb %al, 0x12345678
+// CHECK: movw %ax, 305419896
+// CHECK: encoding: [0x66,0x89,0x04,0x25,0x78,0x56,0x34,0x12]
+	movw %ax, 0x12345678
+// CHECK: movl %eax, 305419896
+// CHECK: encoding: [0x89,0x04,0x25,0x78,0x56,0x34,0x12]
+	movl %eax, 0x12345678
+// CHECK: movq %rax, 305419896
+// CHECK: encoding: [0x48,0x89,0x04,0x25,0x78,0x56,0x34,0x12]
+	movq %rax, 0x12345678
+
+// CHECK: movb 305419896, %al
+// CHECK: encoding: [0x8a,0x04,0x25,0x78,0x56,0x34,0x12]
+	movb 0x12345678, %al
+// CHECK: movw 305419896, %ax
+// CHECK: encoding: [0x66,0x8b,0x04,0x25,0x78,0x56,0x34,0x12]
+	movw 0x12345678, %ax
+// CHECK: movl 305419896, %eax
+// CHECK: encoding: [0x8b,0x04,0x25,0x78,0x56,0x34,0x12]
+	movl 0x12345678, %eax
+// CHECK: movq 305419896, %rax
+// CHECK: encoding: [0x48,0x8b,0x04,0x25,0x78,0x56,0x34,0x12]
+	movq 0x12345678, %rax
+
+// Test sign extension
+
+// CHECK: movb %al, 2147483647
+// CHECK: encoding: [0x88,0x04,0x25,0xff,0xff,0xff,0x7f]
+    movb %al, 0x7fffffff
+// CHECK: movabsb %al, 2147483648
+// CHECK: encoding: [0xa2,0x00,0x00,0x00,0x80,0x00,0x00,0x00,0x00]
+    movb %al, 0x80000000
+// CHECK: movb %al, -2147483648
+// CHECK: encoding: [0x88,0x04,0x25,0x00,0x00,0x00,0x80]
+    movb %al, 0xffffffff80000000
diff --git a/llvm/test/MC/X86/x86-64.s b/llvm/test/MC/X86/x86-64.s
index 911f4674bd2cc..2da72d52d997c 100644
--- a/llvm/test/MC/X86/x86-64.s
+++ b/llvm/test/MC/X86/x86-64.s
@@ -675,10 +675,6 @@ movl	0, %eax   // CHECK: movl 0, %eax # encoding: [0x8b,0x04,0x25,0x00,0x00,0x00
 // CHECK: encoding: [0x48,0xc7,0xc0,0x0a,0x00,0x00,0x00]
         movq $10, %rax
 
-// CHECK: movq 81985529216486895, %rax
-// CHECK: encoding: [0x48,0x8b,0x04,0x25,0xef,0xcd,0xab,0x89]
-        movq 0x123456789abcdef, %rax
-
 // CHECK: movabsb -6066930261531658096, %al
 // CHECK: encoding: [0xa0,0x90,0x78,0x56,0x34,0x12,0xef,0xcd,0xab]
         movabsb 0xabcdef1234567890,%al
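
As a reading aid for the `encoding:` lines in the new tests, here is a rough sketch of how the store form (opcodes 0xA2/0xA3) lays out its bytes. `Width`, `Encoding`, and `encodeMovabsStore` are hypothetical helpers written for this note; LLVM's actual encoder is table-driven and looks nothing like this. The byte layout itself is taken from the CHECK lines above.

```cpp
#include <cstdint>

// Hypothetical helper types for illustration; not part of LLVM.
enum class Width { Byte, Word, Dword, Qword };

struct Encoding {
  uint8_t Bytes[10] = {}; // at most one prefix + opcode + 8 address bytes
  unsigned Size = 0;
};

// Encodes "store accumulator to a 64-bit absolute address".
constexpr Encoding encodeMovabsStore(Width W, uint64_t Address) {
  Encoding E{};
  if (W == Width::Word)
    E.Bytes[E.Size++] = 0x66; // operand-size prefix selects %ax
  if (W == Width::Qword)
    E.Bytes[E.Size++] = 0x48; // REX.W selects %rax
  E.Bytes[E.Size++] = (W == Width::Byte) ? 0xA2 : 0xA3;
  for (unsigned I = 0; I < 8; ++I) // full 8-byte address, little-endian
    E.Bytes[E.Size++] = static_cast<uint8_t>(Address >> (8 * I));
  return E;
}

// movabsb %al, 0x1234567890 -> a2 90 78 56 34 12 00 00 00
constexpr Encoding ByteStore = encodeMovabsStore(Width::Byte, 0x1234567890ULL);
static_assert(ByteStore.Size == 9, "opcode plus 8 address bytes");
static_assert(ByteStore.Bytes[0] == 0xA2 && ByteStore.Bytes[1] == 0x90 &&
                  ByteStore.Bytes[5] == 0x12 && ByteStore.Bytes[8] == 0x00,
              "matches the CHECK line in movabs-large.s");
```

The load direction in the tests uses 0xA0/0xA1 with the same prefixes. The non-`movabs` form instead uses a ModRM/SIB pair (`0x04,0x25`) followed by a 4-byte displacement, which is why those encodings are shorter.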

Contributor

@phoebewang phoebewang left a comment

LGTM.

@ArcaneNibble ArcaneNibble merged commit 2cdb886 into llvm:main Dec 19, 2025
12 checks passed
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Dec 19, 2025
Development

Successfully merging this pull request may close these issues.

.intel_syntax should respect mov with 64-bit address
