Message-ID: <B4C09939-9216-457E-8C93-052521FFE96F@zytor.com>
Date: Mon, 02 Feb 2026 19:57:42 -0800
From: "H. Peter Anvin" <hpa@...or.com>
To: Jens Remus <jremus@...ux.ibm.com>
CC: "Jason A. Donenfeld" <Jason@...c4.com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
"Theodore Ts'o" <tytso@....edu>,
Thomas Weißschuh <thomas.weissschuh@...utronix.de>,
Xin Li <xin@...or.com>, Andrew Cooper <andrew.cooper3@...rix.com>,
Andy Lutomirski <luto@...nel.org>, Ard Biesheuvel <ardb@...nel.org>,
Borislav Petkov <bp@...en8.de>, Brian Gerst <brgerst@...il.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Ingo Molnar <mingo@...hat.com>, James Morse <james.morse@....com>,
Jarkko Sakkinen <jarkko@...nel.org>,
Josh Poimboeuf <jpoimboe@...nel.org>, Kees Cook <kees@...nel.org>,
Nam Cao <namcao@...utronix.de>, Oleg Nesterov <oleg@...hat.com>,
Perry Yuan <perry.yuan@....com>, Thomas Gleixner <tglx@...utronix.de>,
Thomas Huth <thuth@...hat.com>, Uros Bizjak <ubizjak@...il.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-sgx@...r.kernel.org, x86@...nel.org,
Indu Bhagat <indu.bhagat@...cle.com>,
Claudiu Zissulescu-Ianculescu <claudiu.zissulescu-ianculescu@...cle.com>,
Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>
Subject: Re: [PATCH v4.1 06/10] x86/entry/vdso32: remove open-coded DWARF in sigreturn.S
On February 2, 2026 9:02:48 AM PST, Jens Remus <jremus@...ux.ibm.com> wrote:
>Hello Peter!
>
>On 1/6/2026 10:18 PM, H. Peter Anvin wrote:
>> The vdso32 sigreturn.S contains open-coded DWARF bytecode, which
>> includes a hack for gdb to not try to step back to a previous call
>> instruction when backtracing from a signal handler.
>>
>> Neither of those is necessary anymore: the backtracing issue is
>> handled by ".cfi_startproc simple" and ".cfi_signal_frame", both of which
>> have been supported for a very long time now, which allows the
>> remaining frame to be built using regular .cfi annotations.
>
>Hopefully Glibc developers will do something similar for x86-64
>__restore_rt() in Glibc sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c.
>
>>
>> Add a few more register offsets to the signal frame just for good
>> measure.
>>
>> Replace the nop on fallthrough of the system call (which should never,
>> ever happen) with a ud2a trap.
>> diff --git a/arch/x86/entry/vdso/vdso32/sigreturn.S b/arch/x86/entry/vdso/vdso32/sigreturn.S
>
>> .text
>> .globl __kernel_sigreturn
>> .type __kernel_sigreturn,@function
>> - nop /* this guy is needed for .LSTARTFDEDLSI1 below (watch for HACK) */
>> ALIGN
>> __kernel_sigreturn:
>> -.LSTART_sigreturn:
>> - popl %eax /* XXX does this mean it needs unwind info? */
>> + STARTPROC_SIGNAL_FRAME IA32_SIGFRAME_sigcontext
>> + popl %eax
>> + CFI_ADJUST_CFA_OFFSET -4
>> movl $__NR_sigreturn, %eax
>> int $0x80
>> -.LEND_sigreturn:
>
>...
>
>> .globl __kernel_rt_sigreturn
>> .type __kernel_rt_sigreturn,@function
>> ALIGN
>> __kernel_rt_sigreturn:
>> -.LSTART_rt_sigreturn:
>> + STARTPROC_SIGNAL_FRAME IA32_RT_SIGFRAME_sigcontext
>> movl $__NR_rt_sigreturn, %eax
>> int $0x80
>> -.LEND_rt_sigreturn:
>
>...
>
>> - .section .eh_frame,"a",@progbits
>> -.LSTARTFRAMEDLSI1:
>> - .long .LENDCIEDLSI1-.LSTARTCIEDLSI1
>> -.LSTARTCIEDLSI1:
>> - .long 0 /* CIE ID */
>> - .byte 1 /* Version number */
>> - .string "zRS" /* NUL-terminated augmentation string */
>
>Note that the "S" in "zRS" is the signal frame indication.
>
>> - .uleb128 1 /* Code alignment factor */
>> - .sleb128 -4 /* Data alignment factor */
>> - .byte 8 /* Return address register column */
>> - .uleb128 1 /* Augmentation value length */
>> - .byte 0x1b /* DW_EH_PE_pcrel|DW_EH_PE_sdata4. */
>> - .byte 0 /* DW_CFA_nop */
>> - .align 4
>> -.LENDCIEDLSI1:
>> - .long .LENDFDEDLSI1-.LSTARTFDEDLSI1 /* Length FDE */
>> -.LSTARTFDEDLSI1:
>> - .long .LSTARTFDEDLSI1-.LSTARTFRAMEDLSI1 /* CIE pointer */
>> - /* HACK: The dwarf2 unwind routines will subtract 1 from the
>> - return address to get an address in the middle of the
>> - presumed call instruction. Since we didn't get here via
>> - a call, we need to include the nop before the real start
>> - to make up for it. */
>> - .long .LSTART_sigreturn-1-. /* PC-relative start address */
>
>Your version no longer has this nop, nor does the FDE start one
>byte earlier. Is that no longer required for unwinders?
>See the excerpt from the dumped DWARF and disassembly for
>__kernel_sigreturn() below.
>
>> - .long .LEND_sigreturn-.LSTART_sigreturn+1
>> - .uleb128 0 /* Augmentation */
>...
>
>> - .align 4
>> -.LENDFDEDLSI1:
>> -
>> - .long .LENDFDEDLSI2-.LSTARTFDEDLSI2 /* Length FDE */
>> -.LSTARTFDEDLSI2:
>> - .long .LSTARTFDEDLSI2-.LSTARTFRAMEDLSI1 /* CIE pointer */
>> - /* HACK: See above wrt unwind library assumptions. */
>> - .long .LSTART_rt_sigreturn-1-. /* PC-relative start address */
>
>Ditto.
>
>> - .long .LEND_rt_sigreturn-.LSTART_rt_sigreturn+1
>> - .uleb128 0 /* Augmentation */
>
>Excerpt from dump of DWARF and disassembly with your patch:
>
>$ objdump -d -Wf arch/x86/entry/vdso/vdso32/vdso32.so.dbg
>...
>000001cc 0000003c 00000000 CIE <-- CIE for __kernel_sigreturn
> Version: 1
> Augmentation: "zRS"
> Code alignment factor: 1
> Data alignment factor: -4
> Return address column: 8
> Augmentation data: 1b
> DW_CFA_def_cfa: r4 (esp) ofs 4
> DW_CFA_offset_extended_sf: r8 (eip) at cfa+56
> DW_CFA_offset_extended_sf: r0 (eax) at cfa+44
> DW_CFA_offset_extended_sf: r3 (ebx) at cfa+32
> DW_CFA_offset_extended_sf: r1 (ecx) at cfa+40
> DW_CFA_offset_extended_sf: r2 (edx) at cfa+36
> DW_CFA_offset_extended_sf: r4 (esp) at cfa+28
> DW_CFA_offset_extended_sf: r5 (ebp) at cfa+24
> DW_CFA_offset_extended_sf: r6 (esi) at cfa+20
> DW_CFA_offset_extended_sf: r7 (edi) at cfa+16
> DW_CFA_offset_extended_sf: r40 (es) at cfa+8
> DW_CFA_offset_extended_sf: r41 (cs) at cfa+60
> DW_CFA_offset_extended_sf: r42 (ss) at cfa+72
> DW_CFA_offset_extended_sf: r43 (ds) at cfa+12
> DW_CFA_offset_extended_sf: r9 (eflags) at cfa+64
> DW_CFA_nop
>
>0000020c 00000010 00000044 FDE cie=000001cc pc=00001a40..00001a4a <-- FDE for __kernel_sigreturn
> DW_CFA_advance_loc: 1 to 00001a41
> DW_CFA_def_cfa_offset: 0
>
>[ The FDE covers the range [1a40..1a4a[. Previously it would have
>started one byte earlier (at the nop), so that the range would have
>been [1a3f..1a4a[. This is/was required for unwinders that always
>subtract one from the unwound return address, so that it points into
>the instruction that invoked the function (e.g. call) instead of behind
>it, in case it was invoked by a non-returning function. Such an
>unwinder would now look up IP=1a3f as belonging to int80_landing_pad (and
>use the DWARF rules applicable to its last instruction) instead of
>__kernel_sigreturn (and its rules). Likewise for __kernel_rt_sigreturn. ]
>
>...
>00001a3c <int80_landing_pad>:
> 1a3c: 5d pop %ebp
> 1a3d: 5a pop %edx
> 1a3e: 59 pop %ecx
> 1a3f: c3 ret
>
>00001a40 <__kernel_sigreturn>:
> 1a40: 58 pop %eax
> 1a41: b8 77 00 00 00 mov $0x77,%eax
> 1a46: cd 80 int $0x80
>
>00001a48 <vdso32_sigreturn_landing_pad>:
> 1a48: 0f 0b ud2
> 1a4a: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
>
>
>Excerpt without your patch:
>
>$ objdump -d -Wf arch/x86/entry/vdso/vdso32/vdso32.so.dbg
>...
>000001cc 00000010 00000000 CIE <-- CIE for __kernel_sigreturn and __kernel_rt_sigreturn
> Version: 1
> Augmentation: "zRS"
> Code alignment factor: 1
> Data alignment factor: -4
> Return address column: 8
> Augmentation data: 1b
> DW_CFA_nop
> DW_CFA_nop
>
>000001e0 00000068 00000018 FDE cie=000001cc pc=00001a6f..00001a78 <-- FDE for __kernel_sigreturn
> DW_CFA_def_cfa_expression (DW_OP_breg4 (esp): 32; DW_OP_deref)
> DW_CFA_expression: r0 (eax) (DW_OP_breg4 (esp): 48)
> DW_CFA_expression: r1 (ecx) (DW_OP_breg4 (esp): 44)
> DW_CFA_expression: r2 (edx) (DW_OP_breg4 (esp): 40)
> DW_CFA_expression: r3 (ebx) (DW_OP_breg4 (esp): 36)
> DW_CFA_expression: r5 (ebp) (DW_OP_breg4 (esp): 28)
> DW_CFA_expression: r6 (esi) (DW_OP_breg4 (esp): 24)
> DW_CFA_expression: r7 (edi) (DW_OP_breg4 (esp): 20)
> DW_CFA_expression: r8 (eip) (DW_OP_breg4 (esp): 60)
> DW_CFA_advance_loc: 2 to 00001a71
> DW_CFA_def_cfa_expression (DW_OP_breg4 (esp): 28; DW_OP_deref)
> DW_CFA_expression: r0 (eax) (DW_OP_breg4 (esp): 44)
> DW_CFA_expression: r1 (ecx) (DW_OP_breg4 (esp): 40)
> DW_CFA_expression: r2 (edx) (DW_OP_breg4 (esp): 36)
> DW_CFA_expression: r3 (ebx) (DW_OP_breg4 (esp): 32)
> DW_CFA_expression: r5 (ebp) (DW_OP_breg4 (esp): 24)
> DW_CFA_expression: r6 (esi) (DW_OP_breg4 (esp): 20)
> DW_CFA_expression: r7 (edi) (DW_OP_breg4 (esp): 16)
> DW_CFA_expression: r8 (eip) (DW_OP_breg4 (esp): 56)
>
>
>[ See how the FDE for __kernel_sigreturn covers the range [1a6f..1a78[.
>An unwinder that always subtracts one from the return address would
>look up IP=1a6f as belonging to __kernel_sigreturn (and use the DWARF
>rules applicable to the nop preceding its symbol). Likewise for
>__kernel_rt_sigreturn. Or is that no longer true? ]
>
>...
>00001a5c <int80_landing_pad>:
> 1a5c: 5d pop %ebp
> 1a5d: 5a pop %edx
> 1a5e: 59 pop %ecx
> 1a5f: c3 ret
> 1a60: 90 nop
> 1a61: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
> 1a68: 2e 8d b4 26 00 00 00 lea %cs:0x0(%esi,%eiz,1),%esi
> 1a6f: 00
>
>00001a70 <__kernel_sigreturn>:
> 1a70: 58 pop %eax
> 1a71: b8 77 00 00 00 mov $0x77,%eax
> 1a76: cd 80 int $0x80
>
>Thanks and regards,
>Jens
That hack dates from before the signal frame extension. It is no longer necessary.
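
For reference, the range arithmetic in the bracketed notes quoted above can be reproduced with a small sketch. It only models a table lookup over half-open [start, end) ranges; it does not model libgcc, gdb, or any other real unwinder, and it takes no position on which lookup real unwinders perform for a signal frame. The FDE type and the function names are hypothetical. The addresses are copied from the two objdump excerpts, and the tables deliberately contain only the __kernel_sigreturn entry, so a miss shows up as None rather than as int80_landing_pad.

```python
from typing import NamedTuple, Optional, Sequence


class FDE(NamedTuple):
    """Hypothetical, minimal stand-in for one FDE's address range."""
    name: str
    start: int  # first covered address (inclusive)
    end: int    # end of covered range (exclusive)


def find_fde(table: Sequence[FDE], ip: int) -> Optional[FDE]:
    """Return the FDE whose half-open range [start, end) contains ip."""
    for fde in table:
        if fde.start <= ip < fde.end:
            return fde
    return None


def lookup_minus_one(table: Sequence[FDE], return_address: int) -> Optional[FDE]:
    """Model of an unwinder that always searches for return_address - 1."""
    return find_fde(table, return_address - 1)


# Ranges copied from the quoted dumps (without and with the patch).
OLD_TABLE = [FDE("__kernel_sigreturn", 0x1A6F, 0x1A78)]
NEW_TABLE = [FDE("__kernel_sigreturn", 0x1A40, 0x1A4A)]

# Address of the __kernel_sigreturn symbol in each build.
OLD_RA = 0x1A70
NEW_RA = 0x1A40

old_hit = lookup_minus_one(OLD_TABLE, OLD_RA)    # searches 0x1a6f
new_hit = lookup_minus_one(NEW_TABLE, NEW_RA)    # searches 0x1a3f
new_exact = find_fde(NEW_TABLE, NEW_RA)          # searches 0x1a40

print(old_hit.name if old_hit else None)
print(new_hit.name if new_hit else None)
print(new_exact.name if new_exact else None)
```

With the old range the subtract-one lookup lands inside the FDE because it starts at the nop. With the new range the same lookup falls one byte short, while a lookup at the unmodified address succeeds.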