Message-ID: <5d1a9dff-6319-14a6-ad81-97350a6849af@kernel.org>
Date: Fri, 7 Jan 2022 16:03:46 -0800
From: Andy Lutomirski <luto@...nel.org>
To: Ammar Faizi <ammarfaizi2@...weeb.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>
Cc: x86-ml <x86@...nel.org>, lkml <linux-kernel@...r.kernel.org>,
GNU/Weeb Mailing List <gwml@...weeb.org>,
Michael Matz <matz@...e.de>, "H.J. Lu" <hjl.tools@...il.com>,
Willy Tarreau <w@....eu>
Subject: Re: [PATCH v1 2/3] x86/entry/64: Add info about registers on exit

On 1/7/22 15:52, Ammar Faizi wrote:
> There was a controversial discussion about the wording in the System
> V ABI document regarding what registers the kernel is allowed to
> clobber when the userspace executes syscall.
>
> The resolution of the discussion was reviewing the clobber list in
> the glibc source. For a historical reason in the glibc source, the
> kernel must restore all registers before returning to the userspace
> (except for rax, rcx and r11).
>
> Link: https://lore.kernel.org/lkml/alpine.LSU.2.20.2110131601000.26294@wotan.suse.de/
> Link: https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/25
>
> This adds info about registers on exit.
>
> Cc: Andy Lutomirski <luto@...nel.org>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Borislav Petkov <bp@...en8.de>
> Cc: Dave Hansen <dave.hansen@...ux.intel.com>
> Cc: "H. Peter Anvin" <hpa@...or.com>
> Cc: Michael Matz <matz@...e.de>
> Cc: "H.J. Lu" <hjl.tools@...il.com>
> Cc: Willy Tarreau <w@....eu>
> Cc: x86-ml <x86@...nel.org>
> Cc: lkml <linux-kernel@...r.kernel.org>
> Cc: GNU/Weeb Mailing List <gwml@...weeb.org>
> Signed-off-by: Ammar Faizi <ammarfaizi2@...weeb.org>
> ---
>
> Quoted the full comment in that file after patched, so it's easier to
> review:
> /*
> * 64-bit SYSCALL instruction entry. Up to 6 arguments in registers.
> *
> * This is the only entry point used for 64-bit system calls. The
> * hardware interface is reasonably well designed and the register to
> * argument mapping Linux uses fits well with the registers that are
> * available when SYSCALL is used.
> *
> * SYSCALL instructions can be found inlined in libc implementations as
> * well as some other programs and libraries. There are also a handful
> * of SYSCALL instructions in the vDSO used, for example, as a
> * clock_gettimeofday fallback.
> *
> * 64-bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11,
> * then loads new ss, cs, and rip from previously programmed MSRs.
> * rflags gets masked by a value from another MSR (so CLD and CLAC
> * are not needed). SYSCALL does not save anything on the stack
> * and does not change rsp.
> *
> * Registers on entry:
> * rax system call number
> * rcx return address
> * r11 saved rflags (note: r11 is callee-clobbered register in C ABI)
> * rdi arg0
> * rsi arg1
> * rdx arg2
> * r10 arg3 (needs to be moved to rcx to conform to C ABI)
> * r8 arg4
> * r9 arg5
> * (note: r12-r15, rbp, rbx are callee-preserved in C ABI)
> *
> * Only called from user space.
> *
> * Registers on exit:
> * rax syscall return value
> * rcx return address
> * r11 rflags
> *
> * For a historical reason in the glibc source, the kernel must restore all
> * registers except the rax (syscall return value) before returning to the
> * userspace.
> *
> * In other words, with respect to the userspace, when the kernel returns
> * to the userspace, only 3 registers are clobbered, they are rax, rcx,
> * and r11.
> *
> * When user can change pt_regs->foo always force IRET. That is because
> * it deals with uncanonical addresses better. SYSRET has trouble
> * with them due to bugs in both AMD and Intel CPUs.
> */
>
> ---
>
> arch/x86/entry/entry_64.S | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index e432dd075291..1111fff2e05f 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -79,6 +79,19 @@
> *
> * Only called from user space.
> *
> + * Registers on exit:
> + * rax syscall return value
> + * rcx return address
> + * r11 rflags
> + *
> + * For a historical reason in the glibc source, the kernel must restore all
> + * registers except the rax (syscall return value) before returning to the
> + * userspace.
> + *
> + * In other words, with respect to the userspace, when the kernel returns
> + * to the userspace, only 3 registers are clobbered, they are rax, rcx,
> + * and r11.
> + *

I would say this much more concisely:

The Linux kernel preserves all registers (even C callee-clobbered
registers) except for rax, rcx and r11 across system calls, and existing
user code relies on this behavior.

> * When user can change pt_regs->foo always force IRET. That is because
> * it deals with uncanonical addresses better. SYSRET has trouble
> * with them due to bugs in both AMD and Intel CPUs.
>
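
For illustration only (this is not part of the patch): a minimal sketch of
what "existing user code relies on this behavior" can look like from the
userspace side, assuming x86-64 Linux and a GCC-compatible compiler. The
helper names raw_syscall0() and regs_survive_getpid() are hypothetical. The
asm clobber list names only rcx and r11 (plus "memory"), so the compiler is
free to keep live values in every other register across the SYSCALL
instruction.

```c
/* Illustrative sketch only; assumes x86-64 Linux with GCC or Clang. */
#include <sys/syscall.h>	/* SYS_getpid */
#include <unistd.h>		/* getpid() */

#if !defined(__x86_64__) || !defined(__linux__)
#error "This sketch assumes x86-64 Linux."
#endif

/*
 * Hypothetical helper: issue a zero-argument system call directly.
 * rax carries the syscall number in and the return value out. Only
 * rcx and r11 are declared clobbered, which is correct only because
 * the kernel preserves every other register.
 */
static long raw_syscall0(long nr)
{
	long ret;

	__asm__ volatile ("syscall"
			  : "=a" (ret)
			  : "a" (nr)
			  : "rcx", "r11", "memory");
	return ret;
}

/*
 * Hypothetical check: load sentinel values into the six argument
 * registers, which are all callee-clobbered in the C ABI, call
 * getpid (which ignores its arguments), and verify that the
 * sentinels are still there afterwards. Returns 1 on success.
 */
static int regs_survive_getpid(void)
{
	register long r8  __asm__("r8")  = 0x1111;
	register long r9  __asm__("r9")  = 0x2222;
	register long r10 __asm__("r10") = 0x3333;
	long rax = SYS_getpid;
	long rdi = 0x4444, rsi = 0x5555, rdx = 0x6666;

	__asm__ volatile ("syscall"
			  : "+a" (rax), "+D" (rdi), "+S" (rsi), "+d" (rdx),
			    "+r" (r8), "+r" (r9), "+r" (r10)
			  :
			  : "rcx", "r11", "memory");

	return rax == (long)getpid() &&
	       rdi == 0x4444 && rsi == 0x5555 && rdx == 0x6666 &&
	       r8 == 0x1111 && r9 == 0x2222 && r10 == 0x3333;
}
```

Calling regs_survive_getpid() from a small test program is enough to
exercise it. A single passing run does not prove the guarantee for every
system call; it only shows the kind of assumption that user code with this
clobber list is making.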