Date: Fri, 7 Jan 2022 16:03:46 -0800
From: Andy Lutomirski <luto@...nel.org>
To: Ammar Faizi <ammarfaizi2@...weeb.org>,
    Thomas Gleixner <tglx@...utronix.de>,
    Ingo Molnar <mingo@...hat.com>,
    Borislav Petkov <bp@...en8.de>,
    Dave Hansen <dave.hansen@...ux.intel.com>,
    "H. Peter Anvin" <hpa@...or.com>
Cc: x86-ml <x86@...nel.org>,
    lkml <linux-kernel@...r.kernel.org>,
    GNU/Weeb Mailing List <gwml@...weeb.org>,
    Michael Matz <matz@...e.de>,
    "H.J. Lu" <hjl.tools@...il.com>,
    Willy Tarreau <w@....eu>
Subject: Re: [PATCH v1 2/3] x86/entry/64: Add info about registers on exit

On 1/7/22 15:52, Ammar Faizi wrote:
> There was a controversial discussion about the wording in the System
> V ABI document regarding what registers the kernel is allowed to
> clobber when the userspace executes syscall.
>
> The resolution of the discussion was reviewing the clobber list in
> the glibc source. For a historical reason in the glibc source, the
> kernel must restore all registers before returning to the userspace
> (except for rax, rcx and r11).
>
> Link: https://lore.kernel.org/lkml/alpine.LSU.2.20.2110131601000.26294@wotan.suse.de/
> Link: https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/25
>
> This adds info about registers on exit.
>
> Cc: Andy Lutomirski <luto@...nel.org>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Borislav Petkov <bp@...en8.de>
> Cc: Dave Hansen <dave.hansen@...ux.intel.com>
> Cc: "H. Peter Anvin" <hpa@...or.com>
> Cc: Michael Matz <matz@...e.de>
> Cc: "H.J. Lu" <hjl.tools@...il.com>
> Cc: Willy Tarreau <w@....eu>
> Cc: x86-ml <x86@...nel.org>
> Cc: lkml <linux-kernel@...r.kernel.org>
> Cc: GNU/Weeb Mailing List <gwml@...weeb.org>
> Signed-off-by: Ammar Faizi <ammarfaizi2@...weeb.org>
> ---
>
> Quoted the full comment in that file after patched, so it's easier to
> review:
> /*
>  * 64-bit SYSCALL instruction entry. Up to 6 arguments in registers.
>  *
>  * This is the only entry point used for 64-bit system calls. The
>  * hardware interface is reasonably well designed and the register to
>  * argument mapping Linux uses fits well with the registers that are
>  * available when SYSCALL is used.
>  *
>  * SYSCALL instructions can be found inlined in libc implementations as
>  * well as some other programs and libraries. There are also a handful
>  * of SYSCALL instructions in the vDSO used, for example, as a
>  * clock_gettimeofday fallback.
>  *
>  * 64-bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11,
>  * then loads new ss, cs, and rip from previously programmed MSRs.
>  * rflags gets masked by a value from another MSR (so CLD and CLAC
>  * are not needed). SYSCALL does not save anything on the stack
>  * and does not change rsp.
>  *
>  * Registers on entry:
>  * rax  system call number
>  * rcx  return address
>  * r11  saved rflags (note: r11 is callee-clobbered register in C ABI)
>  * rdi  arg0
>  * rsi  arg1
>  * rdx  arg2
>  * r10  arg3 (needs to be moved to rcx to conform to C ABI)
>  * r8   arg4
>  * r9   arg5
>  * (note: r12-r15, rbp, rbx are callee-preserved in C ABI)
>  *
>  * Only called from user space.
>  *
>  * Registers on exit:
>  * rax  syscall return value
>  * rcx  return address
>  * r11  rflags
>  *
>  * For a historical reason in the glibc source, the kernel must restore all
>  * registers except the rax (syscall return value) before returning to the
>  * userspace.
>  *
>  * In other words, with respect to the userspace, when the kernel returns
>  * to the userspace, only 3 registers are clobbered, they are rax, rcx,
>  * and r11.
>  *
>  * When user can change pt_regs->foo always force IRET. That is because
>  * it deals with uncanonical addresses better. SYSRET has trouble
>  * with them due to bugs in both AMD and Intel CPUs.
>  */
>
> ---
>
>  arch/x86/entry/entry_64.S | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index e432dd075291..1111fff2e05f 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -79,6 +79,19 @@
>   *
>   * Only called from user space.
>   *
> + * Registers on exit:
> + * rax  syscall return value
> + * rcx  return address
> + * r11  rflags
> + *
> + * For a historical reason in the glibc source, the kernel must restore all
> + * registers except the rax (syscall return value) before returning to the
> + * userspace.
> + *
> + * In other words, with respect to the userspace, when the kernel returns
> + * to the userspace, only 3 registers are clobbered, they are rax, rcx,
> + * and r11.
> + *

I would say this much more concisely:

The Linux kernel preserves all registers (even C callee-clobbered
registers) except for rax, rcx and r11 across system calls, and existing
user code relies on this behavior.

>   * When user can change pt_regs->foo always force IRET. That is because
>   * it deals with uncanonical addresses better. SYSRET has trouble
>   * with them due to bugs in both AMD and Intel CPUs.
>
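
For userspace that issues SYSCALL directly, the rule stated in the reply
(everything preserved except rax, rcx and r11) is what determines the
clobber list of an inline-asm wrapper. The following is only a minimal
sketch, assuming x86-64 Linux and GCC or Clang extended asm. The names
raw_syscall0() and r8_survives_getpid() are hypothetical helpers made up
for illustration; they are not taken from the patch, the kernel, or glibc.

```c
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* getpid(), used only for comparison */

#if !defined(__x86_64__) || !defined(__linux__)
#error "This sketch assumes x86-64 Linux."
#endif

/*
 * Hypothetical helper: issue a zero-argument system call.
 *
 * rax carries the syscall number in and the return value out.
 * rcx and r11 are listed as clobbers because the SYSCALL instruction
 * itself overwrites them (return rip and saved rflags).
 * No other register is listed, which relies on the kernel preserving
 * everything else across the call.
 */
static inline long raw_syscall0(long nr)
{
    long ret;

    __asm__ volatile (
        "syscall"
        : "=a" (ret)
        : "a" (nr)
        : "rcx", "r11", "memory"
    );
    return ret;
}

/*
 * Hypothetical check: r8 is caller-saved ("callee-clobbered") in the C
 * ABI, so a normal function call may destroy it. Load a sentinel into
 * r8, make a system call, and read r8 back. r8 appears in the clobber
 * list only because this asm block writes it itself.
 */
static inline int r8_survives_getpid(void)
{
    const unsigned long sentinel = 0x1122334455667788UL;
    unsigned long after;
    long ret;

    __asm__ volatile (
        "mov %[in], %%r8\n\t"
        "syscall\n\t"
        "mov %%r8, %[out]"
        : [out] "=r" (after), "=a" (ret)
        : [in] "r" (sentinel), "a" ((long)SYS_getpid)
        : "rcx", "r11", "r8", "memory"
    );
    (void)ret;
    return after == sentinel;
}
```

A caller could compare raw_syscall0(SYS_getpid) against getpid() and
check that r8_survives_getpid() returns nonzero. Wrappers for calls that
take arguments would add the "D", "S", "d" input constraints plus
explicit r10, r8 and r9 register variables, with the same three-register
clobber list.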