Date:   Wed, 13 Oct 2021 06:02:04 +0700
From:   Ammar Faizi <ammar.faizi@...dents.amikom.ac.id>
To:     David Laight <David.Laight@...LAB.COM>
Cc:     Willy Tarreau <w@....eu>, Paul Walmsley <paul.walmsley@...ive.com>,
        Palmer Dabbelt <palmer@...belt.com>,
        Albert Ou <aou@...s.berkeley.edu>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andy Lutomirski <luto@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>
Subject: RE: [PATCH] tools/nolibc: x86: Remove `r8`, `r9` and `r10` from the clobber list

On Wed, Oct 13, 2021 at 4:21 AM David Laight <David.Laight@...lab.com> wrote:
>
> From: Willy Tarreau
> > Sent: 12 October 2021 10:07
> >
> > On Tue, Oct 12, 2021 at 03:36:44PM +0700, Ammar Faizi wrote:
> > > I have tried to search for the documentation about this one, but I
> > > couldn't find any. Checking at `Documentation/x86/entry_64.rst`, but
> > > it doesn't tell anything relevant.
> > (...)
> >
> > OK thanks for the detailed story, thus I didn't miss any obvious
> > reference.
> >
> > > My stance comes from SO, Telegram group discussion, and reading source
> > > code. Therefore, I don't think I can attach the link to it as
> > > "authoritative information". Or can I?
> >
> > You're right, that's not exactly what we can call authoritative :-)
>
> Given the cost of a system call the code benefit from telling
> gcc that r8 to r10 are preserved is likely to be noise.
> Especially since most syscalls are made from C library stubs
> so the application calling code will assume they are trashed.
>
> There may even be a bigger gain from the syscall exit code just
> setting the registers to zero (instead of restoring them).

Zeroing those registers in "syscall_return_via_sysret" would mean
editing entry_64.S, and that would apparently break userspace, since it
is an ABI change.

>
> There are probably even bigger gains from zeroing the AVX
> registers (which, IIRC, are all caller-saved) somewhere
> between syscall entry and the process sleeping.
> (This can't be done for non-syscall kernel entry.)
>

I am copying and pasting my earlier message to clear up the
misunderstanding here. We don't intend to change the ABI, so we can only
optimize what we can within the current situation.

I know for a fact that every "syscall" in the libc is wrapped with a
function call.

However, that is not the case for nolibc.h, because there the "syscall"
instruction (0f 05) can be inlined into the user's functions.

All syscalls in nolibc.h are written as static functions with inline
ASM and are likely to be inlined whenever an optimization flag is used,
so there is a benefit in not having r8, r9 and r10 in the clobber list
(currently we have them).
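
For illustration, a 3-argument syscall wrapper without r8, r9 and r10 in
the clobber list could look roughly like the sketch below. The names
`sketch_syscall3` and `sketch_write` are made up for this example and are
not the actual nolibc macros. It assumes x86-64 Linux with GCC or Clang,
and that only rcx and r11 (plus rax for the return value) change across
the "syscall" instruction, which is the assumption under discussion here.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not the real nolibc macro: a 3-argument x86-64
 * Linux syscall that declares only rcx and r11 as clobbered registers.
 * Returns the raw kernel return value (negative errno on failure).
 */
static inline long sketch_syscall3(long num, long arg1, long arg2, long arg3)
{
    long ret;
    register long _num  __asm__("rax") = num;   /* syscall number  */
    register long _arg1 __asm__("rdi") = arg1;  /* first argument  */
    register long _arg2 __asm__("rsi") = arg2;  /* second argument */
    register long _arg3 __asm__("rdx") = arg3;  /* third argument  */

    __asm__ volatile (
        "syscall"
        : "=a"(ret)
        : "r"(_arg1), "r"(_arg2), "r"(_arg3), "0"(_num)
        /* r8, r9 and r10 are deliberately absent from this list */
        : "rcx", "r11", "memory", "cc"
    );
    return ret;
}

/* Hypothetical write() built on top of it; 1 is __NR_write on x86-64. */
static inline long sketch_write(int fd, const void *buf, size_t count)
{
    return sketch_syscall3(1, fd, (long)buf, (long)count);
}
```

Because both functions are static inline, a call such as
`sketch_write(1, "test\n", 5)` can end up as a bare "syscall" instruction
in the caller, and GCC is then free to keep live values in r8, r9 and r10
across it.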

FWIW, I created an example where this matters.

```
#include "tools/include/nolibc/nolibc.h"

#define read_abc(a, b, c) __asm__ volatile(""::"r"(a),"r"(b),"r"(c))

int main(void)
{
    int a = 0xaa;
    int b = 0xbb;
    int c = 0xcc;

    read_abc(a, b, c);
    write(1, "test\n", 5);
    read_abc(a, b, c);

    return 0;
}
```

Compile with:
    gcc -Os test.c -o test -nostdlib


With r8, r9, r10 in the clobber list, the result is:

0000000000001000 <main>:
    1000:	f3 0f 1e fa          	endbr64 
    1004:	41 54                	push   %r12
    1006:	41 bc cc 00 00 00    	mov    $0xcc,%r12d
    100c:	55                   	push   %rbp
    100d:	bd bb 00 00 00       	mov    $0xbb,%ebp
    1012:	53                   	push   %rbx
    1013:	bb aa 00 00 00       	mov    $0xaa,%ebx
    1018:	b8 01 00 00 00       	mov    $0x1,%eax
    101d:	bf 01 00 00 00       	mov    $0x1,%edi
    1022:	ba 05 00 00 00       	mov    $0x5,%edx
    1027:	48 8d 35 d2 0f 00 00 	lea    0xfd2(%rip),%rsi
    102e:	0f 05                	syscall 
    1030:	31 c0                	xor    %eax,%eax
    1032:	5b                   	pop    %rbx
    1033:	5d                   	pop    %rbp
    1034:	41 5c                	pop    %r12
    1036:	c3                   	ret 

GCC thinks that syscall will clobber r8, r9, r10. So it keeps 0xaa,
0xbb and 0xcc in callee-saved registers (rbx, rbp and r12), which it has
to push on entry and pop before returning. This is clearly extra memory
access and extra stack usage just to preserve them.

But syscall does not actually clobber them, so this is a missed
optimization.
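
One way to check this empirically is sketched below. It is only an
observation on whatever kernel it runs on, not authoritative
documentation. The function name `sketch_r8_r10_preserved` is made up for
this example; it assumes x86-64 Linux with GCC or Clang and uses getpid
(syscall number 39 on x86-64) because it takes no arguments.

```c
/*
 * Hypothetical check: load known values into r8, r9 and r10, issue a
 * getpid syscall, and report whether the three registers still hold
 * the same values afterwards. Returns 1 if they were preserved.
 */
static inline int sketch_r8_r10_preserved(void)
{
    register long r8v  __asm__("r8")  = 0xaa;
    register long r9v  __asm__("r9")  = 0xbb;
    register long r10v __asm__("r10") = 0xcc;
    long ret;

    __asm__ volatile (
        "syscall"
        /* "+r" makes the asm both read and write r8, r9 and r10, so
         * the compiler must reload them from the real registers. */
        : "=a"(ret), "+r"(r8v), "+r"(r9v), "+r"(r10v)
        : "0"(39L)                      /* __NR_getpid on x86-64 */
        : "rcx", "r11", "memory", "cc"
    );
    (void)ret;                          /* the pid itself is not needed */

    return r8v == 0xaa && r9v == 0xbb && r10v == 0xcc;
}
```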

Now without r8, r9, r10 in the clobber list, GCC generates better ASM code:

0000000000001000 <main>:
    1000:   f3 0f 1e fa             endbr64 
    1004:   41 b8 aa 00 00 00       mov    $0xaa,%r8d
    100a:   41 b9 bb 00 00 00       mov    $0xbb,%r9d
    1010:   41 ba cc 00 00 00       mov    $0xcc,%r10d
    1016:   b8 01 00 00 00          mov    $0x1,%eax
    101b:   bf 01 00 00 00          mov    $0x1,%edi
    1020:   ba 05 00 00 00          mov    $0x5,%edx
    1025:   48 8d 35 d4 0f 00 00    lea    0xfd4(%rip),%rsi
    102c:   0f 05                   syscall 
    102e:   31 c0                   xor    %eax,%eax
    1030:   c3                      ret  

Does that make sense?

-- 
Ammar Faizi
