Message-ID: <51e0a92e-f976-2a3c-b583-cc7696e711bf@rasmusvillemoes.dk>
Date:   Fri, 13 Sep 2019 10:51:46 +0200
From:   Rasmus Villemoes <linux@...musvillemoes.dk>
To:     Borislav Petkov <bp@...en8.de>, x86-ml <x86@...nel.org>
Cc:     Andy Lutomirski <luto@...nel.org>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        lkml <linux-kernel@...r.kernel.org>
Subject: Re: [RFC] Improve memset

On 13/09/2019 09.22, Borislav Petkov wrote:
> 
> Instead of calling memset:
> 
> ffffffff8100cd8d:       e8 0e 15 7a 00          callq  ffffffff817ae2a0 <__memset>
> 
> and having a JMP inside it depending on the feature supported, let's simply
> have the REP; STOSB directly in the code:
> 
> ...
> ffffffff81000442:       4c 89 d7                mov    %r10,%rdi
> ffffffff81000445:       b9 00 10 00 00          mov    $0x1000,%ecx
> 
> <---- new memset
> ffffffff8100044a:       f3 aa                   rep stos %al,%es:(%rdi)
> ffffffff8100044c:       90                      nop
> ffffffff8100044d:       90                      nop
> ffffffff8100044e:       90                      nop
> <----
> 

> The result is this:
> 
> static __always_inline void *memset(void *dest, int c, size_t n)
> {
>         void *ret, *dummy;

How is this going to affect cases like memset(p, 0, 4/8/16); where gcc
would normally just do one or two word stores? Is rep; stosb still
competitive in that case? It seems that gcc doesn't recognize memset as
a builtin with this always_inline definition present [1].

>         asm volatile(ALTERNATIVE_2_REVERSE("rep; stosb",
>                                            "call memset_rep",  X86_FEATURE_ERMS,
>                                            "call memset_orig", X86_FEATURE_REP_GOOD)
>                 : "=&D" (ret), "=a" (dummy)
>                 : "0" (dest), "a" (c), "c" (n)
>                 /* clobbers used by memset_orig() and memset_rep_good() */
>                 : "rsi", "rdx", "r8", "r9", "memory");
> 
>         return dest;
> }
> 

Also, am I reading this

>  #include <asm/export.h>
>  
> -.weak memset
> -
>  /*
>   */
> -ENTRY(memset)
> -ENTRY(__memset)

right, so there's no longer an actual memset symbol in vmlinux? It seems
that despite the above always_inline definition of memset, gcc can still
emit calls to memset to implement large initializations. (The gcc docs are
actually explicit about that, even under -ffreestanding: "GCC requires
the freestanding environment provide 'memcpy', 'memmove', 'memset' and
'memcmp'.")

[1] I tried this silly stripped-down version with gcc-8:

//#include <string.h>
#include <stddef.h>

#if 1
#define always_inline __inline__ __attribute__((__always_inline__))
static always_inline void *memset(void *dest, int c, size_t n)
{
        void *ret, *dummy;

        asm volatile("rep; stosb"
                : "=&D" (ret), "=a" (dummy)
                : "0" (dest), "a" (c), "c" (n)
                /* clobbers used by memset_orig() and memset_rep_good() */
                : "rsi", "rdx", "r8", "r9", "memory");

        return dest;
}
#endif

struct s { long x; long y; };
int h(struct s *s);
int f(void)
{
	struct s s;
	memset(&s, 0, sizeof(s));
	return h(&s);
}

int g(void)
{
	struct s s[1024] = {};
	return h(s);
}

With or without the string.h include, the result was:

0000000000000000 <f>:
   0:   48 83 ec 18             sub    $0x18,%rsp
   4:   31 c0                   xor    %eax,%eax
   6:   b9 10 00 00 00          mov    $0x10,%ecx
   b:   49 89 e2                mov    %rsp,%r10
   e:   4c 89 d7                mov    %r10,%rdi
  11:   f3 aa                   rep stos %al,%es:(%rdi)
  13:   4c 89 d7                mov    %r10,%rdi
  16:   e8 00 00 00 00          callq  1b <f+0x1b>
                        17: R_X86_64_PLT32      h-0x4
  1b:   48 83 c4 18             add    $0x18,%rsp
  1f:   c3                      retq

0000000000000020 <g>:
  20:   48 81 ec 08 40 00 00    sub    $0x4008,%rsp
  27:   ba 00 40 00 00          mov    $0x4000,%edx
  2c:   31 f6                   xor    %esi,%esi
  2e:   48 89 e1                mov    %rsp,%rcx
  31:   48 89 cf                mov    %rcx,%rdi
  34:   e8 00 00 00 00          callq  39 <g+0x19>
                        35: R_X86_64_PLT32      memset-0x4
  39:   48 89 c7                mov    %rax,%rdi
  3c:   e8 00 00 00 00          callq  41 <g+0x21>
                        3d: R_X86_64_PLT32      h-0x4
  41:   48 81 c4 08 40 00 00    add    $0x4008,%rsp
  48:   c3                      retq


With the asm version #if 0'ed out, f() uses two movq stores instead (and
gcc warns about the implicit declaration if the string.h include is
omitted, but it still recognizes memset() as a builtin).

Rasmus
