lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240810143556.GA9168@1wt.eu>
Date: Sat, 10 Aug 2024 16:35:56 +0200
From: Willy Tarreau <w@....eu>
To: Thomas Weißschuh <linux@...ssschuh.net>
Cc: Ammar Faizi <ammarfaizi2@...weeb.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] tools/nolibc: x86_64: wrap asm functions in functions

Hi Thomas,

On Sat, Aug 10, 2024 at 02:37:19PM +0200, Thomas Weißschuh wrote:
> On 2024-08-10 19:12:25+0000, Ammar Faizi wrote:
> > On Sat, Aug 10, 2024 at 12:54:46PM +0200, Thomas Weißschuh wrote:
> > > +__attribute__((weak,unused,section(".text.nolibc_memmove")))
> > > +__nolibc_naked __no_stack_protector
> > > +void *memmove(void *dst __attribute__((unused)),
> > > +	      const void *src __attribute__((unused)),
> > > +	      size_t len __attribute__((unused)))
> > > +{
> > > +	__asm__ volatile (
> > > +		"movq %rdx, %rcx\n\t"
> > > +		"movq %rdi, %rax\n\t"
> > > +		"movq %rdi, %rdx\n\t"
> > > +		"subq %rsi, %rdx\n\t"
> > > +		"cmpq %rcx, %rdx\n\t"
> > > +		"jb   .Lbackward_copy\n\t"
> > > +		"rep movsb\n\t"
> > > +		"retq\n"
> > > +		".Lbackward_copy:"
> > > +		"leaq -1(%rdi, %rcx, 1), %rdi\n\t"
> > > +		"leaq -1(%rsi, %rcx, 1), %rsi\n\t"
> > > +		"std\n\t"
> > > +		"rep movsb\n\t"
> > > +		"cld\n\t"
> > > +		"retq\n"
> > > +	);
> > > +	__nolibc_naked_epilogue();
> > > +}
> > 
> > NAK for this patch.
> 
> Thanks for the feedback!
> 
> (I'm not an assembler programmer, so regard my notes with a grain of salt)
> 
> > This approach appears highly dangerous, particularly when the compiler
> > inlines the call. When using inline assembly within a function, it's
> > crucial to define proper constraints and a clobber list to ensure the
> > arguments are correctly bound to the inline assembly.
> 
> Aren't the constraints a feature of Extended Asm?
> This is a Basic Asm block.
> Indeed naked functions only support Basic Asm, so there is no way to
> explicitly bind arguments to their registers.

That's indeed what is said in GNU docs, but a quick test with gcc 9 and 11
shows me that both accept both naked and extended asm. However clang doesn't
seem happy about it.

The problem here is dual:
  - first, the naked attribute can be emulated if not supported by the
    compiler and we'll only have optimize(-Os,-fomit-frame-pointer), and
    in this case the compiler does not respect well the registers (I'm
    seeing bad code being emitted in callers if I mark the function
    static for example).

  - second, nothing prevents the compiler from inlining that function
    and in this case, as Ammar says, we have no control over what
    registers arguments will be placed into since the main purpose of
    inlining precisely is to build optimal code that limits register
    moves. However, in my tests, it appears that when marked naked,
    the compiler never inlines it and even emits a warning if I try
    to mark it inline, saying it was already noinline. So maybe naked
    implies noinline.

In any case, depending on the availability of naked, we have two clearly
different layouts to deal with :-/

We could imagine not marking it naked and keeping the optimize(Os...)
stuff only, then using extended asm like in other functions maybe.
Otherwise this could require two versions, which is less fun.

> Looking at the object code for various on both gcc and clang show always
> the same object code.
> (Although GCC adds a "ud2" after the "ret")

It *seems* so as well for me but in the not-really-naked case, you
really have zero control over it. Also one thing it does to me in the
not-naked-case is that registers are not saved by the caller and are
clobbered by the asm statement:

  void *memmove2(void *dst __attribute__((unused)),
                 void *src __attribute__((unused)),
                 size_t len __attribute__((unused)))
  {
        memmove(dst, src, len);
        memmove(src, dst, len);
        return 0;
  }

Gives me this with the naked attribute:
0000000000000025 <memmove2>:
  25:   55                      push   %rbp
  26:   48 89 f5                mov    %rsi,%rbp
  29:   50                      push   %rax
  2a:   48 89 14 24             mov    %rdx,(%rsp)
  2e:   e8 00 00 00 00          callq  33 <memmove2+0xe>
  33:   48 8b 14 24             mov    (%rsp),%rdx
  37:   48 89 ef                mov    %rbp,%rdi
  3a:   48 89 c6                mov    %rax,%rsi
  3d:   e8 00 00 00 00          callq  42 <memmove2+0x1d>
  42:   31 c0                   xor    %eax,%eax
  44:   5a                      pop    %rdx
  45:   5d                      pop    %rbp
  46:   c3                      retq   

But the alternate form (optimize(-Os...)):
0000000000000024 <memmove2>:
  24:   49 89 f0                mov    %rsi,%r8
  27:   e8 00 00 00 00          callq  2c <memmove2+0x8>
  2c:   48 89 fe                mov    %rdi,%rsi
  2f:   4c 89 c7                mov    %r8,%rdi
  32:   e8 00 00 00 00          callq  37 <memmove2+0x13>
  37:   31 c0                   xor    %eax,%eax
  39:   c3                      retq   

See how rdi and rdx were expected to be preserved after the first call
but were not? This was with gcc-9.5 (which supports naked but it's for
illustration purposes of the risk of leaving unconstrained asm like this).

I also managed to get clang to complain about the .Lbackward_copy label
being already defined, but I don't know how I managed to do it. I think
we should not leave it as a global label like this and instead just use
the regular numbered labels if the asm is inlined.

Also I'm wondering why there are errors about memcpy and memmove being
already defined with -flto, because these ones are marked "weak". Maybe
we need to add something else, that gcc emits with the functions when
using lto ?

Just my two cents,
Willy

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ