[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <384a1d29-13ca-4e4b-b4b7-2a99e3fdb01b@t-8ch.de>
Date: Sat, 10 Aug 2024 18:04:36 +0200
From: Thomas Weißschuh <linux@...ssschuh.net>
To: Willy Tarreau <w@....eu>
Cc: Ammar Faizi <ammarfaizi2@...weeb.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] tools/nolibc: x86_64: wrap asm functions in functions
On 2024-08-10 16:35:56+0000, Willy Tarreau wrote:
> On Sat, Aug 10, 2024 at 02:37:19PM +0200, Thomas Weißschuh wrote:
> > On 2024-08-10 19:12:25+0000, Ammar Faizi wrote:
> > > On Sat, Aug 10, 2024 at 12:54:46PM +0200, Thomas Weißschuh wrote:
> > > > +__attribute__((weak,unused,section(".text.nolibc_memmove")))
> > > > +__nolibc_naked __no_stack_protector
> > > > +void *memmove(void *dst __attribute__((unused)),
> > > > + const void *src __attribute__((unused)),
> > > > + size_t len __attribute__((unused)))
> > > > +{
> > > > + __asm__ volatile (
> > > > + "movq %rdx, %rcx\n\t"
> > > > + "movq %rdi, %rax\n\t"
> > > > + "movq %rdi, %rdx\n\t"
> > > > + "subq %rsi, %rdx\n\t"
> > > > + "cmpq %rcx, %rdx\n\t"
> > > > + "jb .Lbackward_copy\n\t"
> > > > + "rep movsb\n\t"
> > > > + "retq\n"
> > > > + ".Lbackward_copy:"
> > > > + "leaq -1(%rdi, %rcx, 1), %rdi\n\t"
> > > > + "leaq -1(%rsi, %rcx, 1), %rsi\n\t"
> > > > + "std\n\t"
> > > > + "rep movsb\n\t"
> > > > + "cld\n\t"
> > > > + "retq\n"
> > > > + );
> > > > + __nolibc_naked_epilogue();
> > > > +}
> > >
> > > NAK for this patch.
> >
> > Thanks for the feedback!
> >
> > (I'm not an assembler programmer, so regard my notes with a grain of salt)
> >
> > > This approach appears highly dangerous, particularly when the compiler
> > > inlines the call. When using inline assembly within a function, it's
> > > crucial to define proper constraints and a clobber list to ensure the
> > > arguments are correctly bound to the inline assembly.
> >
> > Aren't the constraints a feature of Extended Asm?
> > This is a Basic Asm block.
> > Indeed naked functions only support Basic Asm, so there is no way to
> > explicitly bind arguments to their registers.
>
> That's indeed what is said in GNU docs, but a quick test with gcc 9 and 11
> shows me that both accept both naked and extended asm. However clang doesn't
> seem happy about it.
>
> The problem here is dual:
> - first, the naked attribute can be emulated if not supported by the
> compiler and we'll only have optimize(-Os,-fomit-frame-pointer), and
> in this case the compiler does not respect well the registers (I'm
> seeing bad code being emitted in callers if I mark the function
> static for example).
Ack.
Would it help to mark the function as non-inlineable?
> - second, nothing prevents the compiler from inlining that function
> and in this case, as Ammar says, we have no control over what
> registers arguments will be placed into since the main purpose of
> inlining precisely is to build optimal code that limits register
> moves. However, in my tests, it appears that when marked naked,
> the compiler never inlines it and even emits a warning if I try
> to mark it inline, saying it was already noinline. So maybe naked
> implies noinline.
I did not manage yet to get it inlined.
With __attribute__((always_inline)) gcc even gives:
sysroot/x86_64/include/arch.h:214:1: error: ignoring attribute 'always_inline' because it conflicts with attribute 'noinline' [-Werror=attributes]
> In any case, depending on the availability of naked, we have two clearly
> different layouts to deal with :-/
Indeed...
> We could imagine not marking it naked and keeping the optimize(Os...)
> stuff only, then using extended asm like in other functions maybe.
> Otherwise this could require two versions, which is less fun.
Note: __attribute__((naked)) on gcc x86_64 is supported since gcc 8.1.
> > Looking at the object code for various on both gcc and clang show always
> > the same object code.
> > (Although GCC adds a "ud2" after the "ret")
>
> It *seems* so as well for me but in the not-really-naked case, you
> really have zero control over it. Also one thing it does to me in the
> not-naked-case is that registers are not saved by the caller and are
> clobbered by the asm statement:
>
> void *memmove2(void *dst __attribute__((unused)),
> void *src __attribute__((unused)),
> size_t len __attribute__((unused)))
> {
> memmove(dst, src, len);
> memmove(src, dst, len);
> return 0;
> }
>
> Gives me this with the naked attribute:
> 0000000000000025 <memmove2>:
> 25: 55 push %rbp
> 26: 48 89 f5 mov %rsi,%rbp
> 29: 50 push %rax
> 2a: 48 89 14 24 mov %rdx,(%rsp)
> 2e: e8 00 00 00 00 callq 33 <memmove2+0xe>
> 33: 48 8b 14 24 mov (%rsp),%rdx
> 37: 48 89 ef mov %rbp,%rdi
> 3a: 48 89 c6 mov %rax,%rsi
> 3d: e8 00 00 00 00 callq 42 <memmove2+0x1d>
> 42: 31 c0 xor %eax,%eax
> 44: 5a pop %rdx
> 45: 5d pop %rbp
> 46: c3 retq
>
> But the alternate form (optimize(-Os...)):
> 0000000000000024 <memmove2>:
> 24: 49 89 f0 mov %rsi,%r8
> 27: e8 00 00 00 00 callq 2c <memmove2+0x8>
> 2c: 48 89 fe mov %rdi,%rsi
> 2f: 4c 89 c7 mov %r8,%rdi
> 32: e8 00 00 00 00 callq 37 <memmove2+0x13>
> 37: 31 c0 xor %eax,%eax
> 39: c3 retq
>
> See how rdi and rdx were expected to be preserved after the first call
> but were not? This was with gcc-9.5 (which supports naked but it's for
> illustration purposes of the risk of leaving unconstrained asm like this).
To be honest, I don't see it. Not enough asm experience I guess, but
I'll look at it some more.
> I also managed to get clang to complain about the .Lbackward_copy label
> being already defined, but I don't know how I managed to do it. I think
> we should not leave it as a global label like this and instead just use
> the regular numbered labels if the asm is inlined.
Ack, this was easy to fix.
> Also I'm wondering why there are errors about memcpy and memmove being
> already defined with -flto, because these ones are marked "weak". Maybe
> we need to add something else, that gcc emits with the functions when
> using lto ?
I think normally .weak would only be resolved by the linker.
What happens here is that the assembler is already presented with
duplicate definitions which it is not prepared to handle.
(See below for an example)
It works on gcc without -flto and on clang with and without -flto.
It seems like a compiler bug to me. If you agree I'll open a ticket
against GCC.
Then we can fix only the label to make it work on clang and wait for a
fixed GCC.
Example:
$ cat a.c
#include "func.h"
int main(void)
{
return 0;
}
$ cat b.c
#include "func.h"
$ cat func.h
__asm__(
".weak foo\n"
"foo:\n"
"retq\n"
);
$ gcc -flto -save-temps a.c b.c
./a.ltrans0.ltrans.s: Assembler messages:
./a.ltrans0.ltrans.s:28: Error: symbol `foo' is already defined
lto-wrapper: fatal error: gcc returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
$ cat ./a.ltrans0.ltrans.s
.file "<artificial>"
.text
#APP
.weak foo
foo:
retq
#NO_APP
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
#APP
.weak foo
foo:
retq
.ident "GCC: (GNU) 14.2.1 20240805"
.section .note.GNU-stack,"",@progbits
Powered by blists - more mailing lists