[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zre8cPOKetof24nJ@biznet-home.integral.gnuweeb.org>
Date: Sun, 11 Aug 2024 02:16:00 +0700
From: Ammar Faizi <ammarfaizi2@...weeb.org>
To: Thomas Weißschuh <linux@...ssschuh.net>
Cc: Willy Tarreau <w@....eu>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/3] tools/nolibc: x86_64: wrap asm functions in functions
On Sat, Aug 10, 2024 at 06:04:36PM +0200, Thomas Weißschuh wrote:
> On 2024-08-10 16:35:56+0000, Willy Tarreau wrote:
> > See how rdi and rdx were expected to be preserved after the first call
> > but were not? This was with gcc-9.5 (which supports naked but it's for
> > illustration purposes of the risk of leaving unconstrained asm like this).
>
> To be honest, I don't see it. Not enough asm experience I guess, but
> I'll look at it some more.
Let me shed some light on what is going wrong in that form.
First, let's revisit the System V ABI x86-64 calling convention:
- %rdi holds the 1st argument
- %rsi holds the 2nd argument
- %rdx holds the 3rd argument
- %rax is used for the return value
Take a look at the assembly code that Willy sent. The second memmove()
call is receiving incorrect arguments from the previous memmove() call.
The issue arises because your memmove() function doesn't inform the
compiler that it modifies %rdi and %rdx. As far as the compiler is
concerned, your memmove() function:
Does not alter any registers.
Consequently, the compiler assumes that the values in %rdi and %rdx
remain unchanged after the memmove() function returns. With this
assumption, and since memmove() is within the same translation unit as
the caller, the compiler optimizes the register moves without preserving
the values in %rdi and %rdx.
To properly inform the compiler that certain registers are being
modified, you need to use constraints and a clobber list. Here's an
example (note: this is untested, only compiled on Godbolt):
Link: https://godbolt.org/z/WTz1nvn1h
```
void *memmove(void *dst, const void *src, size_t len)
{
void *rax;
__asm__ goto volatile (
"movq %%rdx, %%rcx\n\t"
"movq %%rdi, %%rax\n\t"
"movq %%rdi, %%rdx\n\t"
"subq %%rsi, %%rdx\n\t"
"cmpq %%rcx, %%rdx\n\t"
"jb .Lbackward_copy\n\t"
"rep movsb\n\t"
"jmp %[out]\n\t"
".Lbackward_copy:\n\t"
"leaq -1(%%rdi, %%rcx, 1), %%rdi\n\t"
"leaq -1(%%rsi, %%rcx, 1), %%rsi\n\t"
"std\n\t"
"rep movsb\n\t"
"cld"
: "+D"(dst), "+S"(src), "+c"(len), "=a"(rax)
:
: "rdx", "memory", "cc"
: out
);
out:
return rax;
}
```
This approach helps the compiler correctly handle the register
modifications, ensuring that the generated code behaves as expected.
Notes:
constraints (binding C variables):
- "+D": Read and Write %rdi
- "+S": Read and Write %rsi
- "+c": Read and Write %rcx
- "=a": Write %rax
clobber list:
- "rdx": Write %rdx
- "memory": Read and Write memory (pointed to by dst and src)
- "cc": Write %rflags (This might be optional for x86-64)
I believe we can do better code than that. I can't pull Linux git tree
as well for now. I'll be fully available for work on Tuesday morning
(13 August 2024).
--
Ammar Faizi
Powered by blists - more mailing lists