linux-kernel - Re: C aggregate passing (Rust kernel policy)

Message-ID: <20250222063730.GB11482@1wt.eu>
Date: Sat, 22 Feb 2025 07:37:30 +0100
From: Willy Tarreau <w@....eu>
To: David Laight <david.laight.linux@...il.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
        Jan Engelhardt <ej@...i.de>, "H. Peter Anvin" <hpa@...or.com>,
        Greg KH <gregkh@...uxfoundation.org>,
        Boqun Feng <boqun.feng@...il.com>,
        Miguel Ojeda <miguel.ojeda.sandonis@...il.com>,
        Christoph Hellwig <hch@...radead.org>,
        rust-for-linux <rust-for-linux@...r.kernel.org>,
        David Airlie <airlied@...il.com>, linux-kernel@...r.kernel.org,
        ksummit@...ts.linux.dev
Subject: Re: C aggregate passing (Rust kernel policy)

On Sat, Feb 22, 2025 at 07:32:10AM +0100, Willy Tarreau wrote:
> On Fri, Feb 21, 2025 at 09:45:01PM +0000, David Laight wrote:
> > On Fri, 21 Feb 2025 11:12:27 -0800
> > Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> > 
> > > On Fri, 21 Feb 2025 at 10:34, David Laight <david.laight.linux@...il.com> wrote:
> > > >
> > > > As Linus said, most modern ABIs pass short structures in one or two registers
> > > > (or stack slots).
> > > > But aggregate returns are always done by passing a hidden pointer argument.
> > > >
> > > > It is annoying that double-sized integers (u64 on 32bit and u128 on 64bit)
> > > > are returned in a register pair - but similar sized structures have to be
> > > > returned by value.  
> > > 
> > > No, they really don't. At least not on x86 and arm64 with our ABI.
> > > Two-register structures get returned in registers too.
> > > 
> > > Try something like this:
> > > 
> > >   struct a {
> > >         unsigned long val1, val2;
> > >   } function(void)
> > >   { return (struct a) { 5, 100 }; }
> > > 
> > > and you'll see both gcc and clang generate
> > > 
> > >         movl $5, %eax
> > >         movl $100, %edx
> > >         retq
> > > 
> > > (and you'll see similar code on other architectures).
> > 
> > Humbug, I'm sure it didn't do that the last time I tried it.
> 
> You did not dream it: most likely the last time you tried it was on
> a 32-bit arch like i386 or ARM. Gcc doesn't do that there, most
> likely for historical reasons that couldn't be changed later; instead
> it passes a hidden pointer argument and writes the data there:
> 
>   00000000 <fct>:
>      0:   8b 44 24 04             mov    0x4(%esp),%eax
>      4:   c7 00 05 00 00 00       movl   $0x5,(%eax)
>      a:   c7 40 04 64 00 00 00    movl   $0x64,0x4(%eax)
>     11:   c2 04 00                ret    $0x4
> 
> You can improve it slightly with -mregparm, but that's all, and I
> never found an option or attribute to change that:
> 
>   00000000 <fct>:
>      0:   c7 00 05 00 00 00       movl   $0x5,(%eax)
>      6:   c7 40 04 64 00 00 00    movl   $0x64,0x4(%eax)
>      d:   c3                      ret
> 
> ARM does the same on 32 bits:
> 
>   00000000 <fct>:
>      0:   2105            movs    r1, #5
>      2:   2264            movs    r2, #100        ; 0x64
>      4:   e9c0 1200       strd    r1, r2, [r0]
>      8:   4770            bx      lr
> 
> I think it's simply that this practice arrived long after these old
> architectures were already widespread, and it was too late to change their
> ABI. But x86_64 and aarch64 had the opportunity to benefit from this.
> For example, gcc-3.4 on x86_64 already does the right thing:
> 
>   0000000000000000 <fct>:
>      0:   ba 64 00 00 00          mov    $0x64,%edx
>      5:   b8 05 00 00 00          mov    $0x5,%eax
>      a:   c3                      retq
>
> So does aarch64, going back to the oldest gcc I have that supports it (linaro 4.7):
> 
>   0000000000000000 <fct>:
>      0:   d28000a0        mov     x0, #0x5                        // #5
>      4:   d2800c81        mov     x1, #0x64                       // #100
>      8:   d65f03c0        ret
> 
> For my use cases, I consider that older architectures are not favored by
> this, but they are not degraded either, while newer ones benefit
> significantly from the approach; that's why I'm using it extensively.
> 
> Quite frankly, there's no reason to avoid using this for pairs of pointers,
> (status, value) pairs, coordinates, etc. And if you absolutely need to
> also support 32-bit archs optimally, you can use macros to pack your
> structs into a wider integer and back:
> 
>   struct a {
>           unsigned long v1, v2;
>   };
> 
>   /* assumes 32-bit unsigned long, i.e. a 32-bit arch */
>   #define MKPAIR(x) (((unsigned long long)(x).v1 << 32) | (x).v2)
>   #define GETPAIR(x) ({ unsigned long long _x = (x); (struct a){ .v1 = (_x >> 32), .v2 = (_x)}; })
> 
>   unsigned long long fct(void)
>   {
>           struct a a = { 5, 100 };
>           return MKPAIR(a);
>   }
> 
>   long caller(void)
>   {
>           struct a a = GETPAIR(fct());
>           return a.v1 + a.v2;
>   }
> 
>   00000000 <fct>:
>      0:   b8 64 00 00 00          mov    $0x64,%eax
>      5:   ba 05 00 00 00          mov    $0x5,%edx
>      a:   c3                      ret
> 
>   0000000b <caller>:
>      b:   b8 69 00 00 00          mov    $0x69,%eax
>     10:   c3                      ret
> 
> But quite frankly, given how little these architectures matter these
> days, I don't think it's worth the effort.
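
A variant of the MKPAIR/GETPAIR packing trick, sketched with fixed-width types and inline functions instead of macros and a GNU statement expression. The names struct pair, pair_pack() and pair_unpack() are hypothetical. Using uint32_t fields is an assumption that each value fits in 32 bits; it keeps the packed form exactly one 64-bit integer on every target, whereas the unsigned long version only round-trips where unsigned long is 32 bits wide.

```c
#include <stdint.h>

/* Hypothetical pair type: two 32-bit fields, so the packed form is
 * exactly one 64-bit integer on 32-bit and 64-bit targets alike. */
struct pair {
	uint32_t v1, v2;
};

/* Like MKPAIR: v1 goes into the high half, v2 into the low half. */
static inline uint64_t pair_pack(struct pair p)
{
	return ((uint64_t)p.v1 << 32) | p.v2;
}

/* Like GETPAIR, but as a plain inline function. */
static inline struct pair pair_unpack(uint64_t x)
{
	return (struct pair){ .v1 = (uint32_t)(x >> 32), .v2 = (uint32_t)x };
}

/* A 64-bit integer return uses a register pair on 32-bit targets
 * (edx:eax on i386, r0:r1 on 32-bit ARM) and one register on 64-bit. */
uint64_t fct(void)
{
	struct pair a = { 5, 100 };
	return pair_pack(a);
}

long caller(void)
{
	struct pair a = pair_unpack(fct());
	return (long)a.v1 + (long)a.v2;
}
```

Here caller() evaluates to 105 (0x69); with optimization enabled and fct() visible in the same file, gcc should be able to fold it to a constant as in the quoted disassembly.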

Update: I found a comment in my code suggesting that it works when using
-freg-struct (which is in fact -freg-struct-return), on both i386 and
ARM. I just didn't remember it, and couldn't find it when looking
through the gcc docs.
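
A sketch of the (status, value) case mentioned in the thread, using a hypothetical checked_div() helper and struct res type. The struct is two machine words, so on x86_64 and aarch64 gcc returns it in a register pair (rax:rdx or x0:x1), while with the default ABI on i386 and 32-bit ARM it is returned through a hidden pointer, as in the disassembly above. The gcc manual describes -freg-struct-return as returning struct and union values in registers when possible, and warns that such code is not binary compatible with code built with -fpcc-struct-return, so the flag has to be applied consistently to everything that calls such functions.

```c
#include <errno.h>

/* Hypothetical (status, value) pair: two machine words. */
struct res {
	long status;        /* 0 on success, negative errno on failure */
	unsigned long val;  /* meaningful only when status == 0 */
};

/* Hypothetical helper: returns both the error code and the result by
 * value instead of using an out-parameter. */
struct res checked_div(unsigned long a, unsigned long b)
{
	if (b == 0)
		return (struct res){ .status = -EINVAL, .val = 0 };
	return (struct res){ .status = 0, .val = a / b };
}
```

Compiling this with gcc -O2 -S for x86_64 and then for i386 (with and without -freg-struct-return) is a quick way to compare the two return conventions.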

Willy