Message-ID: <20250222063210.GA11482@1wt.eu>
Date: Sat, 22 Feb 2025 07:32:10 +0100
From: Willy Tarreau <w@....eu>
To: David Laight <david.laight.linux@...il.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
        Jan Engelhardt <ej@...i.de>, "H. Peter Anvin" <hpa@...or.com>,
        Greg KH <gregkh@...uxfoundation.org>,
        Boqun Feng <boqun.feng@...il.com>,
        Miguel Ojeda <miguel.ojeda.sandonis@...il.com>,
        Christoph Hellwig <hch@...radead.org>,
        rust-for-linux <rust-for-linux@...r.kernel.org>,
        David Airlie <airlied@...il.com>, linux-kernel@...r.kernel.org,
        ksummit@...ts.linux.dev
Subject: Re: C aggregate passing (Rust kernel policy)

On Fri, Feb 21, 2025 at 09:45:01PM +0000, David Laight wrote:
> On Fri, 21 Feb 2025 11:12:27 -0800
> Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
> > On Fri, 21 Feb 2025 at 10:34, David Laight <david.laight.linux@...il.com> wrote:
> > >
> > > As Linus said, most modern ABI pass short structures in one or two registers
> > > (or stack slots).
> > > But aggregate returns are always done by passing a hidden pointer argument.
> > >
> > > It is annoying that double-sized integers (u64 on 32bit and u128 on 64bit)
> > > are returned in a register pair - but similar sized structures have to be
> > > returned by value.  
> > 
> > No, they really don't. At least not on x86 and arm64 with our ABI.
> > Two-register structures get returned in registers too.
> > 
> > Try something like this:
> > 
> >   struct a {
> >         unsigned long val1, val2;
> >   } function(void)
> >   { return (struct a) { 5, 100 }; }
> > 
> > and you'll see both gcc and clang generate
> > 
> >         movl $5, %eax
> >         movl $100, %edx
> >         retq
> > 
> > (and you'll see similar code on other architectures).
> 
> Humbug, I'm sure it didn't do that the last time I tried it.

You didn't dream it: most likely the last time you tried it was on
a 32-bit arch like i386 or ARM. Gcc doesn't do that there, most
likely for historical reasons that couldn't be changed later. Instead
the caller passes a hidden pointer argument and the function writes
the data there:

  00000000 <fct>:
     0:   8b 44 24 04             mov    0x4(%esp),%eax
     4:   c7 00 05 00 00 00       movl   $0x5,(%eax)
     a:   c7 40 04 64 00 00 00    movl   $0x64,0x4(%eax)
    11:   c2 04 00                ret    $0x4

You can improve it slightly with -mregparm, which passes the hidden
pointer in %eax instead of on the stack, but that's all. I never found
an option or attribute to change that:

  00000000 <fct>:
     0:   c7 00 05 00 00 00       movl   $0x5,(%eax)
     6:   c7 40 04 64 00 00 00    movl   $0x64,0x4(%eax)
     d:   c3                      ret

ARM does the same on 32 bits:

  00000000 <fct>:
     0:   2105            movs    r1, #5
     2:   2264            movs    r2, #100        ; 0x64
     4:   e9c0 1200       strd    r1, r2, [r0]
     8:   4770            bx      lr
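
In C terms, the three 32-bit listings above behave roughly as if the
function had been written with an explicit output pointer. The sketch
below only illustrates that equivalence. The names fct_hidden() and
check_hidden_pointer() are made up for this example and do not come from
any ABI document, and the real hidden argument is managed by the compiler
rather than being visible in the source:

```c
#include <assert.h>

struct a {
        unsigned long val1, val2;
};

/* The form used in this thread: return the pair by value. */
struct a fct(void)
{
        return (struct a){ 5, 100 };
}

/*
 * Hypothetical hand-written equivalent of what i386 and 32-bit ARM do
 * behind the scenes: the caller provides the storage, the callee fills
 * it in through the pointer.
 */
void fct_hidden(struct a *ret)
{
        ret->val1 = 5;
        ret->val2 = 100;
}

/* Both forms must produce the same pair. */
void check_hidden_pointer(void)
{
        struct a by_value = fct();
        struct a by_pointer;

        fct_hidden(&by_pointer);
        assert(by_value.val1 == by_pointer.val1);
        assert(by_value.val2 == by_pointer.val2);
}
```

Building this with -O2 for a 32-bit target and for x86_64, then comparing
the objdump -d output of fct() and fct_hidden(), is one way to see the
difference on a given toolchain.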

I think it's simply that this practice arrived long after these old
architectures were fairly common and it was too late to change their
ABI. But x86_64 and aarch64 had the opportunity to benefit from this.
For example, gcc-3.4 on x86_64 already does the right thing:

  0000000000000000 <fct>:
     0:   ba 64 00 00 00          mov    $0x64,%edx
     5:   b8 05 00 00 00          mov    $0x5,%eax
     a:   c3                      retq
  
So does aarch64 since the oldest gcc I have that supports it (linaro 4.7):

  0000000000000000 <fct>:
     0:   d28000a0        mov     x0, #0x5                        // #5
     4:   d2800c81        mov     x1, #0x64                       // #100
     8:   d65f03c0        ret

For my use cases, I consider that older architectures gain nothing from
this but don't lose anything either, while newer ones benefit
significantly from the approach. That's why I'm using it extensively.
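
As an illustration of this kind of use (not code from the thread), here is
a minimal sketch of a function returning a two-word struct that carries
both a status and a value. The names struct result, parse_uint() and
check_parse_uint() are hypothetical, and overflow handling is deliberately
left out to keep it short:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical (status, value) pair: two machine words in total. */
struct result {
        long status;            /* 0 on success, -1 on error */
        unsigned long value;    /* only meaningful when status == 0 */
};

/*
 * Hypothetical helper: parse an unsigned decimal number.
 * Overflow is not checked in this sketch.
 */
struct result parse_uint(const char *s)
{
        unsigned long v = 0;

        if (s == NULL || *s == '\0')
                return (struct result){ .status = -1, .value = 0 };

        for (; *s != '\0'; s++) {
                if (*s < '0' || *s > '9')
                        return (struct result){ .status = -1, .value = 0 };
                v = v * 10 + (unsigned long)(*s - '0');
        }
        return (struct result){ .status = 0, .value = v };
}

/* The caller gets both fields back from a single call. */
void check_parse_uint(void)
{
        struct result r = parse_uint("105");

        assert(r.status == 0);
        assert(r.value == 105);
        assert(parse_uint("12x").status == -1);
}
```

Compared with returning the status and writing the value through an
output pointer, the caller doesn't need to declare a separate variable
and take its address.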

Quite frankly, there's no reason to avoid using this for pairs of pointers,
(status,value) pairs, coordinates etc. And if you absolutely need to
also support 32-bit archs optimally, you can use a pair of macros to pack
your struct into a wider integer type and back:

  struct a {
          unsigned long v1, v2;
  };

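  /* 32-bit only: assumes unsigned long is 32 bits wide. GETPAIR relies on
   * the GNU statement-expression extension. */
  #define MKPAIR(x) (((unsigned long long)(x).v1 << 32) | (x).v2)
  #define GETPAIR(x) ({ unsigned long long _x = (x); (struct a){ .v1 = _x >> 32, .v2 = _x }; })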

  unsigned long long fct(void)
  {
          struct a a = { 5, 100 };
          return MKPAIR(a);
  }

  long caller(void)
  {
          struct a a = GETPAIR(fct());
          return a.v1 + a.v2;
  }

  00000000 <fct>:
     0:   b8 64 00 00 00          mov    $0x64,%eax
     5:   ba 05 00 00 00          mov    $0x5,%edx
     a:   c3                      ret

  0000000b <caller>:
     b:   b8 69 00 00 00          mov    $0x69,%eax
    10:   c3                      ret
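
The same packing can also be sketched with fixed-width types and inline
functions, which avoids both the GNU statement expression and the
dependency on the width of unsigned long. This is only an alternative
illustration under those assumptions. The names struct pair32,
pack_pair(), unpack_pair(), fct_packed(), caller_packed() and
check_pack_roundtrip() are hypothetical, and the generated code has not
been compared against the listing above:

```c
#include <assert.h>
#include <stdint.h>

/* Two 32-bit fields, so the pair always fits in one 64-bit integer. */
struct pair32 {
        uint32_t v1, v2;
};

/* v1 goes in the upper 32 bits, v2 in the lower 32 bits. */
static inline uint64_t pack_pair(struct pair32 p)
{
        return ((uint64_t)p.v1 << 32) | p.v2;
}

/* Inverse of pack_pair(). */
static inline struct pair32 unpack_pair(uint64_t x)
{
        return (struct pair32){ .v1 = (uint32_t)(x >> 32),
                                .v2 = (uint32_t)x };
}

/* A 64-bit integer is returned in a register pair on i386. */
uint64_t fct_packed(void)
{
        return pack_pair((struct pair32){ 5, 100 });
}

uint32_t caller_packed(void)
{
        struct pair32 p = unpack_pair(fct_packed());

        return p.v1 + p.v2;
}

/* Packing then unpacking must give back the original fields. */
void check_pack_roundtrip(void)
{
        struct pair32 p = unpack_pair(pack_pair((struct pair32){ 5, 100 }));

        assert(p.v1 == 5);
        assert(p.v2 == 100);
        assert(caller_packed() == 105);
}
```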

But quite frankly, given how little these 32-bit archs matter these days,
I don't think it's worth the effort.

Hoping this helps,
Willy
