Date:   Wed, 15 Nov 2023 14:26:02 -0500
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Borislav Petkov <bp@...en8.de>
Cc:     David Howells <dhowells@...hat.com>,
        kernel test robot <oliver.sang@...el.com>,
        oe-lkp@...ts.linux.dev, lkp@...el.com,
        linux-kernel@...r.kernel.org,
        Christian Brauner <brauner@...nel.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>,
        Christian Brauner <christian@...uner.io>,
        Matthew Wilcox <willy@...radead.org>,
        David Laight <David.Laight@...lab.com>, ying.huang@...el.com,
        feng.tang@...el.com, fengwei.yin@...el.com
Subject: Re: [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput
 -16.9% regression

On Wed, 15 Nov 2023 at 14:10, Borislav Petkov <bp@...en8.de> wrote:
>
> > Borislav, see
> >
> >     https://lore.kernel.org/all/CAHk-=wjCUckvZUQf7gqp2ziJUWxVpikM_6srFdbcNdBJTxExRg@mail.gmail.com/
> >
> > for some truly crazy code generation by gcc.
>
> Yeah, lemme show that to gcc folks. That asm is with your compiler,
> right? Version?

That was with gcc version 13.2.1.

Note that I only see that crazy thing in lib/iov_iter.s, so I really
do think it has something to do with inlining __builtin_memcpy()
behind a conditional function pointer.

In normal cases, gcc seems to just do the obvious thing (ie expand a
small constant-sized memcpy inline, or just call the external 'memcpy'
function).
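
A minimal sketch of those two "normal" cases, for illustration only.
The helper names load_u64() and copy_bytes() are hypothetical and not
taken from the kernel, and what a given compiler emits depends on its
version, flags and target:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Constant-sized copy: the length is known at compile time, so an
 * optimizing compiler can typically replace the memcpy() with a
 * single 8-byte move and no loop.
 */
static inline uint64_t load_u64(const void *src)
{
	uint64_t v;

	memcpy(&v, src, sizeof(v));
	return v;
}

/*
 * Run-time-sized copy: the length is not known at compile time, so
 * the usual expectation is a plain call to the external memcpy().
 */
static inline void copy_bytes(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}
```

Comparing the output of "gcc -O2 -S" for the two helpers is one way to
check which expansion a given compiler actually picks.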

So it's some odd pattern that triggers that "expand non-constant
memcpy inline". And once that happens, the odd code generation is
still a bit odd but is at least explicable.

That "do first word by hand, then do aligned 'rep movsq' on top of it"
pattern is weird, but we've seen some similar strange patterns in
hand-written memcpy (eg "use two overlapping 8-byte writes to handle
the 8-15 byte case").
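
For reference, the "two overlapping 8-byte writes" trick can be
sketched roughly as below. The function name copy_8_to_15() is made
up for this example, and real hand-written memcpy implementations
differ in the details:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy len bytes, with 8 <= len <= 15, using two 8-byte moves.
 * The first move covers bytes [0, 8), the second covers
 * [len - 8, len).  For len < 16 the two ranges overlap, and the
 * overlapping bytes are simply written twice with the same value.
 */
static void copy_8_to_15(void *dst, const void *src, size_t len)
{
	uint64_t head, tail;

	/* Load both words first, then store both. */
	memcpy(&head, src, 8);
	memcpy(&tail, (const unsigned char *)src + len - 8, 8);
	memcpy(dst, &head, 8);
	memcpy((unsigned char *)dst + len - 8, &tail, 8);
}
```

The fixed-size memcpy() calls here are the constant-sized case, so
they are expected to compile down to plain moves. The point of the
pattern is that the whole 8-15 byte range is handled without a loop
or a byte-by-byte tail.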

So the real issue is that we don't want an inlined memcpy at all,
unless it's the simple constant-sized case that has been turned into
individual moves with no loop.

Or it's a "rep movsb" with FSRM as a CPUID-based alternative, of course.

                 Linus
