Message-ID: <CAHk-=wh1hi-HnBQRu9_ALQL-fbhyn_go+2c9FajO26khf2dsTw@mail.gmail.com>
Date:   Sun, 3 Sep 2023 15:34:30 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Ingo Molnar <mingo@...nel.org>
Cc:     Mateusz Guzik <mjguzik@...il.com>, linux-kernel@...r.kernel.org,
        linux-arch@...r.kernel.org, bp@...en8.de
Subject: Re: [PATCH v2] x86: bring back rep movsq for user access on CPUs
 without ERMS

On Sun, 3 Sept 2023 at 14:48, Ingo Molnar <mingo@...nel.org> wrote:
>
> If measurements support it then this looks like a nice optimization.

Well, it seems to work, but when I profile it to see if the end result
looks reasonable, the profile data is swamped by the return
mispredicts from CPU errata workarounds, and to a smaller degree by
the clac/stac overhead of SMAP.

So it does seem to work - at least it boots here and everything looks
normal - and it does seem to generate good code, but the profiles look
kind of sad.

I also note that we do a lot of stupid pointless 'statx' work that is
then entirely thrown away for a regular stat() system call.

Part of it is actual extra work to set the statx fields.

But a lot of it is that even if we didn't do that, the 'statx' code
has made 'struct kstat' much bigger, and made our code footprints much
worse.

Of course, even without the useless statx overhead, 'struct kstat'
itself ends up having a lot of padding because of how 'struct
timespec64' looks. It might actually be good to split it explicitly
into seconds and nanoseconds just for padding.

Because that all blows 'struct kstat' up to 160 bytes here.
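
A minimal sketch of the layout point above, not kernel code. It assumes a 64-bit build where 'struct timespec64' is a 64-bit seconds field plus a 64-bit 'long' nanoseconds field, and that 'struct kstat' embeds four such timestamps (atime, mtime, ctime, btime). The class names TimesEmbedded and TimesSplit are hypothetical, and ctypes is used only to compute C-style struct sizes from userspace. Nanoseconds are always below 10^9, so 32 bits are enough to hold them.

```python
import ctypes

# Stand-in for 'struct timespec64' on a 64-bit build (assumption):
# 64-bit seconds plus a 'long' nanoseconds field, modeled here as int64.
class Timespec64(ctypes.Structure):
    _fields_ = [("tv_sec", ctypes.c_int64),
                ("tv_nsec", ctypes.c_int64)]

# Hypothetical: four timestamps embedded as whole timespec64 structs.
class TimesEmbedded(ctypes.Structure):
    _fields_ = [("atime", Timespec64),
                ("mtime", Timespec64),
                ("ctime", Timespec64),
                ("btime", Timespec64)]

# Hypothetical: the same four timestamps split explicitly into
# 64-bit seconds and 32-bit nanoseconds, with the narrow fields grouped
# so that no alignment padding is needed between them.
class TimesSplit(ctypes.Structure):
    _fields_ = [("atime_sec", ctypes.c_int64),
                ("mtime_sec", ctypes.c_int64),
                ("ctime_sec", ctypes.c_int64),
                ("btime_sec", ctypes.c_int64),
                ("atime_nsec", ctypes.c_uint32),
                ("mtime_nsec", ctypes.c_uint32),
                ("ctime_nsec", ctypes.c_uint32),
                ("btime_nsec", ctypes.c_uint32)]

embedded = ctypes.sizeof(TimesEmbedded)
split = ctypes.sizeof(TimesSplit)
saved = embedded - split

if __name__ == "__main__":
    print("embedded timestamps:", embedded, "bytes")
    print("split timestamps:   ", split, "bytes")
    print("saved:              ", saved, "bytes")
```

Under these assumptions the split layout saves 16 bytes across the four timestamps. How much that shrinks the real 'struct kstat' depends on the fields around them, which this sketch does not model.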

And to make it all worse, the statx code has caused all the
filesystems to have their own 'getattr()' code just to fill in that
worthless garbage, when it used to be that you could rely on
'generic_fillattr()'.

I'm looking at ext4_getattr(), for example, and I think *all* of it is
due to statx - which to a close approximation nobody cares about, and
which is a specialty system call for a couple of users.

And again - the indirect branches have gone from being "a cycle or
two" to being pipeline stalls and mispredicts. So not using just a
plain 'generic_fillattr()' is *expensive*.

Sad. Because the *normal* stat() family of system calls are some of
the most important ones out there. Very much unlike statx().

              Linus
