[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wh1hi-HnBQRu9_ALQL-fbhyn_go+2c9FajO26khf2dsTw@mail.gmail.com>
Date: Sun, 3 Sep 2023 15:34:30 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Ingo Molnar <mingo@...nel.org>
Cc: Mateusz Guzik <mjguzik@...il.com>, linux-kernel@...r.kernel.org,
linux-arch@...r.kernel.org, bp@...en8.de
Subject: Re: [PATCH v2] x86: bring back rep movsq for user access on CPUs
without ERMS
On Sun, 3 Sept 2023 at 14:48, Ingo Molnar <mingo@...nel.org> wrote:
>
> If measurements support it then this looks like a nice optimization.
Well, it seems to work, but when I profile it to see if the end result
looks reasonable, the profile data is swamped by the return
mispredicts from CPU errata workarounds, and to a smaller degree by
the clac/stac overhead of SMAP.
So it does seem to work - at least it boots here and everything looks
normal - and it does seem to generate good code, but the profiles look
kind of sad.
I also note that we do a lot of stupid pointless 'statx' work that is
then entirely thrown away for a regular stat() system call.
Part of it is actual extra work to set the statx fields.
But a lot of it is that even if we didn't do that, the 'statx' code
has made 'struct kstat' much bigger, and made our code footprints much
worse.
Of course, even without the useless statx overhead, 'struct kstat'
itself ends up having a lot of padding because of how 'struct
timespec64' looks. It might actually be good to split it explicitly
into seconds and nanoseconds just for padding.
Because that all blows 'struct kstat' up to 160 bytes here.
And to make it all worse, the statx code has caused all the
filesystems to have their own 'getattr()' code just to fill in that
worthless garbage, when it used to be that you could rely on
'generic_fillattr()'.
I'm looking at ext4_getattr(), for example, and I think *all* of it is
due to statx - that to a close approximation nobody cares about, and
is a specialty system call for a couple of users
And again - the indirect branches have gone from being "a cycle or
two" to being pipeline stalls and mispredicts. So not using just a
plain 'generic_fillattr()' is *expensive*.
Sad. Because the *normal* stat() family of system calls are some of
the most important ones out there. Very much unlike statx().
Linus
Powered by blists - more mailing lists