Message-ID: <CAHk-=wjYOZf2wPj_=arATJ==DQQAQwh0ki=Za0RcE542rWBGFw@mail.gmail.com>
Date: Sun, 3 Sep 2023 14:05:34 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
bp@...en8.de
Subject: Re: [PATCH v2] x86: bring back rep movsq for user access on CPUs without ERMS

On Sun, 3 Sept 2023 at 13:49, Mateusz Guzik <mjguzik@...il.com> wrote:
>
> "real fstat" is syscall(5, fd, &sb).
>
> Sapphire Rapids, will-it-scale, ops/s
>
> stock fstat 5088199
> patched fstat 7625244 (+49%)
> real fstat 8540383 (+67% / +12%)
>
> It dodges lockref et al, but it does not dodge SMAP which accounts for
> the difference.

Side note, since I was looking at this, I hacked up a quick way for
architectures to do their own optimized cp_new_stat() that avoids the
double-buffering.

Sadly it *is* architecture-specific due to padding and
architecture-specific field sizes (and thus EOVERFLOW rules), but it
is what it is.
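
To illustrate the kind of check meant by "EOVERFLOW rules", here is a rough user-space sketch. It is not the attached patch and not the kernel's code. Both struct layouts and the name copy_stat_checked are hypothetical, chosen only to show why a narrower, padded user-visible layout forces per-field range checks that depend on the architecture's field sizes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "internal" layout with wide fields (illustration only). */
struct wide_stat {
    uint64_t ino;
    uint64_t nlink;
    int64_t  size;
};

/* Hypothetical user-visible layout with narrower fields and explicit padding. */
struct narrow_stat {
    uint32_t ino;
    uint32_t nlink;
    int32_t  size;
    uint32_t pad;
};

/*
 * Copy field by field, refusing values that do not fit the narrower
 * destination. Returns 0 on success or -EOVERFLOW if any field would
 * be truncated.
 */
static int copy_stat_checked(const struct wide_stat *in, struct narrow_stat *out)
{
    assert(in != NULL && out != NULL);

    if (in->ino > UINT32_MAX || in->nlink > UINT32_MAX)
        return -EOVERFLOW;
    if (in->size > INT32_MAX || in->size < INT32_MIN)
        return -EOVERFLOW;

    out->ino   = (uint32_t)in->ino;
    out->nlink = (uint32_t)in->nlink;
    out->size  = (int32_t)in->size;
    out->pad   = 0; /* clear padding so no stale bytes reach the caller */
    return 0;
}
```

Which fields need a check, and where the padding sits, changes with the destination layout, which is why a helper like this cannot be written once for all architectures.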

I don't know how much it matters, but it might make a difference. And
'stat()' is most certainly worth optimizing for, even if glibc has
made our life more difficult.
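
For anyone wanting to reproduce the "real fstat" row in the quoted numbers, here is a minimal sketch, assuming 64-bit Linux, of calling the fstat system call directly instead of going through the libc wrapper. The helper name raw_fstat is made up for illustration. SYS_fstat is 5 on x86-64, which matches the syscall(5, fd, &sb) in the quote. As far as I know, recent glibc implements fstat() on top of newfstatat, which is presumably the glibc complication referred to above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: invoke the fstat system call directly, bypassing
 * the libc wrapper. Assumes a 64-bit Linux target where the kernel's
 * stat layout matches libc's struct stat; this does not hold on all
 * 32-bit ABIs.
 *
 * Returns 0 on success, or -1 with errno set on failure.
 */
static int raw_fstat(int fd, struct stat *sb)
{
    assert(sb != NULL);
    return (int)syscall(SYS_fstat, fd, sb);
}
```

Calling raw_fstat(fd, &sb) and fstat(fd, &sb) on the same descriptor should fill in the same fields; only the path taken into the kernel differs.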

Want to try out another entirely untested patch? Attached.

                Linus

View attachment "patch.diff" of type "text/x-patch" (3087 bytes)