Message-ID: <CAHk-=wgjyGX3OVDtzJW6Oh2ukviXtJYi9+7eJW75DgX+d673iw@mail.gmail.com>
Date: Mon, 4 Sep 2023 10:28:12 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org,
linux-arch@...r.kernel.org, bp@...en8.de
Subject: Re: [PATCH v2] x86: bring back rep movsq for user access on CPUs
without ERMS
On Sun, 3 Sept 2023 at 23:03, Mateusz Guzik <mjguzik@...il.com> wrote:
>
> Worst case if the 64 bit structs differ one can settle for
> user-accessing INIT_STRUCT_STAT_PADDING.
As I said, try it. I think you'll find that you are wrong. It's
_hard_ to get the padding right. The "use a temporary" model of the
current code makes the fallback easy - just clear it before copying
the fields. Without that, you have to get every architecture's padding
right manually.
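A minimal userspace sketch of the "use a temporary" model, for illustration only. The struct `demo_stat`, the helper `demo_copy_out()` (a stand-in for `copy_to_user()`), and `demo_fill_stat()` are hypothetical names, not kernel code. It assumes, as compilers do in practice, that storing to individual members leaves the zeroed padding bytes alone.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ABI struct; on typical 64-bit ABIs there is a
 * 4-byte padding hole between `mode` and `size`. */
struct demo_stat {
    uint64_t ino;
    uint32_t mode;
    uint64_t size;
};

/* Stand-in for copy_to_user(): a plain memcpy in this sketch. */
static int demo_copy_out(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return 0;
}

/* Clear the temporary once, fill the fields, copy it out whole.
 * The memset covers the padding without knowing where it is. */
int demo_fill_stat(void *user_buf, uint64_t ino, uint32_t mode, uint64_t size)
{
    struct demo_stat tmp;

    memset(&tmp, 0, sizeof(tmp));
    tmp.ino = ino;
    tmp.mode = mode;
    tmp.size = size;
    return demo_copy_out(user_buf, &tmp, sizeof(tmp));
}
```

The point of the design is that nothing in `demo_fill_stat()` depends on the layout: the same body works for any architecture's version of the struct.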
You almost inevitably end up with "one function for the fallback case,
a completely different function for the unsafe_put_user() case, and
a fairly painful macro for architectures that get converted".
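For contrast, a rough userspace sketch of the per-field model, where each store goes straight to the destination buffer. The helpers `demo_put_u32()` and `demo_put_u64()` are hypothetical stand-ins for `unsafe_put_user()`, and `demo_ustat` is an invented layout. Without a temporary, the padding hole has to be located and cleared by hand, and that step differs for every layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout with a padding hole after `mode`
 * on typical 64-bit ABIs. */
struct demo_ustat {
    uint64_t ino;
    uint32_t mode;
    uint64_t size;
};

/* Stand-ins for unsafe_put_user(): store one value at a byte offset. */
static void demo_put_u32(unsigned char *buf, size_t off, uint32_t v)
{
    memcpy(buf + off, &v, sizeof(v));
}

static void demo_put_u64(unsigned char *buf, size_t off, uint64_t v)
{
    memcpy(buf + off, &v, sizeof(v));
}

/* Per-field stores in ascending address order. The padding between
 * `mode` and `size` is cleared explicitly; forgetting this would leave
 * whatever bytes were already in the destination. */
void demo_store_fields(unsigned char *user_buf,
                       uint64_t ino, uint32_t mode, uint64_t size)
{
    size_t pad_start = offsetof(struct demo_ustat, mode) + sizeof(uint32_t);
    size_t pad_len = offsetof(struct demo_ustat, size) - pad_start;

    demo_put_u64(user_buf, offsetof(struct demo_ustat, ino), ino);
    demo_put_u32(user_buf, offsetof(struct demo_ustat, mode), mode);
    memset(user_buf + pad_start, 0, pad_len);
    demo_put_u64(user_buf, offsetof(struct demo_ustat, size), size);
}
```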
And even then, you'll get non-optimal code, because you won't get the
order of the stores right to get nice contiguous stores. That
admittedly only matters for architectures with bad store coalescing,
which is hopefully not any of the ones we care about (and those kinds
of microarchitectures usually also want the loads done first, so
ld-ld-ld-ld-st-st-st-st patterns).
But just giving up on that, and using a weak fallback function, and
then an optimal one for the (single) architecture that anybody will do
this for, makes it all much simpler.
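A small sketch of what a weak fallback can look like, assuming GCC or Clang (`__attribute__((weak))` is a compiler extension; the kernel spells it `__weak`). The names `demo_rec` and `demo_cp_rec()` are hypothetical. An architecture that wants an optimized version would provide a normal (strong) definition of the same function in its own file, and the linker would pick that one over the weak default.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record; likely has tail padding after `b` on LP64. */
struct demo_rec {
    long a;
    int b;
};

/* Generic fallback using the "clear a temporary" approach.
 * Marked weak so a strong definition elsewhere replaces it at link time. */
__attribute__((weak)) int demo_cp_rec(void *dst, const struct demo_rec *src)
{
    struct demo_rec tmp;

    memset(&tmp, 0, sizeof(tmp));   /* padding is cleared here */
    tmp.a = src->a;
    tmp.b = src->b;
    memcpy(dst, &tmp, sizeof(tmp)); /* stand-in for copy_to_user() */
    return 0;
}
```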
Feel free to send me a patch to prove your point. Because without a
patch, I claim you are just blowing hot air.
Linus