Message-ID: <CAHk-=wj9yyNH7Xj3r_zO2vOtwfB8+vBt03Z7XRpJE9qCo-K6vg@mail.gmail.com>
Date: Thu, 6 Nov 2025 11:49:21 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: David Laight <david.laight.linux@...il.com>
Cc: Mateusz Guzik <mjguzik@...il.com>, Borislav Petkov <bp@...en8.de>,
"the arch/x86 maintainers" <x86@...nel.org>, brauner@...nel.org, viro@...iv.linux.org.uk, jack@...e.cz,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
tglx@...utronix.de, pfalcato@...e.de
Subject: Re: [PATCH 1/3] x86: fix access_ok() and valid_user_address() using
wrong USER_PTR_MAX in modules
On Thu, 6 Nov 2025 at 11:26, David Laight <david.laight.linux@...il.com> wrote:
>
> IIRC it was a definite performance improvement for a specific workload
> (compiling kernels) on a system where the relatively small d-cache
> caused significant overhead reading the value from memory.

Some background:

   https://lore.kernel.org/lkml/20240610204821.230388-1-torvalds@linux-foundation.org/
   https://lore.kernel.org/lkml/CAHk-=whHvMbfL2ov1MRbT9QfebO2d6-xXi1ynznCCi-k_m6Q0w@mail.gmail.com/

where that "load address from memory" was particularly noticeable on
my 128-core Altra box in profiles.
That machine really has fairly weak cores and caches (it's what I call
a "flock of chickens" design: individual cores are not particularly
interesting, and the only point of that machine is "reasonable
performance on multithreaded loads thanks to many cores").
I did have numbers, but never posted them, because as mentioned in one
of the emails:

    For example, making d_hash() avoid indirection just means that now
    pretty much _all_ the cost of __d_lookup_rcu() is in the cache misses
    on the hash table itself. Which was always the bulk of it. And on my
    arm64 machine, it turns out that the best optimization for the load I
    tested would be to make that hash table smaller to actually be a bit
    denser in the cache. But that's such a load-dependent optimization
    that I'm not doing this.

IOW, the actual biggest impact on that machine was when I hacked the
dcache hash tables to be smaller, so that they fit better in the L2.
But that's one of those "tune for the benchmark and the particular
machine" things that I despise, so I never did that except locally for
testing.
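
To put rough numbers on the "fit better in the L2" point, here is a
minimal sketch of the footprint arithmetic. It is not kernel code: the
helper names table_bytes() and fits_in_cache() are hypothetical, and it
assumes each hash bucket is a single pointer-sized head. The cache size
is whatever the caller passes in; no real table size or cache size from
the machine above is implied.

```c
#include <stddef.h>

/*
 * Bytes occupied by a hash table with 2^bits buckets, assuming each
 * bucket is one pointer-sized list head (an assumption for this sketch).
 */
static size_t table_bytes(unsigned int bits)
{
	return ((size_t)1 << bits) * sizeof(void *);
}

/*
 * Nonzero if the whole bucket array fits within cache_bytes.
 * This ignores everything else competing for the cache, so it is
 * only an upper bound on how large the table can usefully be.
 */
static int fits_in_cache(unsigned int bits, size_t cache_bytes)
{
	return table_bytes(bits) <= cache_bytes;
}
```

With 8-byte pointers, each extra hash bit doubles the footprint, so a
table with 2^17 buckets is exactly 1 MiB and anything larger cannot sit
entirely in a 1 MiB cache. Which size is actually best still depends on
the load, which is the whole reason it makes a bad default.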
The patches that actually got committed are "these improve performance
a bit by just making the code do the same thing, just being less
stupid". Much less noticeable than the "tune for the machine".
Linus