Message-ID: <aQp11AiTJpg_m_MG@google.com>
Date: Tue, 4 Nov 2025 13:53:24 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Mateusz Guzik <mjguzik@...il.com>, "the arch/x86 maintainers" <x86@...nel.org>, brauner@...nel.org,
viro@...iv.linux.org.uk, jack@...e.cz, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, tglx@...utronix.de, pfalcato@...e.de
Subject: Re: [PATCH 1/3] x86: fix access_ok() and valid_user_address() using
wrong USER_PTR_MAX in modules
On Wed, Nov 05, 2025, Linus Torvalds wrote:
> On Wed, 5 Nov 2025 at 04:07, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > Sadly, no. We've wanted to do that many times for various other
> > reasons, and we really should, but because of historical semantics,
> > some horrendous users still use "__get_user()" for addresses that
> > might be user space or might be kernel space depending on use-case.

Eww.

> > Maybe we should bite the bullet and just break any remaining cases of
> > that horrendous historical pattern. [...]
>
> What I think is probably the right approach is to just take the normal
> __get_user() calls - the ones that are obviously to user space, and
> have an access_ok() - and just replace them with get_user().
>
> That should all be very simple and straightforward for any half-way
> normal code, and you won't see any downsides.
>
> And in the unlikely case that you can measure any performance impact
> because you had one single access_ok() and many __get_user() calls,
> and *if* you really really care, that kind of code should be using
> "user_read_access_begin()" and friends anyway, because unlike the
> range checking, the *real* performance issue is almost certainly going
> to be the cost of the CLAC/STAC instructions.
>
> Put another way: __get_user() is simply always wrong these days.
> Either it's wrong because it's a bad historical optimization that
> isn't an optimization any more, or it's wrong because it's mis-using
> the old semantics to play tricks with kernel-vs-user memory.
>
> So we shouldn't try to "fix" __get_user(). We should aim to get rid of it.

Curiosity got the better of me :-)

TL;DR: I agree, we should kill __get_user().

KVM x86's use case is a bit of a snowflake. KVM does the access_ok() check when
host userspace configures memory regions for the guest, and then does __get_user()
when reading guest PTEs (i.e. when walking the guest's page tables for shadow
paging).

For each access_ok(), there are potentially billions (with a 'b') of __get_user()
calls throughout the lifetime of the guest when KVM is using shadow paging. E.g.
just booting a Linux guest hits the __get_user() in arch/x86/kvm/mmu/paging_tmpl.h
a few million times. So if there's any chance that split access_ok() + __get_user()
provides a performance advantage, then it should show up in KVM's shadow paging
use case.
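
To make the two patterns being compared concrete, here is a minimal userspace
model. It is only a sketch: every model_* name is a hypothetical stand-in, not
a kernel API, and the "user" range is just a static array. Real KVM does the
access_ok() at memslot configuration time, far away from the reads; the model
collapses that into one function for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Pretend "user" memory: a guest page table with four PTEs (model only). */
static uint64_t model_user_mem[4] = { 0x1003, 0x2003, 0x3003, 0x4003 };

/* Stand-in for access_ok(): is [ptr, ptr + len) inside the pretend range? */
static int model_access_ok(const void *ptr, size_t len)
{
	uintptr_t start = (uintptr_t)model_user_mem;
	uintptr_t end = start + sizeof(model_user_mem);
	uintptr_t p = (uintptr_t)ptr;

	return p >= start && p <= end && len <= end - p;
}

/* Stand-in for __get_user(): no range check, caller must have validated. */
static int model_get_user_unchecked(uint64_t *dst, const uint64_t *src)
{
	assert(dst != NULL);
	*dst = *src;
	return 0;
}

/* Stand-in for get_user(): range check and read in a single call. */
static int model_get_user(uint64_t *dst, const uint64_t *src)
{
	assert(dst != NULL);
	if (!model_access_ok(src, sizeof(*src)))
		return -EFAULT;
	*dst = *src;
	return 0;
}

/* Split pattern: validate the region once, then do unchecked reads. */
static int model_walk_split(const uint64_t *table, size_t n, uint64_t *sum)
{
	uint64_t pte;
	size_t i;

	if (!model_access_ok(table, n * sizeof(*table)))
		return -EFAULT;

	*sum = 0;
	for (i = 0; i < n; i++) {
		if (model_get_user_unchecked(&pte, &table[i]))
			return -EFAULT;
		*sum += pte;
	}
	return 0;
}

/* Combined pattern: every read is checked, no up-front validation. */
static int model_walk_checked(const uint64_t *table, size_t n, uint64_t *sum)
{
	uint64_t pte;
	size_t i;

	*sum = 0;
	for (i = 0; i < n; i++) {
		if (model_get_user(&pte, &table[i]))
			return -EFAULT;
		*sum += pte;
	}
	return 0;
}
```

Both walkers return the same sum for an in-range table and -EFAULT for an
out-of-range one. The model says nothing about cost; it only shows where the
range check sits in each pattern.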

Unless I botched the measurements, get_user() is straight up faster on both Intel
(EMR) and AMD (Turin). Over tens of millions of calls, get_user() is 12%+ faster
on Intel and 25%+ faster on AMD, relative to __get_user(). The extra overhead is
pretty much entirely due to the LFENCE, as open coding the equivalent via
__uaccess_begin_nospec()+unsafe_get_user()+__uaccess_end(), to avoid the CALL+RET,
yields identical numbers to __get_user(). Dropping the LFENCE, by using
__uaccess_begin(), manages to eke out a victory over get_user() by ~2 cycles, but
that's not remotely worth having to think about whether or not the LFENCE is necessary.

The only setup I can think of that _might_ benefit from __get_user() would be
ancient CPUs without EPT/NPT (i.e. CPUs on which KVM _must_ use shadow paging)
and without SMAP, but those CPUs are so old that IMO they simply aren't relevant
when it comes to performance. Or I suppose the horrors where RET is actually
something else entirely, but that's also a "don't care", at least as far as KVM
is concerned.

Cycles per guest PTE read:

             __get_user()  get_user()  open-coded  open-coded, no LFENCE
Intel (EMR)          75.1        67.6        75.3                   65.5
AMD (Turin)          68.1        51.1        67.5                   49.3
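
The per-read deltas can be recomputed from the table with a few lines of C.
This is just arithmetic on the rounded figures above; the struct and function
names are made up for illustration. Note that a percentage derived this way
depends on which column is taken as the baseline, so it need not line up
exactly with a percentage quoted from a different run or baseline.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the table above, in cycles per guest PTE read. */
struct cycles_row {
	const char *cpu;
	double dunder_get_user;    /* __get_user() */
	double get_user;           /* get_user() */
	double open_coded;         /* open-coded, with LFENCE */
	double open_coded_nofence; /* open-coded, no LFENCE */
};

static const struct cycles_row rows[] = {
	{ "Intel (EMR)", 75.1, 67.6, 75.3, 65.5 },
	{ "AMD (Turin)", 68.1, 51.1, 67.5, 49.3 },
};

/* Cycles saved per read by get_user() relative to __get_user(). */
static double cycles_saved(const struct cycles_row *r)
{
	return r->dunder_get_user - r->get_user;
}

/* The same saving as a percentage of the __get_user() cost. */
static double percent_saved(const struct cycles_row *r)
{
	assert(r->dunder_get_user > 0.0);
	return 100.0 * cycles_saved(r) / r->dunder_get_user;
}

/* Cycles by which the no-LFENCE open-coded variant beats get_user(). */
static double nofence_margin(const struct cycles_row *r)
{
	return r->get_user - r->open_coded_nofence;
}

/* Print one summary line per CPU. */
static void print_summary(void)
{
	size_t i;

	for (i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%s: get_user() saves %.1f cycles (%.1f%% of __get_user()), "
		       "no-LFENCE variant is %.1f cycles ahead of get_user()\n",
		       rows[i].cpu, cycles_saved(&rows[i]),
		       percent_saved(&rows[i]), nofence_margin(&rows[i]));
}
```

Calling print_summary() from a main() prints one line per CPU. The no-LFENCE
margin comes out at roughly two cycles on both rows, which is the "~2 cycles"
mentioned earlier.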