linux-kernel - Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ

Message-ID: <352317e3-c7dc-43b4-b4cb-9644489318d0@intel.com>
Date: Tue, 11 Feb 2025 06:22:27 -0800
From: Dave Hansen <dave.hansen@...el.com>
To: Valentin Schneider <vschneid@...hat.com>, Jann Horn <jannh@...gle.com>
Cc: linux-kernel@...r.kernel.org, x86@...nel.org,
 virtualization@...ts.linux.dev, linux-arm-kernel@...ts.infradead.org,
 loongarch@...ts.linux.dev, linux-riscv@...ts.infradead.org,
 linux-perf-users@...r.kernel.org, xen-devel@...ts.xenproject.org,
 kvm@...r.kernel.org, linux-arch@...r.kernel.org, rcu@...r.kernel.org,
 linux-hardening@...r.kernel.org, linux-mm@...ck.org,
 linux-kselftest@...r.kernel.org, bpf@...r.kernel.org,
 bcm-kernel-feedback-list@...adcom.com, Juergen Gross <jgross@...e.com>,
 Ajay Kaher <ajay.kaher@...adcom.com>,
 Alexey Makhalov <alexey.amakhalov@...adcom.com>,
 Russell King <linux@...linux.org.uk>,
 Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
 Huacai Chen <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>,
 Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt
 <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
 Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
 Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
 "H. Peter Anvin" <hpa@...or.com>, Peter Zijlstra <peterz@...radead.org>,
 Arnaldo Carvalho de Melo <acme@...nel.org>,
 Namhyung Kim <namhyung@...nel.org>, Mark Rutland <mark.rutland@....com>,
 Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
 Adrian Hunter <adrian.hunter@...el.com>,
 "Liang, Kan" <kan.liang@...ux.intel.com>,
 Boris Ostrovsky <boris.ostrovsky@...cle.com>,
 Josh Poimboeuf <jpoimboe@...nel.org>,
 Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
 Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini
 <pbonzini@...hat.com>, Andy Lutomirski <luto@...nel.org>,
 Arnd Bergmann <arnd@...db.de>, Frederic Weisbecker <frederic@...nel.org>,
 "Paul E. McKenney" <paulmck@...nel.org>, Jason Baron <jbaron@...mai.com>,
 Steven Rostedt <rostedt@...dmis.org>, Ard Biesheuvel <ardb@...nel.org>,
 Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
 Joel Fernandes <joel@...lfernandes.org>,
 Josh Triplett <josh@...htriplett.org>, Boqun Feng <boqun.feng@...il.com>,
 Uladzislau Rezki <urezki@...il.com>,
 Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 Lai Jiangshan <jiangshanlai@...il.com>, Zqiang <qiang.zhang1211@...il.com>,
 Juri Lelli <juri.lelli@...hat.com>, Clark Williams <williams@...hat.com>,
 Yair Podemsky <ypodemsk@...hat.com>, Tomas Glozar <tglozar@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Dietmar Eggemann <dietmar.eggemann@....com>, Ben Segall
 <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
 Kees Cook <kees@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
 Christoph Hellwig <hch@...radead.org>, Shuah Khan <shuah@...nel.org>,
 Sami Tolvanen <samitolvanen@...gle.com>, Miguel Ojeda <ojeda@...nel.org>,
 Alice Ryhl <aliceryhl@...gle.com>,
 "Mike Rapoport (Microsoft)" <rppt@...nel.org>,
 Samuel Holland <samuel.holland@...ive.com>, Rong Xu <xur@...gle.com>,
 Nicolas Saenz Julienne <nsaenzju@...hat.com>,
 Geert Uytterhoeven <geert@...ux-m68k.org>,
 Yosry Ahmed <yosryahmed@...gle.com>,
 "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
 "Masami Hiramatsu (Google)" <mhiramat@...nel.org>,
 Jinghao Jia <jinghao7@...inois.edu>, Luis Chamberlain <mcgrof@...nel.org>,
 Randy Dunlap <rdunlap@...radead.org>, Tiezhu Yang <yangtiezhu@...ngson.cn>
Subject: Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer
 flush_tlb_kernel_range() targeting NOHZ_FULL CPUs

On 2/11/25 05:33, Valentin Schneider wrote:
>> 2. It's wrong to assume that TLB entries are only populated for
>> addresses you access - thanks to speculative execution, you have to
>> assume that the CPU might be populating random TLB entries all over
>> the place.
> Gotta love speculation. Now it is supposed to be limited to genuinely
> accessible data & code, right? Say theoretically we have a full TLBi as
> literally the last thing before doing the return-to-userspace, speculation
> should be limited to executing maybe bits of the return-from-userspace
> code?

In practice, it's mostly limited like that.

Architecturally, there are no promises from the CPU. It is within its
rights to cache anything from the page tables at any time. If it's in
the CR3 tree, it's fair game.

> Furthermore, I would hope that once a CPU is executing in userspace, it's
> not going to populate the TLB with kernel address translations - AIUI the
> whole vulnerability mitigation debacle was about preventing this sort of
> thing.

Nope, unfortunately. There are two big exceptions to this. First,
"implicit supervisor-mode accesses". There are structures for which the
CPU gets a virtual address and accesses it even while userspace is
running. The LDT and GDT are the most obvious examples, but there are
some less ubiquitous ones like the buffers for PEBS events.

Second, remember that user versus supervisor is determined *BY* the page
tables. Before Linear Address Space Separation (LASS), all virtual
memory accesses walk the page tables, even userspace accesses to kernel
addresses.  The User/Supervisor bit is *in* the page tables, of course.

A userspace access to a kernel address results in a page walk and the
CPU is completely free to cache all or part of that page walk. A
Meltdown-style _speculative_ userspace access to kernel memory won't
generate a fault either. It won't leak data like it used to, of course,
but it can still walk the page tables. That's one reason LASS is needed.
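
To make the ordering concrete, here is a minimal toy model in C. It is
only a sketch: it is not kernel code and does not describe how any
particular CPU implements its page walker or paging-structure caches.
The names toy_walk, walk_result and toy_access are made up for
illustration. The only architectural facts it relies on are that the
Present bit is bit 0 and the User/Supervisor bit is bit 2 of an x86
paging-structure entry, and that a user access is permitted only if U/S
is set at every level. The point it shows is that the permission answer
only exists after the entries have been read, so every level read is a
candidate for caching whether or not the access is eventually allowed.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT (1ULL << 0)   /* x86 P bit */
#define PTE_USER    (1ULL << 2)   /* x86 U/S bit */
#define LEVELS      4             /* 4-level paging: PGD, PUD, PMD, PTE */

/* Hypothetical toy structure: one entry per level stands in for the
 * entry selected by the virtual address at that level. */
struct toy_walk {
	uint64_t entry[LEVELS];
};

struct walk_result {
	int  levels_read;   /* entries fetched, i.e. what could be cached */
	bool permitted;     /* would the access be architecturally allowed? */
};

/* Walk all levels for one address. The U/S decision is a by-product of
 * reading the entries; it cannot be made before the walk starts. */
static struct walk_result toy_access(const struct toy_walk *w, bool user_mode)
{
	struct walk_result r = { 0, true };

	for (int i = 0; i < LEVELS; i++) {
		uint64_t e = w->entry[i];

		r.levels_read++;              /* entry has been read */

		if (!(e & PTE_PRESENT)) {     /* walk stops at a hole */
			r.permitted = false;
			return r;
		}
		if (user_mode && !(e & PTE_USER))
			r.permitted = false;  /* denied, but walk continues */
	}
	return r;
}
```

With a kernel mapping (Present set, U/S clear at every level), a call
with user_mode set to true reads all four levels and only then reports
that the access is not permitted. That is the window in which
translations for kernel addresses can end up cached while userspace is
running.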