Message-ID: <408ebd8b-4bfb-4c4f-b118-7fe853c6e897@intel.com>
Date: Thu, 20 Feb 2025 09:38:39 -0800
From: Dave Hansen <dave.hansen@...el.com>
To: Valentin Schneider <vschneid@...hat.com>, Jann Horn <jannh@...gle.com>
Cc: linux-kernel@...r.kernel.org, x86@...nel.org,
virtualization@...ts.linux.dev, linux-arm-kernel@...ts.infradead.org,
loongarch@...ts.linux.dev, linux-riscv@...ts.infradead.org,
linux-perf-users@...r.kernel.org, xen-devel@...ts.xenproject.org,
kvm@...r.kernel.org, linux-arch@...r.kernel.org, rcu@...r.kernel.org,
linux-hardening@...r.kernel.org, linux-mm@...ck.org,
linux-kselftest@...r.kernel.org, bpf@...r.kernel.org,
bcm-kernel-feedback-list@...adcom.com, Juergen Gross <jgross@...e.com>,
Ajay Kaher <ajay.kaher@...adcom.com>,
Alexey Makhalov <alexey.amakhalov@...adcom.com>,
Russell King <linux@...linux.org.uk>,
Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
Huacai Chen <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>,
Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt
<palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>, Peter Zijlstra <peterz@...radead.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>, Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
"Liang, Kan" <kan.liang@...ux.intel.com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Josh Poimboeuf <jpoimboe@...nel.org>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini
<pbonzini@...hat.com>, Andy Lutomirski <luto@...nel.org>,
Arnd Bergmann <arnd@...db.de>, Frederic Weisbecker <frederic@...nel.org>,
"Paul E. McKenney" <paulmck@...nel.org>, Jason Baron <jbaron@...mai.com>,
Steven Rostedt <rostedt@...dmis.org>, Ard Biesheuvel <ardb@...nel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
Joel Fernandes <joel@...lfernandes.org>,
Josh Triplett <josh@...htriplett.org>, Boqun Feng <boqun.feng@...il.com>,
Uladzislau Rezki <urezki@...il.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Lai Jiangshan <jiangshanlai@...il.com>, Zqiang <qiang.zhang1211@...il.com>,
Juri Lelli <juri.lelli@...hat.com>, Clark Williams <williams@...hat.com>,
Yair Podemsky <ypodemsk@...hat.com>, Tomas Glozar <tglozar@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>, Ben Segall
<bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Kees Cook <kees@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
Christoph Hellwig <hch@...radead.org>, Shuah Khan <shuah@...nel.org>,
Sami Tolvanen <samitolvanen@...gle.com>, Miguel Ojeda <ojeda@...nel.org>,
Alice Ryhl <aliceryhl@...gle.com>,
"Mike Rapoport (Microsoft)" <rppt@...nel.org>,
Samuel Holland <samuel.holland@...ive.com>, Rong Xu <xur@...gle.com>,
Nicolas Saenz Julienne <nsaenzju@...hat.com>,
Geert Uytterhoeven <geert@...ux-m68k.org>,
Yosry Ahmed <yosryahmed@...gle.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
"Masami Hiramatsu (Google)" <mhiramat@...nel.org>,
Jinghao Jia <jinghao7@...inois.edu>, Luis Chamberlain <mcgrof@...nel.org>,
Randy Dunlap <rdunlap@...radead.org>, Tiezhu Yang <yangtiezhu@...ngson.cn>
Subject: Re: [PATCH v4 29/30] x86/mm, mm/vmalloc: Defer
flush_tlb_kernel_range() targeting NOHZ_FULL CPUs

On 2/20/25 09:10, Valentin Schneider wrote:
>> The LDT and maybe the PEBS buffers are the only implicit supervisor
>> accesses to vmalloc()'d memory that I can think of. But those are both
>> handled specially and shouldn't ever get zapped while in use. The LDT
>> replacement has its own IPIs separate from TLB flushing.
>>
>> But I'm actually not all that worried about accesses while actually
>> running userspace. It's that "danger zone" in the kernel between entry
>> and when the TLB might have dangerous garbage in it.
>>
> So say we have kPTI, thus no vmalloc() mapped in CR3 when running
> userspace, and do a full TLB flush right before switching to userspace -
> could the TLB still end up with vmalloc()-range-related entries when we're
> back in the kernel and going through the danger zone?

Yes, because the danger zone includes the switch back to the kernel CR3
with vmalloc() fully mapped. All bets are off about what's in the TLB
the moment that CR3 write occurs.

Actually, you could probably use that CR3 write to your advantage.

If a mapping is in the PTI user page table, you can't defer the flushes
for it. It's basically the same rule as for text poking in the danger zone.

If there's a deferred flush pending, make sure that every
SWITCH_TO_KERNEL_CR3 fully flushes the TLB. You'd need something similar
to user_pcid_flush_mask.
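
The entry-side half might look roughly like the following userspace model, again only a sketch under stated assumptions. deferred_kernel_flush, defer_kernel_flush(), switch_to_kernel_cr3() and the full_flushes counter (standing in for the actual CR3 write) are hypothetical, and NR_CPUS is fixed at 8 for the model. Roughly, the real user_pcid_flush_mask is a per-CPU mask of ASIDs consulted on the way to the user CR3; here a single per-CPU flag is consulted on the way back to the kernel CR3.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 8 /* assumption for the model */

/* Hypothetical per-CPU "a kernel-range flush was deferred" flag. */
static atomic_bool deferred_kernel_flush[NR_CPUS];
/* Counts full flushes; stands in for the CR3 write in this model. */
static unsigned long full_flushes[NR_CPUS];

/* Flushing CPU: record the flush instead of sending an IPI. */
static void defer_kernel_flush(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	atomic_store(&deferred_kernel_flush[cpu], true);
}

/*
 * Target CPU, modelling SWITCH_TO_KERNEL_CR3: if a flush was deferred,
 * consume the flag and do a full flush (roughly, a CR3 write without the
 * PCID NOFLUSH bit). Returns true if it flushed.
 */
static bool switch_to_kernel_cr3(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (atomic_exchange(&deferred_kernel_flush[cpu], false)) {
		full_flushes[cpu]++;
		return true;
	}
	return false; /* fast path: keep the TLB contents */
}
```

Because the flag is consumed with an atomic exchange, any number of flushes deferred while the CPU sat in userspace collapse into one full flush on entry. The sketch leaves out the hard part: the flushing CPU still has to know the target is in userspace, and a target that has already passed its CR3 switch needs a real IPI.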

But, honestly, I'm still not sure this is worth all the trouble. If
folks want to avoid IPIs for TLB flushes, there are hardware features
that *DO* that. Just get new hardware instead of adding this complicated
pile of software that we have to maintain forever. In 10 years, we'll
still have this software *and* 95% of our hardware will have the
hardware feature too.