Message-ID: <20250611113001.GC2273038@noisy.programming.kicks-ass.net>
Date: Wed, 11 Jun 2025 13:30:01 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: "Masami Hiramatsu (Google)" <mhiramat@...nel.org>
Cc: Ingo Molnar <mingo@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Steven Rostedt <rostedt@...dmis.org>, x86@...nel.org,
Naresh Kamboju <naresh.kamboju@...aro.org>,
open list <linux-kernel@...r.kernel.org>,
Linux trace kernel <linux-trace-kernel@...r.kernel.org>,
lkft-triage@...ts.linaro.org,
Stephen Rothwell <sfr@...b.auug.org.au>,
Arnd Bergmann <arnd@...db.de>,
Dan Carpenter <dan.carpenter@...aro.org>,
Anders Roxell <anders.roxell@...aro.org>
Subject: Re: [RFC PATCH 2/2] x86: alternative: Invalidate the cache for updated instructions

On Tue, Jun 10, 2025 at 11:47:48PM +0900, Masami Hiramatsu (Google) wrote:
> From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
>
> Invalidate the cache after replacing INT3 with the new instruction.
> This will prevent the other CPUs seeing the removed INT3 in their
> cache after serializing the pipeline.
>
> LKFT reported an oops by INT3 but there is no INT3 shown in the
> dumped code. This means the INT3 is removed after the CPU hits
> INT3.
>
> ## Test log
> ftrace-stress-test: <12>[ 21.971153] /usr/local/bin/kirk[277]:
> starting test ftrace-stress-test (ftrace_stress_test.sh 90)
> <4>[ 58.997439] Oops: int3: 0000 [#1] SMP PTI
> <4>[ 58.998089] CPU: 0 UID: 0 PID: 323 Comm: sh Not tainted
> 6.15.0-next-20250605 #1 PREEMPT(voluntary)
> <4>[ 58.998152] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> <4>[ 58.998260] RIP: 0010:_raw_spin_lock+0x5/0x50
> <4>[ 58.998563] Code: 5d e9 ff 12 00 00 66 66 2e 0f 1f 84 00 00 00
> 00 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
> 0f 1e fa 0f <1f> 44 00 00 55 48 89 e5 53 48 89 fb bf 01 00 00 00 e8 15
> 12 e4 fe
>
> Maybe one possible scenario is to hit the int3 after the third step
> somehow (on I-cache).
>
> ------
> <CPU0>                                    <CPU1>
>                                           Start smp_text_poke_batch_finish().
>                                           Start the third step. (remove INT3)
>                                           on_each_cpu(do_sync_core)
> do_sync_core(do SERIALIZE)
>                                           Finish the third step.
> Hit INT3 (from I-cache)
>                                           Clear text_poke_array_refs[cpu0]
> Start smp_text_poke_int3_handler()
> Failed to get text_poke_array_refs[cpu0]
> Oops: int3
> ------
>
> SERIALIZE instruction flushes the pipeline, thus the processor needs
> to reload the instruction. But it is not ensured to reload it from
> memory because SERIALIZE does not invalidate the cache.
>
> To prevent reloading replaced INT3, we need to invalidate the cache
> (flush TLB) in the third step, before the do_sync_core().

This sounds all sorts of wrong. x86 is supposed to be cache-coherent. A
store should cause the invalidation per MESI and all that. This means
the only place where the old instruction can stick around is in the
uarch micro-ops cache and all that, and SERIALIZE will very much flush
those.

Also, TLB flush != I$ flush. There is clflush_cache_range() for this.
But still, this really should not be needed.

Also, this is all qemu, and qemu is known to have gotten this terribly
wrong in the past.

If you all cannot reproduce on real hardware, I'm considering this a
qemu bug.
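
To illustrate the TLB-flush versus cache-flush distinction made above: a TLB flush drops address translations, while a cache-line flush writes back and invalidates cached data by address. Below is a minimal user-space sketch, loosely modeled on the general shape of a range flush such as clflush_cache_range() (fence, flush every line overlapping the range, fence). The sketch_* names and the fixed 64-byte line size are assumptions for illustration only and are not kernel API. On builds without SSE2 the flush is compiled out and the function only counts lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>          /* _mm_clflush(), _mm_mfence() */
#define SKETCH_HAVE_CLFLUSH 1
#else
#define SKETCH_HAVE_CLFLUSH 0   /* non-x86 or no SSE2: count lines only */
#endif

/* Assumption: 64-byte lines. Real code should query the CPU instead. */
#define SKETCH_LINE_SIZE 64u

/* Number of cache lines overlapped by [addr, addr + size). */
static size_t sketch_lines_in_range(uintptr_t addr, size_t size, size_t line)
{
    uintptr_t first, last;

    assert(line != 0 && (line & (line - 1)) == 0);  /* power of two */
    if (size == 0)
        return 0;

    first = addr & ~(uintptr_t)(line - 1);
    last = (addr + size - 1) & ~(uintptr_t)(line - 1);
    return (size_t)((last - first) / line) + 1;
}

/*
 * Flush every cache line overlapping [vaddr, vaddr + size).
 * Returns the number of lines visited.
 */
static size_t sketch_flush_range(const void *vaddr, size_t size)
{
    uintptr_t p = (uintptr_t)vaddr & ~(uintptr_t)(SKETCH_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)vaddr + size;
    size_t visited = 0;

    if (size == 0)
        return 0;

#if SKETCH_HAVE_CLFLUSH
    _mm_mfence();               /* order earlier stores before the flushes */
#endif
    for (; p < end; p += SKETCH_LINE_SIZE) {
#if SKETCH_HAVE_CLFLUSH
        _mm_clflush((const void *)p);
#endif
        visited++;
    }
#if SKETCH_HAVE_CLFLUSH
    _mm_mfence();               /* wait for the flushes before continuing */
#endif
    return visited;
}
```

Note that a 5-byte patch site starting at offset 60 of a line overlaps two lines, so a range flush has to round the start down to a line boundary rather than flush a single address. Per the argument above, none of this should be necessary for text poking on coherent x86 hardware.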