Message-Id: <20250612091754.b56ed1faf47cdcc1b90aafcd@kernel.org>
Date: Thu, 12 Jun 2025 09:17:54 +0900
From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>, Naresh Kamboju
<naresh.kamboju@...aro.org>
Cc: Ingo Molnar <mingo@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
Steven Rostedt <rostedt@...dmis.org>, x86@...nel.org, Naresh Kamboju
<naresh.kamboju@...aro.org>, open list <linux-kernel@...r.kernel.org>,
Linux trace kernel <linux-trace-kernel@...r.kernel.org>,
lkft-triage@...ts.linaro.org, Stephen Rothwell <sfr@...b.auug.org.au>, Arnd
Bergmann <arnd@...db.de>, Dan Carpenter <dan.carpenter@...aro.org>, Anders
Roxell <anders.roxell@...aro.org>
Subject: Re: [RFC PATCH 2/2] x86: alternative: Invalidate the cache for
updated instructions
On Wed, 11 Jun 2025 13:30:01 +0200
Peter Zijlstra <peterz@...radead.org> wrote:
> On Tue, Jun 10, 2025 at 11:47:48PM +0900, Masami Hiramatsu (Google) wrote:
> > From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
> >
> > Invalidate the cache after replacing INT3 with the new instruction.
> > This will prevent the other CPUs from seeing the removed INT3 in
> > their cache after serializing the pipeline.
> >
> > LKFT reported an oops caused by INT3, but no INT3 is shown in the
> > dumped code. This means the INT3 was removed after the CPU had
> > already hit it.
> >
> > ## Test log
> > ftrace-stress-test: <12>[ 21.971153] /usr/local/bin/kirk[277]:
> > starting test ftrace-stress-test (ftrace_stress_test.sh 90)
> > <4>[ 58.997439] Oops: int3: 0000 [#1] SMP PTI
> > <4>[ 58.998089] CPU: 0 UID: 0 PID: 323 Comm: sh Not tainted
> > 6.15.0-next-20250605 #1 PREEMPT(voluntary)
> > <4>[ 58.998152] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> > BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> > <4>[ 58.998260] RIP: 0010:_raw_spin_lock+0x5/0x50
> > <4>[ 58.998563] Code: 5d e9 ff 12 00 00 66 66 2e 0f 1f 84 00 00 00
> > 00 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
> > 0f 1e fa 0f <1f> 44 00 00 55 48 89 e5 53 48 89 fb bf 01 00 00 00 e8 15
> > 12 e4 fe
> >
> > One possible scenario is that a CPU somehow hits the INT3 (still
> > present in its I-cache) after the third step has removed it:
> >
> > ------
> > <CPU0> <CPU1>
> > Start smp_text_poke_batch_finish().
> > Start the third step. (remove INT3)
> > on_each_cpu(do_sync_core)
> > do_sync_core(do SERIALIZE)
> > Finish the third step.
> > Hit INT3 (from I-cache)
> > Clear text_poke_array_refs[cpu0]
> > Start smp_text_poke_int3_handler()
> > Failed to get text_poke_array_refs[cpu0]
> > Oops: int3
> > ------
> >
> > The SERIALIZE instruction flushes the pipeline, so the processor has
> > to refetch the instruction. But it is not guaranteed to refetch it
> > from memory, because SERIALIZE does not invalidate the cache.
> >
> > To prevent the replaced INT3 from being refetched, we need to
> > invalidate the cache (flush the TLB) in the third step, before the
> > do_sync_core().
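
For reference, here is a minimal sketch of the handler-side check that the
scenario above races against (names follow the scenario; this is simplified
and not the actual arch/x86/kernel/alternative.c code):

------
/* Sketch only: per-CPU refcount that gates the INT3 handler. */
static DEFINE_PER_CPU(atomic_t, text_poke_array_refs);

static bool try_get_text_poke_array(void)
{
	/* Fails once the poking CPU has already dropped this CPU's ref. */
	return atomic_inc_not_zero(this_cpu_ptr(&text_poke_array_refs));
}

int smp_text_poke_int3_handler(struct pt_regs *regs)
{
	if (!try_get_text_poke_array())
		return 0;	/* not ours -> die() -> "Oops: int3" as in the log */

	/* ... find the poked site, emulate or re-execute, drop the ref ... */
	return 1;
}
------
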
>
> This sounds all sorts of wrong. x86 is supposed to be cache-coherent. A
> store should cause the invalidation per MESI and all that. This means
> the only place where the old instruction can stick around is in the
> uarch micro-ops cache and all that, and SERIALIZE will very much flush
> those.
OK, thanks for pointing it out!
>
> Also, TLB flush != I$ flush. There is clflush_cache_range() for this.
> But still, this really should not be needed.
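
If an explicit cache-line flush of the patched bytes were wanted at all, it
would look something like the below rather than a TLB flush (illustrative
only, not part of either patch; addr and len are placeholders for the poked
instruction bytes):

------
	/* Flush the data-cache lines covering the patched instruction. */
	clflush_cache_range(addr, len);
------
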
>
> Also, this is all qemu, and qemu is known to have gotten this terribly
> wrong in the past.
What about KVM? We need to ask Naresh how the VM is running on that machine.
Naresh, can you tell us how the VM is run? Does it use KVM?
And if so, how is KVM configured (this may depend on the real hardware)?
>
> If you all cannot reproduce on real hardware, I'm considering this a
> qemu bug.
OK, if it is a qemu bug, I will drop [2/2], but I think we still need
[1/2] to avoid the kernel crash (and emit a warning message without a
dump instead).
Thank you,
>
>
--
Masami Hiramatsu (Google) <mhiramat@...nel.org>