linux-kernel - Re: [syzbot] [kernel?] linux-next test error: WARNING in switch_mm_irqs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250414185031.GA13096@noisy.programming.kicks-ass.net>
Date: Mon, 14 Apr 2025 20:50:31 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Ingo Molnar <mingo@...nel.org>
Cc: syzbot <syzbot+c2537ce72a879a38113e@...kaller.appspotmail.com>,
	riel@...riel.com, bp@...en8.de, dave.hansen@...ux.intel.com,
	hpa@...or.com, linux-kernel@...r.kernel.org,
	linux-next@...r.kernel.org, luto@...nel.org, mingo@...hat.com,
	sfr@...b.auug.org.au, syzkaller-bugs@...glegroups.com,
	tglx@...utronix.de, x86@...nel.org
Subject: Re: [syzbot] [kernel?] linux-next test error: WARNING in
 switch_mm_irqs_off

On Mon, Apr 14, 2025 at 05:10:03PM +0200, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@...radead.org> wrote:
> 
> > > Call Trace:
> > >  <TASK>
> > >  unuse_temporary_mm+0x9f/0x100 arch/x86/mm/tlb.c:1038
> > >  __text_poke+0x7b6/0xb40 arch/x86/kernel/alternative.c:2214
> > >  text_poke arch/x86/kernel/alternative.c:2257 [inline]
> > >  smp_text_poke_batch_finish+0x3e7/0x12c0 arch/x86/kernel/alternative.c:2565
> > >  arch_jump_label_transform_apply+0x1c/0x30 arch/x86/kernel/jump_label.c:146
> > >  static_key_disable_cpuslocked+0xd2/0x1c0 kernel/jump_label.c:240
> > >  static_key_disable+0x1a/0x20 kernel/jump_label.c:248
> > >  once_deferred+0x70/0xb0 lib/once.c:20
> > >  process_one_work kernel/workqueue.c:3238 [inline]
> > >  process_scheduled_works+0xac3/0x18e0 kernel/workqueue.c:3319
> > >  worker_thread+0x870/0xd50 kernel/workqueue.c:3400
> > >  kthread+0x7b7/0x940 kernel/kthread.c:464
> > >  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:153
> > >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> > >  </TASK>
> > 
> > So I can reproduce, and I I think I see what happens, except I'm
> > confused as to why the recently merged patches show this.
> > 
> > AFAIU what happens is that unuse_temporary_mm() clears the 
> > mm_cpumask() for the current CPU, while switch_mm_irqs_off() then 
> > checks that the mm_cpumask() bit is set for the current CPU.
> > 
> > This behaviour hasn't really changed since 209954cbc7d0 ("x86/mm/tlb: 
> > Update mm_cpumask lazily") introduced both.
> > 
> > I'm not entirely sure what the best way forward is.. we can simply 
> > delete the warning, or make use_temporary_mm() tag the special MMs 
> > somehow and exclude them from the warning.
> 
> So, mm_cpumask is basically tracking on which CPUs the MM ran on, and 
> this gets cleared lazily whenever there's an opportune time, but not 
> during context switches (for shared cacheline performance reasons), 
> right?
> 
> So why do we need to clear the mm_cpumask in unuse_temporary_mm() to 
> begin with:
> 
> 	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
>         cpumask_clear_cpu(smp_processor_id(), mm_cpumask(this_cpu_read(cpu_tlbstate.loaded_mm)));
> 
> What TLB flushing are we worried about here? Nothing much should 
> trigger any TLB flushing for text_poke_mm AFAICS?

No, it will actually try and issue TLBI for text_poking_mm and then
things go sideways. If you look up the original thread:

  https://lkml.kernel.org/r/20241113095550.GBZzR3pg-RhJKPDazS@fat_crate.local

you'll find this was discussed. You were on Cc there.

Some of the solutions there made the TLBI not explode, but fundamentally
the whole temporary_mm thing is CPU local and the CR3 switch away is
sufficient.