Message-ID: <fa7a3ea2c6326639911fbe49b86975f79db92372.camel@redhat.com>
Date: Thu, 10 Jul 2025 15:40:36 +0200
From: Gabriele Monaco <gmonaco@...hat.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, kernel test robot
<oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, aubrey.li@...ux.intel.com,
yu.c.chen@...el.com, Andrew Morton <akpm@...ux-foundation.org>, David
Hildenbrand <david@...hat.com>, Ingo Molnar <mingo@...hat.com>, Peter
Zijlstra <peterz@...radead.org>, "Paul E. McKenney" <paulmck@...nel.org>,
Ingo Molnar <mingo@...hat.org>
Subject: Re: [PATCH v14 2/3] sched: Move task_mm_cid_work to mm timer

On Thu, 2025-07-10 at 09:23 -0400, Mathieu Desnoyers wrote:
> On 2025-07-10 00:56, kernel test robot wrote:
> >
> >
> > Hello,
> >
> > kernel test robot noticed "WARNING:inconsistent_lock_state" on:
> >
> > commit: d06e66c6025e44136e6715d24c23fb821a415577 ("[PATCH v14 2/3] sched: Move task_mm_cid_work to mm timer")
> > url: https://github.com/intel-lab-lkp/linux/commits/Gabriele-Monaco/sched-Add-prev_sum_exec_runtime-support-for-RT-DL-and-SCX-classes/20250707-224959
> > patch link: https://lore.kernel.org/all/20250707144824.117014-3-gmonaco@redhat.com/
> > patch subject: [PATCH v14 2/3] sched: Move task_mm_cid_work to mm timer
> >
> > in testcase: boot
> >
> > config: x86_64-randconfig-003-20250708
> > compiler: gcc-11
> > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> >
> >
> > +-------------------------------------------------+------------+------------+
> > |                                                 | 50c1dc07ee | d06e66c602 |
> > +-------------------------------------------------+------------+------------+
> > | WARNING:inconsistent_lock_state                 | 0          | 12         |
> > | inconsistent{SOFTIRQ-ON-W}->{IN-SOFTIRQ-W}usage | 0          | 12         |
> > +-------------------------------------------------+------------+------------+
> >
>
> I suspect the issue comes from calling mmdrop(mm) from timer context
> in a scenario where the mm_count can drop to 0.
>
> This causes calls to pgd_free() and such to take the pgd_lock in
> softirq context, when in other cases it's taken with softirqs enabled.
>
> See "mmdrop_sched()" for RT. I think we need something similar for
> the non-RT case, e.g. a:
>
> static inline void __mmdrop_delayed(struct rcu_head *rhp)
> {
> 	struct mm_struct *mm = container_of(rhp, struct mm_struct,
> 					    delayed_drop);
>
> 	__mmdrop(mm);
> }
>
> static inline void mmdrop_timer(struct mm_struct *mm)
> {
> 	/* Provides a full memory barrier. See mmdrop() */
> 	if (atomic_dec_and_test(&mm->mm_count))
> 		call_rcu(&mm->delayed_drop, __mmdrop_delayed);
> }
>
> Thoughts ?
>
Thanks for the suggestion.

I noticed that the problem is the mmdrop() in the timer callback, but
it seems to me this is getting unnecessarily complicated.
I'm not sure it's worth going down this path, especially since loading
the timer wheel like this might have unintended side effects, as
happened with the workqueue.

I am going to try the alternative approach of running the scan in
batches [1], still using a task_work but triggering it from
__rseq_handle_notify_resume() as is done here.
If that works for the original use case, I guess it's better to keep it
that way.

What do you think?

Thanks,
Gabriele

[1] - https://lore.kernel.org/lkml/20250217112317.258716-1-gmonaco@redhat.com
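
To make the batching idea concrete, here is a minimal user-space sketch.
It is not the code from [1]: it only illustrates the general technique of
scanning at most a fixed number of CPUs per invocation and saving a
cursor, so that the next invocation resumes where the previous one
stopped. All names (cid_scan_state, cid_scan_init, scan_batch,
SCAN_BATCH) and the batch size are assumptions made for illustration,
and the per-CPU work is reduced to counting the visited CPUs.

```c
#include <assert.h>
#include <stdbool.h>

#define SCAN_BATCH 4	/* hypothetical batch size, not taken from any patch */

/* Hypothetical per-mm scan state: remembers where the next batch starts. */
struct cid_scan_state {
	unsigned int next_cpu;	/* first CPU of the next batch */
	unsigned int nr_cpus;	/* total number of CPUs to cover */
	unsigned long visited;	/* stand-in for the real per-CPU work */
};

void cid_scan_init(struct cid_scan_state *s, unsigned int nr_cpus)
{
	assert(nr_cpus > 0);
	s->next_cpu = 0;
	s->nr_cpus = nr_cpus;
	s->visited = 0;
}

/*
 * Scan at most SCAN_BATCH CPUs, starting from the saved cursor.
 * Returns true when this call completed a full pass over all CPUs.
 */
bool scan_batch(struct cid_scan_state *s)
{
	unsigned int end = s->next_cpu + SCAN_BATCH;

	if (end > s->nr_cpus)
		end = s->nr_cpus;

	for (unsigned int cpu = s->next_cpu; cpu < end; cpu++)
		s->visited++;	/* real code would inspect this CPU's state */

	if (end == s->nr_cpus) {
		s->next_cpu = 0;	/* wrap around for the next pass */
		return true;
	}

	s->next_cpu = end;
	return false;
}
```

With 10 CPUs and a batch of 4, three calls cover CPUs 0-3, 4-7 and 8-9,
so each call does a bounded amount of work in task context.
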
> Thanks,
>
> Mathieu
>
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a
> > new version of the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@...el.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202507100606.90787fe6-lkp@intel.com
> >
> >
> > [ 26.556715][ C0] WARNING: inconsistent lock state
> > [ 26.557127][ C0] 6.16.0-rc5-00002-gd06e66c6025e #1 Tainted: G T
> > [ 26.557730][ C0] --------------------------------
> > [ 26.558133][ C0] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> > [ 26.558662][ C0] stdbuf/386 [HC0[0]:SC1[1]:HE1:SE0] takes:
> > [ 26.559118][ C0] ffffffff870d4438 (pgd_lock){+.?.}-{3:3}, at: pgd_free (arch/x86/mm/pgtable.c:67 arch/x86/mm/pgtable.c:98 arch/x86/mm/pgtable.c:379)
> > [ 26.559786][ C0] {SOFTIRQ-ON-W} state was registered at:
> > [ 26.560232][ C0] mark_usage (kernel/locking/lockdep.c:4669)
> > [ 26.560561][ C0] __lock_acquire (kernel/locking/lockdep.c:5194)
> > [ 26.560929][ C0] lock_acquire (kernel/locking/lockdep.c:473
> > kernel/locking/lockdep.c:5873)
> > [ 26.561267][ C0] _raw_spin_lock
> > (include/linux/spinlock_api_smp.h:134
> > kernel/locking/spinlock.c:154)
> > [ 26.561617][ C0] pgd_alloc (arch/x86/mm/pgtable.c:86
> > arch/x86/mm/pgtable.c:353)
> > [ 26.561950][ C0] mm_init+0x64f/0xbfb
> > [ 26.562342][ C0] mm_alloc (kernel/fork.c:1109)
> > [ 26.562655][ C0] dma_resv_lockdep (drivers/dma-buf/dma-resv.c:784)
> > [ 26.563020][ C0] do_one_initcall (init/main.c:1274)
> > [ 26.563389][ C0] do_initcalls (init/main.c:1335 init/main.c:1352)
> > [ 26.563744][ C0] kernel_init_freeable (init/main.c:1588)
> > [ 26.564144][ C0] kernel_init (init/main.c:1476)
> > [ 26.564402][ C0] ret_from_fork (arch/x86/kernel/process.c:154)
> > [ 26.564633][ C0] ret_from_fork_asm (arch/x86/entry/entry_64.S:258)
> > [ 26.564871][ C0] irq event stamp: 4774
> > [ 26.565070][ C0] hardirqs last enabled at (4774):
> > _raw_spin_unlock_irq (arch/x86/include/asm/irqflags.h:42
> > arch/x86/include/asm/irqflags.h:119
> > include/linux/spinlock_api_smp.h:159 kernel/locking/spinlock.c:202)
> > [ 26.565526][ C0] hardirqs last disabled at (4773):
> > _raw_spin_lock_irq (arch/x86/include/asm/preempt.h:80
> > include/linux/spinlock_api_smp.h:118 kernel/locking/spinlock.c:170)
> > [ 26.565971][ C0] softirqs last enabled at (4256): local_bh_enable
> > (include/linux/bottom_half.h:33)
> > [ 26.566408][ C0] softirqs last disabled at (4771): __do_softirq
> > (kernel/softirq.c:614)
> > [ 26.566823][ C0]
> > [ 26.566823][ C0] other info that might help us debug this:
> > [ 26.567198][ C0] Possible unsafe locking scenario:
> > [ 26.567198][ C0]
> > [ 26.567548][ C0] CPU0
> > [ 26.567709][ C0] ----
> > [ 26.567869][ C0] lock(pgd_lock);
> > [ 26.568060][ C0] <Interrupt>
> > [ 26.568255][ C0] lock(pgd_lock);
> > [ 26.568452][ C0]
> > [ 26.568452][ C0] *** DEADLOCK ***
> > [ 26.568452][ C0]
> > [ 26.568830][ C0] 3 locks held by stdbuf/386:
> > [ 26.569056][ C0] #0: ffff888170d5c1a8 (&sb->s_type->i_mutex_key){++++}-{4:4}, at: lookup_slow (fs/namei.c:1834)
> > [ 26.569535][ C0] #1: ffff888170cf5850 (&lockref->lock){+.+.}-{3:3}, at: d_alloc (include/linux/dcache.h:319 fs/dcache.c:1777)
> > [ 26.569961][ C0] #2: ffffc90000007d40 ((&mm->cid_timer)){+.-.}-{0:0}, at: call_timer_fn (kernel/time/timer.c:1744)
> > [ 26.570421][ C0]
> > [ 26.570421][ C0] stack backtrace:
> > [ 26.570704][ C0] CPU: 0 UID: 0 PID: 386 Comm: stdbuf Tainted:
> > G T 6.16.0-rc5-00002-gd06e66c6025e #1
> > PREEMPT(voluntary) 39c5cbdaf5b4eb171776daa7d42daa95c0766676
> > [ 26.570716][ C0] Tainted: [T]=RANDSTRUCT
> > [ 26.570719][ C0] Call Trace:
> > [ 26.570723][ C0] <IRQ>
> > [ 26.570727][ C0] dump_stack_lvl (lib/dump_stack.c:122
> > (discriminator 4))
> > [ 26.570735][ C0] dump_stack (lib/dump_stack.c:130)
> > [ 26.570740][ C0] print_usage_bug (kernel/locking/lockdep.c:4047)
> > [ 26.570748][ C0] valid_state (kernel/locking/lockdep.c:4060)
> > [ 26.570755][ C0] mark_lock_irq (kernel/locking/lockdep.c:4270)
> > [ 26.570762][ C0] ? save_trace (kernel/locking/lockdep.c:592)
> > [ 26.570773][ C0] ? mark_lock (kernel/locking/lockdep.c:4728
> > (discriminator 3))
> > [ 26.570780][ C0] mark_lock (kernel/locking/lockdep.c:4756)
> > [ 26.570787][ C0] mark_usage (kernel/locking/lockdep.c:4645)
> > [ 26.570796][ C0] __lock_acquire (kernel/locking/lockdep.c:5194)
> > [ 26.570804][ C0] lock_acquire (kernel/locking/lockdep.c:473
> > kernel/locking/lockdep.c:5873)
> > [ 26.570811][ C0] ? pgd_free (arch/x86/mm/pgtable.c:67
> > arch/x86/mm/pgtable.c:98 arch/x86/mm/pgtable.c:379)
> > [ 26.570822][ C0] ? validate_chain (kernel/locking/lockdep.c:3826
> > kernel/locking/lockdep.c:3879)
> > [ 26.570828][ C0] ? wake_up_new_task (kernel/sched/core.c:10597)
> > [ 26.570839][ C0] _raw_spin_lock
> > (include/linux/spinlock_api_smp.h:134
> > kernel/locking/spinlock.c:154)
> > [ 26.570845][ C0] ? pgd_free (arch/x86/mm/pgtable.c:67
> > arch/x86/mm/pgtable.c:98 arch/x86/mm/pgtable.c:379)
> > [ 26.570854][ C0] pgd_free (arch/x86/mm/pgtable.c:67
> > arch/x86/mm/pgtable.c:98 arch/x86/mm/pgtable.c:379)
> > [ 26.570863][ C0] ? wake_up_new_task (kernel/sched/core.c:10597)
> > [ 26.570873][ C0] __mmdrop (kernel/fork.c:681)
> > [ 26.570882][ C0] ? wake_up_new_task (kernel/sched/core.c:10597)
> > [ 26.570891][ C0] mmdrop (include/linux/sched/mm.h:55)
> > [ 26.570901][ C0] task_mm_cid_scan (kernel/sched/core.c:10619
> > (discriminator 3))
> > [ 26.570910][ C0] ? lock_is_held (include/linux/lockdep.h:249)
> > [ 26.570918][ C0] ? wake_up_new_task (kernel/sched/core.c:10597)
> > [ 26.570928][ C0] call_timer_fn (arch/x86/include/asm/atomic.h:23
> > include/linux/atomic/atomic-arch-fallback.h:457
> > include/linux/jump_label.h:262 include/trace/events/timer.h:127
> > kernel/time/timer.c:1748)
> > [ 26.570935][ C0] ? trace_timer_base_idle
> > (kernel/time/timer.c:1724)
> > [ 26.570943][ C0] ? wake_up_new_task (kernel/sched/core.c:10597)
> > [ 26.570953][ C0] ? wake_up_new_task (kernel/sched/core.c:10597)
> > [ 26.570962][ C0] __run_timers (kernel/time/timer.c:1799
> > kernel/time/timer.c:2372)
> > [ 26.570970][ C0] ? add_timer_global (kernel/time/timer.c:2343)
> > [ 26.570977][ C0] ? __kasan_check_write (mm/kasan/shadow.c:38)
> > [ 26.570988][ C0] ? do_raw_spin_lock (arch/x86/include/asm/atomic.h:107 include/linux/atomic/atomic-arch-fallback.h:2170 include/linux/atomic/atomic-instrumented.h:1302 include/asm-generic/qspinlock.h:111 kernel/locking/spinlock_debug.c:116)
> > [ 26.570996][ C0] ? __raw_spin_lock_init
> > (kernel/locking/spinlock_debug.c:114)
> > [ 26.571006][ C0] __run_timer_base (kernel/time/timer.c:2385)
> > [ 26.571014][ C0] run_timer_base (kernel/time/timer.c:2394)
> > [ 26.571021][ C0] run_timer_softirq (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/jump_label.h:262 kernel/time/timer.c:342 kernel/time/timer.c:2406)
> > [ 26.571028][ C0] handle_softirqs (arch/x86/include/asm/atomic.h:23
> > include/linux/atomic/atomic-arch-fallback.h:457
> > include/linux/jump_label.h:262 include/trace/events/irq.h:142
> > kernel/softirq.c:580)
> > [ 26.571039][ C0] __do_softirq (kernel/softirq.c:614)
> > [ 26.571046][ C0] __irq_exit_rcu (kernel/softirq.c:453
> > kernel/softirq.c:680)
> > [ 26.571055][ C0] irq_exit_rcu (kernel/softirq.c:698)
> > [ 26.571064][ C0] sysvec_apic_timer_interrupt
> > (arch/x86/kernel/apic/apic.c:1050 arch/x86/kernel/apic/apic.c:1050)
> > [ 26.571076][ C0] </IRQ>
> > [ 26.571078][ C0] <TASK>
> > [ 26.571081][ C0] asm_sysvec_apic_timer_interrupt
> > (arch/x86/include/asm/idtentry.h:574)
> > [ 26.571088][ C0] RIP: 0010:d_alloc (fs/dcache.c:1778)
> > [ 26.571100][ C0] Code: 8d 7c 24 50 b8 ff ff 37 00 ff 83 f8 00 00
> > 00 48 89 fa 48 c1 e0 2a 48 c1 ea 03 80 3c 02 00 74 05 e8 5f f3 f6
> > ff 49 89 5c 24 50 <49> 8d bc 24 10 01 00 00 48 8d b3 20 01 00 00 e8
> > 87 bc ff ff 4c 89
> > All code
> > ========
> > 0: 8d 7c 24 50 lea 0x50(%rsp),%edi
> > 4: b8 ff ff 37 00 mov $0x37ffff,%eax
> > 9: ff 83 f8 00 00 00 incl 0xf8(%rbx)
> > f: 48 89 fa mov %rdi,%rdx
> > 12: 48 c1 e0 2a shl $0x2a,%rax
> > 16: 48 c1 ea 03 shr $0x3,%rdx
> > 1a: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1)
> > 1e: 74 05 je 0x25
> > 20: e8 5f f3 f6 ff call 0xfffffffffff6f384
> > 25: 49 89 5c 24 50 mov %rbx,0x50(%r12)
> > 2a:* 49 8d bc 24 10 01 00 lea
> > 0x110(%r12),%rdi <-- trapping instruction
> > 31: 00
> > 32: 48 8d b3 20 01 00 00 lea 0x120(%rbx),%rsi
> > 39: e8 87 bc ff ff call 0xffffffffffffbcc5
> > 3e: 4c rex.WR
> > 3f: 89 .byte 0x89
> >
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 49 8d bc 24 10 01 00 lea 0x110(%r12),%rdi
> > 7: 00
> > 8: 48 8d b3 20 01 00 00 lea 0x120(%rbx),%rsi
> > f: e8 87 bc ff ff call 0xffffffffffffbc9b
> > 14: 4c rex.WR
> > 15: 89 .byte 0x89
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20250710/202507100606.90787fe6-lkp@intel.com
> >
> >
> >
>
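
The lockdep report quoted above comes down to one rule: a lock that is
ever acquired with softirqs enabled must not also be acquired from
softirq context, because the softirq can interrupt the holder on the
same CPU and spin on the lock forever. The toy model below shows that
check in isolation. It is loosely inspired by lockdep's usage bits but
is not lockdep's implementation; the names lock_class_model and
model_acquire are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Usage bits recorded per lock class (a toy subset of what lockdep tracks). */
enum lock_usage {
	USED_SOFTIRQ_ON = 1 << 0,	/* acquired with softirqs enabled */
	USED_IN_SOFTIRQ = 1 << 1,	/* acquired from softirq context */
};

struct lock_class_model {
	unsigned int usage;
};

/*
 * Record one acquisition and report whether the class is still consistent.
 * in_softirq:       the acquirer runs in softirq context
 * softirqs_enabled: softirqs are enabled at acquisition time
 * Returns false once both conflicting usages have been seen.
 */
bool model_acquire(struct lock_class_model *c, bool in_softirq,
		   bool softirqs_enabled)
{
	const unsigned int conflict = USED_SOFTIRQ_ON | USED_IN_SOFTIRQ;

	if (in_softirq)
		c->usage |= USED_IN_SOFTIRQ;
	else if (softirqs_enabled)
		c->usage |= USED_SOFTIRQ_ON;

	return (c->usage & conflict) != conflict;
}
```

In terms of the report, pgd_alloc() taking pgd_lock in process context
corresponds to model_acquire(&c, false, true), and pgd_free() reached
from the timer softirq corresponds to model_acquire(&c, true, false),
which is the call that returns false.
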