linux-kernel - Re: [syzbot] [fs?] INFO: task hung in synchronize

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <7ea26a76-5c8c-a0d2-5b5e-63e370cdcb99@I-love.SAKURA.ne.jp>
Date:   Thu, 4 May 2023 16:01:23 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:     Hillf Danton <hdanton@...a.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     syzbot <syzbot+222aa26d0a5dbc2e84fe@...kaller.appspotmail.com>,
        linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [fs?] INFO: task hung in synchronize_rcu (4)

On 2023/05/04 15:16, Hillf Danton wrote:
>> 4 locks held by syz-executor.2/5077:
>>  #0: ffff8880b993c2d8 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2f/0x120 kernel/sched/core.c:539
>>  #1: ffff88802296aef0 (&mm->cid_lock#2){....}-{2:2}, at: mm_cid_get kernel/sched/sched.h:3280 [inline]
>>  #1: ffff88802296aef0 (&mm->cid_lock#2){....}-{2:2}, at: switch_mm_cid kernel/sched/sched.h:3302 [inline]
>>  #1: ffff88802296aef0 (&mm->cid_lock#2){....}-{2:2}, at: prepare_task_switch kernel/sched/core.c:5117 [inline]
>>  #1: ffff88802296aef0 (&mm->cid_lock#2){....}-{2:2}, at: context_switch kernel/sched/core.c:5258 [inline]
>>  #1: ffff88802296aef0 (&mm->cid_lock#2){....}-{2:2}, at: __schedule+0x2802/0x5770 kernel/sched/core.c:6625
>>  #2: ffff8880b9929698 (&base->lock){-.-.}-{2:2}, at: lock_timer_base+0x5a/0x1f0 kernel/time/timer.c:999
>>  #3: ffffffff91fb4ac8 (&obj_hash[i].lock){-.-.}-{2:2}, at: debug_object_activate+0x134/0x3f0 lib/debugobjects.c:690
> 
> What is hard to understand in this report is, how could acquire the
> timer base lock with the mm cid lock held [1]?

Please be aware that lockdep_print_held_locks() is not an atomic action.
Since synchronous printk() is slow, it can sometimes happen that
task_is_running(p) becomes true after passing the

	if (p != current && task_is_running(p))
		return;

check. I think that this trace is an example where print_lock() by chance hit
hlock_class(p->held_locks + 2) != NULL. If sched_show_task() were also available,
we can know it via mismatch between sched_show_task() and lockdep_print_held_locks().

Linus, I think that "[PATCH v3 (repost)] locking/lockdep: add debug_show_all_lock_holders()"
helps here, but I can't wake up locking people. What can we do?

> 
> static inline int mm_cid_get(struct mm_struct *mm)
> {
> 	int ret;
> 
> 	lockdep_assert_irqs_disabled();
> 	raw_spin_lock(&mm->cid_lock);
> 	ret = __mm_cid_get(mm);
> 	raw_spin_unlock(&mm->cid_lock);
> 	return ret;
> }
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/kernel/sched/sched.h?id=6686317855c6#n3280
>