lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <75f056d3-fa52-e399-29cc-4388cae069ff@I-love.SAKURA.ne.jp>
Date:   Wed, 4 Jan 2023 22:47:16 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:     agruenba@...hat.com, cluster-devel@...hat.com, rpeterso@...hat.com
Cc:     Hillf Danton <hdanton@...a.com>, Waiman Long <longman@...hat.com>,
        syzbot <syzbot+f51cb4b9afbd87ec06f2@...kaller.appspotmail.com>,
        syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] INFO: task hung in freeze_super (3)

syzbot is reporting hung task at freeze_super() after emitting
"gfs2: fsid=loop0.0: can't make FS RW: -5" message due to gfs2_find_jhead()
 from gfs2_make_fs_rw() from gfs2_fill_super() failing with -EIO.

When hung task message is printed, I can observe that glock_workqueue is
trying to hold type->s_umount_key from glock_workqueue work function
whereas the reproducer is trying to flush glock_workqueue workqueue with
type->s_umount_key held; leading to silent deadlock.

[  259.867348] task:kworker/3:1H    state:D stack:13736 pid:2497  ppid:2      flags:0x00004000
[  259.870666] Workqueue: glock_workqueue glock_work_func
[  259.872886] Call Trace:
[  259.874279]  <TASK>
[  259.875452]  __schedule+0x498/0x590
[  259.877255]  schedule+0x55/0x90
[  259.878932]  rwsem_down_write_slowpath+0x385/0x760
[  259.881206]  freeze_super+0x29/0x1f0
[  259.883022]  freeze_go_sync+0xa6/0x1f0
[  259.884905]  do_xmote+0x1ae/0xa60
[  259.886625]  glock_work_func+0x19a/0x220
[  259.888559]  process_one_work+0x306/0x530
[  259.890548]  worker_thread+0x357/0x630
[  259.892443]  kthread+0x133/0x170
[  259.894430]  ? rcu_lock_release+0x30/0x30
[  259.896408]  ? kthread_blkcg+0x60/0x60
[  259.898289]  ret_from_fork+0x1f/0x30
[  259.900114]  </TASK>
[  259.901407] 3 locks held by kworker/3:1H/2497:
[  259.903534]  #0: ffff88810653c338 ((wq_completion)glock_workqueue){+.+.}-{0:0}, at: process_one_work+0x2a7/0x530
[  259.907874]  #1: ffffc900013dbe58 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}, at: process_one_work+0x2d1/0x530
[  259.912529]  #2: ffff88812be0d0e0 (&type->s_umount_key#51){+.+.}-{3:3}, at: freeze_super+0x29/0x1f0

[  259.975640] task:a.out           state:D stack:12664 pid:4579  ppid:3638   flags:0x00004006
[  259.979256] Call Trace:
[  259.980634]  <TASK>
[  259.981886]  __schedule+0x498/0x590
[  259.983653]  schedule+0x55/0x90
[  259.985291]  schedule_timeout+0x9e/0x1d0
[  259.987219]  do_wait_for_common+0xff/0x180
[  259.989212]  ? console_conditional_schedule+0x40/0x40
[  259.991585]  wait_for_completion+0x4a/0x60
[  259.993573]  __flush_workqueue+0x360/0x820
[  259.995584]  gfs2_gl_hash_clear+0x58/0x1b0
[  259.997579]  ? _raw_spin_unlock_irqrestore+0x43/0xb0
[  259.999883]  ? lockdep_hardirqs_on+0x99/0x140
[  260.001970]  gfs2_fill_super+0xe24/0x1110
[  260.003934]  ? gfs2_reconfigure+0x4d0/0x4d0
[  260.005955]  get_tree_bdev+0x228/0x2f0
[  260.007821]  gfs2_get_tree+0x2b/0xe0
[  260.009620]  vfs_get_tree+0x30/0xe0
[  260.011570]  do_new_mount+0x1d7/0x540
[  260.013408]  path_mount+0x3c5/0xb50
[  260.015176]  __se_sys_mount+0x298/0x2f0
[  260.017073]  do_syscall_64+0x41/0x90
[  260.018870]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  260.021262] RIP: 0033:0x7fc273f26eae
[  260.023077] RSP: 002b:00007ffffebc6f38 EFLAGS: 00000282 ORIG_RAX: 00000000000000a5
[  260.026419] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc273f26eae
[  260.029573] RDX: 00000000200124c0 RSI: 0000000020012500 RDI: 00007ffffebc6fb0
[  260.032728] RBP: 00007ffffebc7100 R08: 00007ffffebc6ff0 R09: 0000000000000000
[  260.035903] R10: 0000000000000000 R11: 0000000000000282 R12: 00007ffffebc7278
[  260.039063] R13: 0000559922083246 R14: 0000559922097cb8 R15: 00007fc27413b040
[  260.042241]  </TASK>
[  260.043522] 1 lock held by a.out/4579:
[  260.045379]  #0: ffff88812be0d0e0 (&type->s_umount_key#50/1){+.+.}-{3:3}, at: alloc_super+0x102/0x450

It is difficult to reproduce lockdep warning because this is timing dependent.
So far I was able to reproduce lockdep warning only once.

[  111.928183] [ T4537] gfs2: fsid=loop0.0: can't make FS RW: -5
[  111.931578] [   T88]
[  111.937591] [   T88] ======================================================
[  111.940620] [   T88] WARNING: possible circular locking dependency detected
[  111.943553] [   T88] 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28 Not tainted
[  111.946879] [   T88] ------------------------------------------------------
[  111.949832] [   T88] kworker/2:1H/88 is trying to acquire lock:
[  111.952383] [   T88] ffff888128ff50e0 (&type->s_umount_key#51){+.+.}-{3:3}, at: freeze_super+0x29/0x1f0
[  111.956324] [   T88]
[  111.956324] [   T88] but task is already holding lock:
[  111.959406] [   T88] ffffc90000bfbe58 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}, at: process_one_work+0x2d1/0x530
[  111.964081] [   T88]
[  111.964081] [   T88] which lock already depends on the new lock.
[  111.964081] [   T88]
[  111.968219] [   T88]
[  111.968219] [   T88] the existing dependency chain (in reverse order) is:
[  111.971959] [   T88]
[  111.971959] [   T88] -> #2 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}:
[  111.976006] [   T88]        process_one_work+0x2f3/0x530
[  111.978307] [   T88]        worker_thread+0x357/0x630
[  111.980510] [   T88]        kthread+0x133/0x170
[  111.982518] [   T88]        ret_from_fork+0x1f/0x30
[  111.984621] [   T88]
[  111.984621] [   T88] -> #1 ((wq_completion)glock_workqueue){+.+.}-{0:0}:
[  111.987992] [   T88]        __flush_workqueue+0x107/0x820
[  111.990137] [   T88]        gfs2_gl_hash_clear+0x3d/0x1b0
[  111.992431] [   T88]        gfs2_put_super+0x3a2/0x3f0
[  111.994399] [   T88]        generic_shutdown_super+0x9a/0x1c0
[  111.996554] [   T88]        kill_block_super+0x2d/0x70
[  111.999015] [   T88]        deactivate_locked_super+0x5e/0xb0
[  112.001467] [   T88]        cleanup_mnt+0x1c9/0x220
[  112.003591] [   T88]        task_work_run+0xe4/0x130
[  112.005749] [   T88]        exit_to_user_mode_loop+0x123/0x150
[  112.008224] [   T88]        exit_to_user_mode_prepare+0xaa/0x130
[  112.010765] [   T88]        syscall_exit_to_user_mode+0x32/0x70
[  112.013457] [   T88]        do_syscall_64+0x4d/0x90
[  112.015811] [   T88]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  112.018497] [   T88]
[  112.018497] [   T88] -> #0 (&type->s_umount_key#51){+.+.}-{3:3}:
[  112.021921] [   T88]        __lock_acquire+0x1712/0x33b0
[  112.024223] [   T88]        lock_acquire+0xd3/0x200
[  112.026342] [   T88]        down_write+0x3b/0xd0
[  112.028384] [   T88]        freeze_super+0x29/0x1f0
[  112.030503] [   T88]        freeze_go_sync+0xa6/0x1f0
[  112.032651] [   T88]        do_xmote+0x1ae/0xa60
[  112.034420] [   T88]        glock_work_func+0x19a/0x220
[  112.036411] [   T88]        process_one_work+0x306/0x530
[  112.038635] [   T88]        worker_thread+0x357/0x630
[  112.040702] [   T88]        kthread+0x133/0x170
[  112.042509] [   T88]        ret_from_fork+0x1f/0x30
[  112.044388] [   T88]
[  112.044388] [   T88] other info that might help us debug this:
[  112.044388] [   T88]
[  112.048102] [   T88] Chain exists of:
[  112.048102] [   T88]   &type->s_umount_key#51 --> (wq_completion)glock_workqueue --> (work_completion)(&(&gl->gl_work)->work)
[  112.048102] [   T88]
[  112.054316] [   T88]  Possible unsafe locking scenario:
[  112.054316] [   T88]
[  112.057425] [   T88]        CPU0                    CPU1
[  112.059722] [   T88]        ----                    ----
[  112.062002] [   T88]   lock((work_completion)(&(&gl->gl_work)->work));
[  112.064732] [   T88]                                lock((wq_completion)glock_workqueue);
[  112.068131] [   T88]                                lock((work_completion)(&(&gl->gl_work)->work));
[  112.071852] [   T88]   lock(&type->s_umount_key#51);
[  112.074014] [   T88]
[  112.074014] [   T88]  *** DEADLOCK ***
[  112.074014] [   T88]
[  112.077358] [   T88] 2 locks held by kworker/2:1H/88:
[  112.079541] [   T88]  #0: ffff88812530f538 ((wq_completion)glock_workqueue){+.+.}-{0:0}, at: process_one_work+0x2a7/0x530
[  112.083975] [   T88]  #1: ffffc90000bfbe58 ((work_completion)(&(&gl->gl_work)->work)){+.+.}-{0:0}, at: process_one_work+0x2d1/0x530
[  112.088673] [   T88]
[  112.088673] [   T88] stack backtrace:
[  112.091159] [   T88] CPU: 2 PID: 88 Comm: kworker/2:1H Not tainted 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28
[  112.095547] [   T88] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  112.099240] [   T88] Workqueue: glock_workqueue glock_work_func
[  112.101745] [   T88] Call Trace:
[  112.103257] [   T88]  <TASK>
[  112.104640] [   T88]  dump_stack_lvl+0xfe/0x190
[  112.106676] [   T88]  check_noncircular+0x12e/0x140
[  112.112220] [   T88]  __lock_acquire+0x1712/0x33b0
[  112.114390] [   T88]  ? __lock_acquire+0x65f/0x33b0
[  112.116857] [   T88]  lock_acquire+0xd3/0x200
[  112.118861] [   T88]  ? freeze_super+0x29/0x1f0
[  112.124130] [   T88]  down_write+0x3b/0xd0
[  112.127778] [   T88]  ? freeze_super+0x29/0x1f0
[  112.129819] [   T88]  freeze_super+0x29/0x1f0
[  112.134526] [   T88]  freeze_go_sync+0xa6/0x1f0
[  112.136585] [   T88]  do_xmote+0x1ae/0xa60
[  112.138651] [   T88]  glock_work_func+0x19a/0x220
[  112.140970] [   T88]  process_one_work+0x306/0x530
[  112.143112] [   T88]  worker_thread+0x357/0x630
[  112.145129] [   T88]  kthread+0x133/0x170
[  112.146954] [   T88]  ? rcu_lock_release+0x30/0x30
[  112.149534] [   T88]  ? kthread_blkcg+0x60/0x60
[  112.151495] [   T88]  ret_from_fork+0x1f/0x30
[  112.153850] [   T88]  </TASK>
[  256.043767] [   T59] INFO: task kworker/2:1H:88 blocked for more than 141 seconds.
[  256.049591] [   T59]       Not tainted 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28
[  256.053325] [   T59] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  256.056783] [   T59] task:kworker/2:1H    state:D stack:12792 pid:88    ppid:2      flags:0x00004000
[  256.060412] [   T59] Workqueue: glock_workqueue glock_work_func
[  256.062783] [   T59] Call Trace:
[  256.064008] [   T59]  <TASK>
[  256.065150] [   T59]  __schedule+0x498/0x590
[  256.066916] [   T59]  schedule+0x55/0x90
[  256.068647] [   T59]  rwsem_down_write_slowpath+0x385/0x760
[  256.072030] [   T59]  freeze_super+0x29/0x1f0
[  256.073874] [   T59]  freeze_go_sync+0xa6/0x1f0
[  256.075760] [   T59]  do_xmote+0x1ae/0xa60
[  256.077500] [   T59]  glock_work_func+0x19a/0x220
[  256.079462] [   T59]  process_one_work+0x306/0x530
[  256.081727] [   T59]  worker_thread+0x357/0x630
[  256.083619] [   T59]  kthread+0x133/0x170
[  256.085371] [   T59]  ? rcu_lock_release+0x30/0x30
[  256.087391] [   T59]  ? kthread_blkcg+0x60/0x60
[  256.089277] [   T59]  ret_from_fork+0x1f/0x30
[  256.093784] [   T59]  </TASK>
[  256.095131] [   T59] INFO: task a.out:4537 blocked for more than 141 seconds.
[  256.098069] [   T59]       Not tainted 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28
[  256.101464] [   T59] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  256.104916] [   T59] task:a.out           state:D stack:12664 pid:4537  ppid:4531   flags:0x00004006
[  256.108532] [   T59] Call Trace:
[  256.109916] [   T59]  <TASK>
[  256.111161] [   T59]  __schedule+0x498/0x590
[  256.112959] [   T59]  schedule+0x55/0x90
[  256.114589] [   T59]  schedule_timeout+0x9e/0x1d0
[  256.116519] [   T59]  do_wait_for_common+0xff/0x180
[  256.118515] [   T59]  ? console_conditional_schedule+0x40/0x40
[  256.120860] [   T59]  wait_for_completion+0x4a/0x60
[  256.122860] [   T59]  __flush_workqueue+0x360/0x820
[  256.124855] [   T59]  gfs2_gl_hash_clear+0x58/0x1b0
[  256.126868] [   T59]  ? _raw_spin_unlock_irqrestore+0x43/0xb0
[  256.129180] [   T59]  ? lockdep_hardirqs_on+0x99/0x140
[  256.131264] [   T59]  gfs2_fill_super+0xe24/0x1110
[  256.133222] [   T59]  ? gfs2_reconfigure+0x4d0/0x4d0
[  256.135239] [   T59]  get_tree_bdev+0x228/0x2f0
[  256.137296] [   T59]  gfs2_get_tree+0x2b/0xe0
[  256.139122] [   T59]  vfs_get_tree+0x30/0xe0
[  256.140879] [   T59]  do_new_mount+0x1d7/0x540
[  256.142707] [   T59]  path_mount+0x3c5/0xb50
[  256.144461] [   T59]  __se_sys_mount+0x298/0x2f0
[  256.146345] [   T59]  do_syscall_64+0x41/0x90
[  256.148148] [   T59]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  256.150484] [   T59] RIP: 0033:0x7faf87b26eae
[  256.152285] [   T59] RSP: 002b:00007ffdb0d40f68 EFLAGS: 00000282 ORIG_RAX: 00000000000000a5
[  256.155594] [   T59] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007faf87b26eae
[  256.160847] [   T59] RDX: 00000000200124c0 RSI: 0000000020012500 RDI: 00007ffdb0d40fe0
[  256.164184] [   T59] RBP: 00007ffdb0d41130 R08: 00007ffdb0d41020 R09: 0000000000000000
[  256.167404] [   T59] R10: 0000000000000000 R11: 0000000000000282 R12: 00007ffdb0d412a8
[  256.170913] [   T59] R13: 00005574b9254246 R14: 00005574b9268cb8 R15: 00007faf87da3040
[  256.174771] [   T59]  </TASK>
[  256.176029] [   T59] INFO: lockdep is turned off.
[  256.178522] [   T59] NMI backtrace for cpu 7
[  256.180317] [   T59] CPU: 7 PID: 59 Comm: khungtaskd Not tainted 6.2.0-rc1-next-20221226-00002-gc99a3711d8e3-dirty #28
[  256.184614] [   T59] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  256.188220] [   T59] Call Trace:
[  256.189626] [   T59]  <TASK>
[  256.190893] [   T59]  dump_stack_lvl+0xfe/0x190
[  256.194767] [   T59]  nmi_cpu_backtrace+0x25d/0x2c0
[  256.196804] [   T59]  ? arch_trigger_cpumask_backtrace+0x10/0x10
[  256.199337] [   T59]  nmi_trigger_cpumask_backtrace+0x11f/0x250
[  256.202150] [   T59]  watchdog+0x8a5/0x8e0
[  256.203918] [   T59]  kthread+0x133/0x170
[  256.205681] [   T59]  ? hungtask_pm_notify+0x90/0x90
[  256.207783] [   T59]  ? kthread_blkcg+0x60/0x60
[  256.209689] [   T59]  ret_from_fork+0x1f/0x30
[  256.211690] [   T59]  </TASK>
[  256.213052] [   T59] Sending NMI from CPU 7 to CPUs 0-6:
[  256.215232] [    C4] NMI backtrace for cpu 4 skipped: idling at default_idle+0x13/0x20
[  256.218391] [    C6] NMI backtrace for cpu 6 skipped: idling at default_idle+0x13/0x20
[  256.218378] [    C4] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 3.145 msecs
[  256.218378] [    C2] NMI backtrace for cpu 2 skipped: idling at default_idle+0x13/0x20
[  256.218766] [    C0] NMI backtrace for cpu 0 skipped: idling at default_idle+0x13/0x20
[  256.218889] [    C1] NMI backtrace for cpu 1 skipped: idling at default_idle+0x13/0x20
[  256.219085] [    C5] NMI backtrace for cpu 5 skipped: idling at default_idle+0x13/0x20
[  256.219230] [    C3] NMI backtrace for cpu 3 skipped: idling at default_idle+0x13/0x20

I suspect that cleanup is not done appropriately when gfs2_find_jhead() failed.
Looking into gfs2_make_fs_rw(), I see there are two worrisome things.

One is that gfs2_make_fs_rw() returns an error without calling gfs2_consist(sdp) when
gfs2_find_jhead() returned an error whereas gfs2_make_fs_rw() returns -EIO after calling
gfs2_consist(sdp) when head.lh_flags does not contain GFS2_LOG_HEAD_UNMOUNT flag.

Since head.lh_flags is assigned by gfs2_find_jhead(), we might want to call gfs2_consist(sdp)
when gfs2_find_jhead() returned an error. And actually

--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -138,7 +138,11 @@ int gfs2_make_fs_rw(struct gfs2_sbd *sdp)
                return -EIO;

        error = gfs2_find_jhead(sdp->sd_jdesc, &head, false);
-       if (error || gfs2_withdrawn(sdp))
+       if (error) {
+               gfs2_consist(sdp);
+               return error;
+       }
+       if (gfs2_withdrawn(sdp))
                return error;

        if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) {

avoids this deadlock. But I don't know when/how to use gfs2_withdraw().
Please check if this change is appropriate.

The other worrisome thing is that gfs2_make_fs_rw() is returning 0 (and
mount operation will continue) when gfs2_withdrawn() == true. Can the caller
of gfs2_make_fs_rw() survive when callgfs2_make_fs_rw() returned 0 without
processing

        /*  Initialize some head of the log stuff  */
        sdp->sd_log_sequence = head.lh_sequence + 1;
        gfs2_log_pointers_init(sdp, head.lh_blkno);

lines? Shouldn't the caller of gfs2_make_fs_rw() observe an error when
gfs2_make_fs_rw() did not execute the

	set_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);

line due to gfs2_withdrawn() == true?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ