lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 3 Oct 2023 20:11:11 -0400
From:   Eric Whitney <enwlinux@...il.com>
To:     linux-ext4@...r.kernel.org
Cc:     jack@...e.cz, libaokun1@...wei.com
Subject: probable quota bug introduced in 6.6-rc1

When run on my test hardware, generic/270 triggers hung task timeouts when
run on a 6.6-rc1 (or -rc2, -rc3, -rc4) kernel with kvm-xfstests using the
nojournal test scenario.  The test always passes, but about 60% of the time
the running time of the test increases by an order of magnitude or more and
one or more of the hung task timeout warnings included below can be found in
the log.

This does not reproduce on 6.5.  Bisection leads to this patch:

dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should
provide")

>From the log:

generic/270 306s ...  [20:08:45][  311.322318] run fstests generic/270 at 2023-10-03 20:08:45
[  311.579641] EXT4-fs (vdc): mounted filesystem d0e542a0-3342-4d43-aa1f-c918cc92aafa r/w without journal. Quota mode: writeback.
[  311.587978] EXT4-fs (vdc): re-mounted d0e542a0-3342-4d43-aa1f-c918cc92aafa ro. Quota mode: writeback.
[  311.592725] EXT4-fs (vdc): re-mounted d0e542a0-3342-4d43-aa1f-c918cc92aafa r/w. Quota mode: writeback.
[  335.491107] 270 (3092): drop_caches: 3
[  491.167988] INFO: task quotaon:3450 blocked for more than 122 seconds.
[  491.168334]       Not tainted 6.4.0+ #13
[  491.168544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  491.168936] task:quotaon         state:D stack:0     pid:3450  ppid:3092   flags:0x00004000
[  491.169363] Call Trace:
[  491.169503]  <TASK>
[  491.169620]  __schedule+0x394/0xd40
[  491.169813]  schedule+0x5d/0xd0
[  491.169981]  schedule_timeout+0x1a7/0x1c0
[  491.170191]  ? lock_release+0x139/0x280
[  491.170395]  ? lock_acquire+0xb9/0x180
[  491.170605]  ? do_raw_spin_unlock+0x4b/0xa0
[  491.170837]  __wait_for_common+0xb6/0x1e0
[  491.171046]  ? __pfx_schedule_timeout+0x10/0x10
[  491.171324]  __flush_work+0x2da/0x430
[  491.171517]  ? __pfx_wq_barrier_func+0x10/0x10
[  491.171747]  ? 0xffffffff81000000
[  491.171932]  dquot_disable+0x3e5/0x670
[  491.172134]  ext4_quota_off+0x50/0x1a0
[  491.172332]  __x64_sys_quotactl+0x87/0x1c0
[  491.172545]  do_syscall_64+0x38/0x90
[  491.172731]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  491.172990] RIP: 0033:0x7f3c7c79eada
[  491.173176] RSP: 002b:00007ffed2ff4478 EFLAGS: 00000246 ORIG_RAX: 00000000000000b3
[  491.173558] RAX: ffffffffffffffda RBX: 000055886a3997d0 RCX: 00007f3c7c79eada
[  491.173915] RDX: 0000000000000000 RSI: 000055886bf43de0 RDI: 0000000080000301
[  491.174271] RBP: 000055886bf43de0 R08: 0000000000000001 R09: 0000000000000002
[  491.174657] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
[  491.175014] R13: 000055886bf43ea0 R14: 0000000000000001 R15: 0000000000000000
[  491.175373]  </TASK>
[  491.175491] 
[  491.175491] Showing all locks held in the system:
[  491.176706] 1 lock held by rcu_tasks_kthre/12:
[  491.178126]  #0: ffffffff82763970 (rcu_tasks.tasks_gp_mutex){....}-{3:3}, at: rcu_tasks_one_gp+0x30/0x3f0
[  491.180955] 1 lock held by rcu_tasks_rude_/13:
[  491.182394]  #0: ffffffff827636f0 (rcu_tasks_rude.tasks_gp_mutex){....}-{3:3}, at: rcu_tasks_one_gp+0x30/0x3f0
[  491.194388] 1 lock held by khungtaskd/26:
[  491.196153]  #0: ffffffff82764020 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0xe/0x110
[  491.199676] 2 locks held by kworker/u4:4/59:
[  491.200722]  #0: ffff88800385cd38 ((wq_completion)events_unbound){....}-{0:0}, at: process_one_work+0x1f6/0x550
[  491.201600]  #1: ffffc90000513e80 ((quota_release_work).work){....}-{0:0}, at: process_one_work+0x1f6/0x550
[  491.202746] 1 lock held by quotaon/3450:
[  491.203184]  #0: ffff88800afd60e0 (&type->s_umount_key#33){....}-{3:3}, at: user_get_super+0xd3/0x100
[  491.204217] 
[  491.204373] =============================================


Eric

Powered by blists - more mailing lists