Message-ID: <bug-220535-13602-JaLqHptZf4@https.bugzilla.kernel.org/>
Date: Fri, 19 Sep 2025 02:34:34 +0000
From: bugzilla-daemon@...nel.org
To: linux-ext4@...r.kernel.org
Subject: [Bug 220535] ext4 __jbd2_log_wait_for_space soft lockup and CPU stuck for 134s
https://bugzilla.kernel.org/show_bug.cgi?id=220535
--- Comment #7 from waxihus@...il.com ---
I have reproduced this with the latest, untainted kernel; see the attachment for more dmesg logs.
The source code was cloned from commit 46a51f4f5edade43ba66b3c151f0e25ec8b69cb6.
[ 533.816688] INFO: task kworker/u778:1:1854 blocked for more than 481 seconds.
[ 533.816713] Not tainted 6.17.0-rc6-master-default #2
[ 533.816723] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 533.816734] task:kworker/u778:1 state:D stack:0 pid:1854 tgid:1854 ppid:2 task_flags:0x4248060 flags:0x00004000
[ 533.816751] Workqueue: writeback wb_workfn (flush-259:1)
[ 533.816766] Call Trace:
[ 533.816773] <TASK>
[ 533.816782] __schedule+0x462/0x1400
[ 533.816793] ? sysvec_apic_timer_interrupt+0xf/0x90
[ 533.816804] ? srso_alias_return_thunk+0x5/0xfbef5
[ 533.816817] schedule+0x27/0xd0
[ 533.816825] schedule_preempt_disabled+0x15/0x30
[ 533.816834] __mutex_lock.constprop.0+0x357/0x940
[ 533.816846] mutex_lock_io+0x41/0x50
[ 533.816857] __jbd2_log_wait_for_space+0xda/0x1f0 [jbd2 371d593b5f5403746c7713ab4dc9d5e5c1953199]
[ 533.816877] add_transaction_credits+0x2f2/0x300 [jbd2 371d593b5f5403746c7713ab4dc9d5e5c1953199]
[ 533.816895] start_this_handle+0xfe/0x520 [jbd2 371d593b5f5403746c7713ab4dc9d5e5c1953199]
[ 533.816910] ? srso_alias_return_thunk+0x5/0xfbef5
[ 533.816921] jbd2__journal_start+0xfe/0x200 [jbd2 371d593b5f5403746c7713ab4dc9d5e5c1953199]
[ 533.816936] ext4_do_writepages+0x46a/0xee0 [ext4 893473fac91f34d580e31648f305d1177dd81b63]
[ 533.816968] ? __dequeue_entity+0x3c0/0x480
[ 533.816977] ? update_load_avg+0x80/0x760
[ 533.816985] ? srso_alias_return_thunk+0x5/0xfbef5
[ 533.816996] ? ext4_writepages+0xbe/0x190 [ext4 893473fac91f34d580e31648f305d1177dd81b63]
[ 533.817019] ext4_writepages+0xbe/0x190 [ext4 893473fac91f34d580e31648f305d1177dd81b63]
[ 533.817044] do_writepages+0xc7/0x160
[ 533.817055] __writeback_single_inode+0x41/0x340
[ 533.817066] writeback_sb_inodes+0x215/0x4c0
[ 533.817084] __writeback_inodes_wb+0x4c/0xe0
[ 533.817094] wb_writeback+0x192/0x300
[ 533.817105] ? get_nr_inodes+0x3b/0x60
[ 533.817116] wb_workfn+0x38a/0x460
[ 533.817126] process_one_work+0x1a1/0x3e0
[ 533.817137] worker_thread+0x292/0x420
[ 533.817147] ? __pfx_worker_thread+0x10/0x10
[ 533.817156] kthread+0xfc/0x240
[ 533.817165] ? __pfx_kthread+0x10/0x10
[ 533.817174] ? __pfx_kthread+0x10/0x10
[ 533.817182] ret_from_fork+0x1c1/0x1f0
[ 533.817192] ? __pfx_kthread+0x10/0x10
[ 533.817200] ret_from_fork_asm+0x1a/0x30
A soft lockup also occurs, but the probability is very low:
[ 329.157094] watchdog: BUG: soft lockup - CPU#21 stuck for 67s! [kworker/u513:2:795]
[ 329.157169] Workqueue: writeback wb_workfn (flush-259:8)
[ 329.157176] RIP: 0010:queued_read_lock_slowpath+0x52/0x130
[ 329.157194] Call Trace:
[ 329.157196] <TASK>
[ 329.157200] start_this_handle+0x99/0x520 [jbd2 0a56678a235e076a07e3222376de4dc1cbec6f17]
[ 329.157216] ? finish_task_switch.isra.0+0x97/0x2c0
[ 329.157220] jbd2__journal_start+0xfe/0x200 [jbd2 0a56678a235e076a07e3222376de4dc1cbec6f17]
[ 329.157226] ext4_do_writepages+0x46a/0xee0 [ext4 bcac05fee1dc1aaf21870e1e652c064619591c71]
[ 329.157273] ? find_get_block_common+0x1a8/0x3f0
[ 329.157277] ? ext4_writepages+0xbe/0x190 [ext4 bcac05fee1dc1aaf21870e1e652c064619591c71]
[ 329.157303] ext4_writepages+0xbe/0x190 [ext4 bcac05fee1dc1aaf21870e1e652c064619591c71]
[ 329.157328] do_writepages+0xc7/0x160
[ 329.157331] __writeback_single_inode+0x41/0x340
[ 329.157334] writeback_sb_inodes+0x215/0x4c0
[ 329.157339] __writeback_inodes_wb+0x4c/0xe0
[ 329.157341] wb_writeback+0x192/0x300
[ 329.157344] ? get_nr_inodes+0x3b/0x60
[ 329.157347] wb_workfn+0x291/0x460
[ 329.157350] process_one_work+0x1a1/0x3e0
[ 329.157353] worker_thread+0x292/0x420
[ 329.157356] ? __pfx_worker_thread+0x10/0x10
[ 329.157358] kthread+0xfc/0x240
[ 329.157360] ? __pfx_kthread+0x10/0x10
[ 329.157361] ? __pfx_kthread+0x10/0x10
[ 329.157362] ret_from_fork+0x1c1/0x1f0
[ 329.157365] ? __pfx_kthread+0x10/0x10
[ 329.157366] ret_from_fork_asm+0x1a/0x30
Reproduction Steps (a hypothetical reproduction sketch follows below):
1. Format 10 NVMe drives with XFS and run 3 concurrent 100 GB file reads on each drive.
2. Format 1 NVMe drive with ext4 and run 256 concurrent operations that create files and folders and add and delete xattrs (the issue can also be reproduced with 192 concurrent operations, though the probability is lower).
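For reference, a minimal Python sketch of the ext4 metadata churn in step 2; the mount point /mnt/ext4, the per-worker loop shape, and the xattr name user.test are assumptions, not the exact reproducer used here:

import multiprocessing
import os
import uuid

MOUNT = "/mnt/ext4"  # assumed mount point of the ext4-formatted NVMe drive
WORKERS = 256        # 256 concurrent operations, per the steps above

def churn(worker_id: int) -> None:
    """Create files and folders, add and delete xattrs, until interrupted."""
    base = os.path.join(MOUNT, f"worker-{worker_id}")
    os.makedirs(base, exist_ok=True)
    while True:
        folder = os.path.join(base, uuid.uuid4().hex)
        os.mkdir(folder)                             # create folder
        path = os.path.join(folder, "file")
        with open(path, "wb") as f:                  # create file
            f.write(b"x")
        os.setxattr(path, "user.test", b"payload")   # add xattr
        os.removexattr(path, "user.test")            # delete xattr
        os.remove(path)
        os.rmdir(folder)

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=churn, args=(i,))
             for i in range(WORKERS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

Each of these operations is a metadata update, so the workers presumably keep the jbd2 journal busy while the XFS reads in step 1 add I/O pressure, which matches the jbd2/ext4 writeback frames in the traces above.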
CPU info:
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 2
Core(s) per socket: 96
Socket(s): 1
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 25
Model: 17
Model name: AMD EPYC 9A14 96-Core Processor
Stepping: 1
CPU MHz: 3699.375
CPU max MHz: 3703.3760
CPU min MHz: 1500.0000
BogoMIPS: 5200.37
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 32768K
NUMA node0 CPU(s): 0-47,96-143
NUMA node1 CPU(s): 48-95,144-191
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid
extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1
sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic
cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext
perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpuid_fault cpb cat_l3 cdp_l3
hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase
bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap
avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec
xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk
avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc arat npt lbrv
svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter
pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip
pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg
avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm flush_l1d
debug_swap
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.