Message-ID: <bug-217965-13602-moueCdh1e5@https.bugzilla.kernel.org/>
Date:   Wed, 04 Oct 2023 12:44:30 +0000
From:   bugzilla-daemon@...nel.org
To:     linux-ext4@...r.kernel.org
Subject: [Bug 217965] ext4(?) regression since 6.5.0 on sata hdd

https://bugzilla.kernel.org/show_bug.cgi?id=217965

--- Comment #10 from Ojaswin Mujoo (ojaswin.mujoo@....com) ---
Hi Ivan, 

So unfortunately I'm not able to replicate it at my end yet. While I keep
trying, I wanted to check if you could give a few things a try.

It seems like the CPU is stuck in mb_find_order_for_block(), called from
mb_find_extent(). I do see a while loop in mb_find_order_for_block(), but it's
not obvious whether it's stuck there and, if so, why.

If possible, can you:

1. Recompile the kernel with CONFIG_DEBUG_INFO=y CONFIG_SOFTLOCKUP_DETECTOR=y
and CONFIG_HARDLOCKUP_DETECTOR=y which might provide more information in dmesg
when the lockup happens.

2. Replicate it once more and note the RIP value in the trace of the stuck CPU;
for example, in the above trace it was mb_find_order_for_block+0x68 for CPU2.

3. Run the following in the kernel's source dir to get the corresponding line
number (CONFIG_DEBUG_INFO needed):

$ ./scripts/faddr2line vmlinux mb_find_order_for_block+0x68

Maybe you can share the code you see in and around those lines as well as the
exact kernel version.
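
In case it helps with step 1, here is a rough sketch for double-checking that
the three options actually ended up enabled before you rebuild. The function
name missing_debug_opts is just something I made up for illustration, and it
assumes you pass it the path to the config file you are building from (for
example the .config in the source dir):

```shell
# Hypothetical helper (not part of the kernel tree): print every option from
# step 1 that is NOT set to =y in the config file given as $1.
missing_debug_opts() {
    for opt in DEBUG_INFO SOFTLOCKUP_DETECTOR HARDLOCKUP_DETECTOR; do
        # Match the exact option only, e.g. not CONFIG_DEBUG_INFO_DWARF5=y.
        grep -q "^CONFIG_${opt}=y" "$1" || echo "CONFIG_${opt}"
    done
}
```

Running `missing_debug_opts .config` should print nothing if all three are set.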

This will help pinpoint the location where the code might be stuck (for example
in a loop), which can help root cause this.
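
If the log has many traces, something along these lines might save some
manual copying. It is only a sketch: rip_symbols is a made-up name, and it
assumes the usual "RIP: 0010:func+0xOFF/0xSIZE" layout of the RIP line in
dmesg. It reads a saved log on stdin and prints each distinct func+offset:

```shell
# Hypothetical helper: extract the func+offset part of kernel "RIP:" lines.
# Userspace RIP lines (a raw address after the segment) are skipped because
# the symbol must start with a letter or underscore.
rip_symbols() {
    sed -n 's/.*RIP: [0-9a-f]*:\([A-Za-z_][A-Za-z0-9_.]*+0x[0-9a-f]*\)\/0x[0-9a-f]*.*/\1/p' |
        sort -u
}
```

Each printed value can then be passed to ./scripts/faddr2line in the same way
as in step 3, e.g. after `dmesg | rip_symbols`.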

Also, you mentioned that the CPU gets stuck at 100% utilization for 10-15
minutes. Does it ever come back to normal, or does it stay stuck?

Regards,
ojaswin

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.
