linux-ext4 - [Bug 217965] ext4(?) regression since 6.5.0 on sata hdd

Message-ID: <bug-217965-13602-moueCdh1e5@https.bugzilla.kernel.org/>
Date:   Wed, 04 Oct 2023 12:44:30 +0000
From:   bugzilla-daemon@...nel.org
To:     linux-ext4@...r.kernel.org
Subject: [Bug 217965] ext4(?) regression since 6.5.0 on sata hdd

https://bugzilla.kernel.org/show_bug.cgi?id=217965

--- Comment #10 from Ojaswin Mujoo (ojaswin.mujoo@....com) ---
Hi Ivan, 

Unfortunately, I have not been able to replicate it on my end yet. While I keep
trying, could you give a few things a try?

It seems the CPU is stuck in mb_find_order_for_block(), called from
mb_find_extent(). I do see a while loop in mb_find_order_for_block(), but it's
not obvious whether it is stuck there and, if so, why.

If possible, can you:

1. Recompile the kernel with CONFIG_DEBUG_INFO=y, CONFIG_SOFTLOCKUP_DETECTOR=y
and CONFIG_HARDLOCKUP_DETECTOR=y, which might provide more information in dmesg
when the lockup happens.

2. Replicate it once more and note the RIP value in the trace of the stuck CPU.
For example, in the above trace it was mb_find_order_for_block+0x68 for CPU2.

3. Run the following in the kernel's source dir to get the corresponding line
number (CONFIG_DEBUG_INFO needed):

$ ./scripts/faddr2line vmlinux mb_find_order_for_block+0x68

Please also share the code you see in and around those lines, as well as the
exact kernel version.

This will help pinpoint where the code might be stuck (for example, in a loop),
which can help find the root cause.
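
The three steps above can be sketched as two small shell helpers. This is only
an illustration, not part of the original request: the function names
rip_symbol and missing_debug_options are hypothetical, and the sed pattern
assumes the usual x86 trace line of the form
"RIP: 0010:symbol+0xoffset/0xsize".

```shell
#!/bin/sh
# Hypothetical helper: read trace lines on stdin and print the symbol+offset
# part of any "RIP: 0010:symbol+0xOFF/0xSIZE" line (assumed x86 format).
rip_symbol() {
    sed -n 's|.*RIP: [0-9a-f]*:\([A-Za-z0-9_.]*+0x[0-9a-f]*\)/0x[0-9a-f]*.*|\1|p'
}

# Hypothetical helper: print each of the three requested options that is
# not set to "y" in the given kernel config file.
missing_debug_options() {
    config_file=$1
    for opt in CONFIG_DEBUG_INFO CONFIG_SOFTLOCKUP_DETECTOR CONFIG_HARDLOCKUP_DETECTOR; do
        grep -q "^${opt}=y" "$config_file" || echo "$opt"
    done
}
```

With these defined, running missing_debug_options .config in the kernel's
source dir should print nothing once all three options are enabled, and the
output of dmesg | rip_symbol can be passed as the last argument to
./scripts/faddr2line vmlinux.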

Also, you mentioned that the CPU gets stuck at 100% utilization for 10-15
minutes. Does it ever come back to normal, or does it stay stuck?

Regards,
ojaswin
