linux-ext4 - [Bug 217965] ext4(?) regression since 6.5.0 on sata hdd


Message-ID: <bug-217965-13602-YEM8MziGb7@https.bugzilla.kernel.org/>
Date:   Wed, 11 Oct 2023 07:53:39 +0000
From:   bugzilla-daemon@...nel.org
To:     linux-ext4@...r.kernel.org
Subject: [Bug 217965] ext4(?) regression since 6.5.0 on sata hdd

https://bugzilla.kernel.org/show_bug.cgi?id=217965

--- Comment #12 from Ojaswin Mujoo (ojaswin.mujoo@....com) ---
Hey Ivan, 

So I used the kernel v6.6-rc1 with the same config you provided and mounted
an HDD on my VM. Then I followed the steps here to build openwrt [1].
However, I'm still unable to replicate the 100% cpu utilization in a
kworker/flush thread (I do get .

You have the config options enabled and we didn't see them trigger any
warning, and things get back to normal after a few minutes, which indicates
that it's not a lockup/deadlock. We also don't see this issue on a faster SSD,
so this might have something to do with a lot of IOs being queued up on the
slower disk, causing us to notice the delay. Maybe we are waiting a lot more
on some spinlock, which could explain the CPU utilization.

Since I'm unable to replicate it, I'll need some more info from you to get to
the bottom of this. Specifically, can you kindly provide the following:

For the kernel with this issue: 

1. Replicate the 100% utilization in one terminal window.
2. Once the 100% utilization is hit, run the following commands in another
terminal:

$ iostat -x /dev/<dev> 2  (run this for 20 to 30 seconds)
$ perf record -ag sleep 20
$ echo l > /proc/sysrq-trigger
$ uname -a

3. Repeat the above for a kernel where the issue is not seen. 

Kindly share the sysrq backtrace, iostat output, perf.data and the uname
output for both runs here so that I can take a closer look at what is
causing the unusual utilization.
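
If it helps, the collection steps could be wrapped in a small script so that
both runs produce the same set of files. This is only a rough sketch and not
something from the bug report: the output directory layout, the DRY_RUN
switch, the default device name "sda" and the use of dmesg to save the sysrq
backtrace are all assumptions. It runs iostat and perf one after the other
rather than in parallel, and it needs root to do anything real.

```shell
#!/bin/sh
# Sketch: gather the requested diagnostics into one directory per kernel.
# Hypothetical names: DEV default, OUT layout and DRY_RUN are assumptions.

DEV="${1:-sda}"                  # block device name (placeholder default)
OUT="${2:-diag-$(uname -r)}"     # one output directory per kernel version
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

collect() {
    run "mkdir -p $OUT"
    run "uname -a > $OUT/uname.txt"
    # 15 samples at a 2 second interval, roughly 30 seconds of data
    run "iostat -x /dev/$DEV 2 15 > $OUT/iostat.txt"
    # system-wide profile with call graphs for 20 seconds
    run "perf record -a -g -o $OUT/perf.data sleep 20"
    # dump backtraces of all active CPUs into the kernel log
    run "echo l > /proc/sysrq-trigger"
    # save the kernel log, which now contains the sysrq backtrace
    run "dmesg > $OUT/sysrq-backtrace.txt"
}

collect
```

By default the script only prints what it would do. Start it with DRY_RUN=0
as root once the 100% utilization is visible, then repeat on the kernel
where the issue is not seen.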

[1] https://github.com/openwrt/openwrt#quickstart

Regards,
ojaswin
