Date:   Wed, 11 Oct 2023 07:53:39 +0000
From:   bugzilla-daemon@...nel.org
To:     linux-ext4@...r.kernel.org
Subject: [Bug 217965] ext4(?) regression since 6.5.0 on sata hdd

https://bugzilla.kernel.org/show_bug.cgi?id=217965

--- Comment #12 from Ojaswin Mujoo (ojaswin.mujoo@....com) ---
Hey Ivan, 

I used kernel v6.6-rc1 with the config you provided and mounted an HDD on my
VM. Then I followed the steps in [1] to build openwrt.
However, I'm still unable to replicate the 100% cpu utilization in a
kworker/flush thread (I do get .

You have the config options enabled and we didn't see them trigger any
warning, and the system gets back to normal after a few minutes, which
indicates that it's not a lockup/deadlock. We also don't see this issue on a
faster SSD, so it might have something to do with a lot of IOs being queued up
on the slower disk, which makes the delay noticeable. Maybe we are waiting a
lot longer on some spinlock, which could explain the CPU utilization.

Since I'm unable to replicate it, I'll have to ask you for some more info to
get to the bottom of this. More specifically, can you kindly provide the
following:

For the kernel with this issue: 

1. Replicate the 100% util in one terminal window.
2. Once the 100% util is hit, in another terminal run the following commands:

$ iostat -x /dev/<dev> 2  (run this for 20 to 30 seconds)  
$ perf record -ag sleep 20
$ echo l > /proc/sysrq-trigger
$ uname -a

3. Repeat the above for a kernel where the issue is not seen. 

Kindly share the sysrq backtrace, iostat output, perf.data and the uname
output for both runs here so that I can take a closer look at what is
causing the unusual utilization.
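
If it is easier, the collection step could be wrapped in a small script so that
both runs produce the same set of files. This is only a rough sketch, not a
tested tool: the script layout, the DEV/OUT/DRY_RUN variables, the output file
names and the iostat count of 15 (about 30 seconds at a 2 second interval) are
my own assumptions. It assumes the sysrq backtrace lands in the kernel log, so
it saves dmesg output. By default it only prints what it would run; set
DRY_RUN=0 and run it as root to actually collect.

```shell
#!/bin/sh
# Hypothetical helper: gather the requested diagnostics into one directory.
# DEV, OUT and DRY_RUN are illustrative names, not part of the original request.
# Paths containing single quotes are not handled.
DEV="${1:-sda}"                    # block device name without the /dev/ prefix
OUT="${2:-diag-$(uname -r)}"       # one output directory per kernel version
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the commands, 0 = run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        sh -c "$*"
    fi
}

collect() {
    run "mkdir -p '$OUT'"
    run "uname -a > '$OUT/uname.txt'"
    # 15 reports at a 2 second interval, roughly 30 seconds of samples
    run "iostat -x /dev/$DEV 2 15 > '$OUT/iostat.txt'"
    # system-wide profile with call graphs for 20 seconds
    run "perf record -a -g -o '$OUT/perf.data' sleep 20"
    # ask the kernel to dump backtraces of all active CPUs, then save the log
    run "echo l > /proc/sysrq-trigger"
    run "dmesg > '$OUT/sysrq-backtrace.txt'"
}

collect
```

Start it once the 100% util is visible, for example "DRY_RUN=0 sh collect.sh sdb"
(collect.sh being whatever name you save it under), and again on the kernel
where the issue is not seen.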

[1] https://github.com/openwrt/openwrt#quickstart


Regards,
ojaswin

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.
