linux-kernel - Re: [REGRESSION] 6.7.1: md: raid5 hang and unresponsive system; successfully bisected


Message-ID: <86a8e266-110e-49cb-a156-faa950df3a62@oracle.com>
Date: Wed, 31 Jan 2024 09:37:23 -0800
From: junxiao.bi@...cle.com
To: Dan Moulding <dan@...m.net>
Cc: gregkh@...uxfoundation.org, linux-kernel@...r.kernel.org,
        linux-raid@...r.kernel.org, regressions@...ts.linux.dev,
        song@...nel.org, stable@...r.kernel.org, yukuai1@...weicloud.com
Subject: Re: [REGRESSION] 6.7.1: md: raid5 hang and unresponsive system;
 successfully bisected

Hi Dan,

On 1/25/24 12:31 PM, Dan Moulding wrote:
> On this Fedora 39 VM, I created a 1GiB LVM volume to use as the RAID-5
> journal from space on the "boot" disk. Then I attached 3 additional
> 100 GiB virtual disks and created the RAID-5 from those 3 disks and
> the write-journal device. I then created a new LVM volume group from
> the md0 array and created one LVM logical volume named "data", using
> all but 64GiB of the available VG space. I then created an ext4 file
> system on the "data" volume, mounted it, and used "dd" to copy 1MiB
> blocks from /dev/urandom to a file on the "data" file system, and just
> let it run. Eventually "dd" hangs and top shows that md0_raid5 is
> using 100% CPU.

I can't reproduce this issue with this test case. After running it
overnight, dd is still making good progress. dd is very busy, close to
100% CPU, and sometimes it sits in D state, but only for a moment.
md0_raid5 stays around 60%, never 100%.
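
In case it helps to compare, here is a rough sketch of how the state of
dd could be sampled over time. It only reads /proc/<pid>/stat; the helper
names and the sampling interval are my own choices for illustration, not
anything from the md code or the original test.

```python
import time


def parse_proc_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' instead of on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def sample_states(pid: int, samples: int = 10, interval: float = 1.0) -> list:
    """Read the state of `pid` `samples` times, `interval` seconds apart."""
    states = []
    for _ in range(samples):
        with open(f"/proc/{pid}/stat") as f:
            states.append(parse_proc_state(f.read()))
        time.sleep(interval)
    return states
```

A process that shows D in every sample over several minutes looks like a
hang; one that only shows D occasionally, as in my run, is more likely
just waiting on I/O.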

I am wondering whether your case is a performance issue or a real hang.
If it is a hang, I would expect dmesg to show a hung-task call trace for
dd, unless you disabled kernel.hung_task_timeout_secs.
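
A minimal sketch of that check, assuming the usual "INFO: task ... blocked
for more than N seconds" line that the hung task detector prints. The
function names are hypothetical, and reading dmesg may need root on
systems that restrict it.

```python
import re
import subprocess

# Matches the header line printed by the kernel hung task detector.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text: str) -> list:
    """Return (comm, pid, seconds) for each hung-task report in log_text."""
    return [
        (m["comm"], int(m["pid"]), int(m["secs"]))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]


def hung_task_timeout() -> int:
    """Read the current timeout; 0 means the check is disabled."""
    with open("/proc/sys/kernel/hung_task_timeout_secs") as f:
        return int(f.read().strip())


if __name__ == "__main__":
    print("hung_task_timeout_secs =", hung_task_timeout())
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    for comm, pid, secs in find_hung_tasks(log):
        print(f"{comm} (pid {pid}) blocked for more than {secs}s")
```

If the timeout is non-zero and nothing is reported for dd after it has
stopped making progress for longer than that, it is probably not blocked
in D state for the whole time.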

Also, are you able to configure kdump and trigger a core dump when the
issue reproduces?

Thanks,

Junxiao.