Message-ID: <20240302165538.30761-1-dan@danm.net>
Date: Sat, 2 Mar 2024 09:55:38 -0700
From: Dan Moulding <dan@...m.net>
To: junxiao.bi@...cle.com
Cc: dan@...m.net,
gregkh@...uxfoundation.org,
linux-kernel@...r.kernel.org,
linux-raid@...r.kernel.org,
regressions@...ts.linux.dev,
song@...nel.org,
stable@...r.kernel.org,
logang@...tatee.com
Subject: Re: [REGRESSION] 6.7.1: md: raid5 hang and unresponsive system; successfully bisected
> I have not root caused this yet, but I would like to share some
> findings from the vmcore Dan shared. From what I can see, this
> doesn't look like an md issue, but something wrong with the block
> layer or below.

Below is one other thing I found that might be of interest. It comes
from the email thread [1] that was linked from the original 2022
issue, whose fix the change in question reverts:

On 2022-09-02 17:46, Logan Gunthorpe wrote:
> I've made some progress on this nasty bug. I've got far enough to know it's not
> related to the blk-wbt or the block layer.
>
> Turns out a bunch of bios are stuck queued in a blk_plug in the md_raid5
> thread while that thread appears to be stuck in an infinite loop (so it never
> schedules or does anything to flush the plug).
>
> I'm still debugging to try and find out the root cause of that infinite loop,
> but I just wanted to send an update that the previous place I was stuck at
> was not correct.
>
> Logan
This certainly sounds similar to what we are seeing when that change
is reverted. The md0_raid5 thread appears to be stuck in an infinite
loop, consuming 100% CPU but not actually doing any work.
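
For anyone not familiar with block-layer plugging, the pattern Logan
describes looks roughly like the sketch below. blk_start_plug() and
blk_finish_plug() are the real block-layer APIs; the loop condition
and stripe-handling helper are hypothetical stand-ins, not the actual
raid5d code:

#include <linux/blkdev.h>	/* struct blk_plug, blk_{start,finish}_plug() */

/* Hypothetical stand-ins for the md stripe-handling code. */
static bool stripes_pending(void);
static void handle_one_stripe(void);

static void raid_worker_sketch(void)
{
	struct blk_plug plug;

	/*
	 * Start batching: bios submitted by this task are held on the
	 * plug instead of being issued to the device immediately.
	 */
	blk_start_plug(&plug);

	while (stripes_pending())	/* hypothetical condition */
		handle_one_stripe();	/* may queue bios on the plug */

	/*
	 * Plugged bios are issued here, or implicitly if the task
	 * blocks in schedule(). A thread spinning forever in the loop
	 * above, never sleeping, never flushes the plug, so its queued
	 * I/O never reaches the device.
	 */
	blk_finish_plug(&plug);
}

Plugging is normally a win, since it lets the block layer merge and
batch requests before they are issued; it only becomes a problem when
the plugging task stops making forward progress, as appears to be
happening here.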
-- Dan
[1] https://lore.kernel.org/r/7f3b87b6-b52a-f737-51d7-a4eec5c44112@deltatee.com