Message-ID: <CAPhsuW51ND4qH4My8Uz1XaZSdvAjDR7eL7O-RLr5wKmFJA0XMQ@mail.gmail.com>
Date: Wed, 6 Mar 2024 09:13:21 -0800
From: Song Liu <song@...nel.org>
To: Linux regressions mailing list <regressions@...ts.linux.dev>
Cc: Dan Moulding <dan@...m.net>, junxiao.bi@...cle.com, gregkh@...uxfoundation.org,
linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [REGRESSION] 6.7.1: md: raid5 hang and unresponsive system;
successfully bisected
Hi Thorsten,
On Wed, Mar 6, 2024 at 12:38 AM Linux regression tracking (Thorsten
Leemhuis) <regressions@...mhuis.info> wrote:
>
> On 02.03.24 01:05, Song Liu wrote:
> > On Fri, Mar 1, 2024 at 3:12 PM Dan Moulding <dan@...m.net> wrote:
> >>
> >>> 5. Looks like the block layer or underlying(scsi/virtio-scsi) may have
> >>> some issue which leading to the io request from md layer stayed in a
> >>> partial complete statue. I can't see how this can be related with the
> >>> commit bed9e27baf52 ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in
> >>> raid5d"")
> >>
> >> There is no question that the above mentioned commit makes this
> >> problem appear. While it may be that ultimately the root cause lies
> >> outside the md/raid5 code (I'm not able to make such an assessment), I
> >> can tell you that change is what turned it into a runtime
> >> regression. Prior to that change, I cannot reproduce the problem. One
> >> of my RAID-5 arrays has been running on every kernel version since
> >> 4.8, without issue. Then kernel 6.7.1 the problem appeared within
> >> hours of running the new code and affected not just one but two
> >> different machines with RAID-5 arrays. With that change reverted, the
> >> problem is not reproducible. Then when I recently upgraded to 6.8-rc5
> >> I immediately hit the problem again (because it hadn't been reverted
> >> in the mainline yet). I'm now running 6.8.0-rc5 on one of my affected
> >> machines without issue after reverting that commit on top of it.
> > [...]
> > I also tried again to reproduce the issue, but haven't got luck. While
> > I will continue try to repro the issue, I will also send the revert to 6.8
> > kernel.
>
> Is that revert on the way meanwhile? I'm asking because Linus might
> release 6.8 on Sunday.
The patch is on its way to the 6.9 kernel via a PR sent yesterday [1]. It will
land in the stable 6.8 kernel via stable backports.
Since this is not a new regression in the 6.8 kernel and Dan is the only one
experiencing it, we would rather not rush a last-minute change into the 6.8
release.
Thanks,
Song
[1] https://lore.kernel.org/linux-raid/1C22EE73-62D9-43B0-B1A2-2D3B95F774AC@fb.com/