Message-ID: <CAH6h+hc8VdpaS2q4ya_ZfqVxWFRsKVCjN-sv73SfeyGomXvjRQ@mail.gmail.com>
Date: Tue, 14 Mar 2023 10:45:41 -0400
From: Marc Smith <msmith626@...il.com>
To: Guoqing Jiang <guoqing.jiang@...ux.dev>
Cc: Donald Buczek <buczek@...gen.mpg.de>, Song Liu <song@...nel.org>,
linux-raid@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
it+raid@...gen.mpg.de
Subject: Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
On Tue, Mar 14, 2023 at 9:55 AM Guoqing Jiang <guoqing.jiang@...ux.dev> wrote:
>
>
>
> On 3/14/23 21:25, Marc Smith wrote:
> > On Mon, Feb 8, 2021 at 7:49 PM Guoqing Jiang
> > <guoqing.jiang@...ud.ionos.com> wrote:
> >> Hi Donald,
> >>
> >> On 2/8/21 19:41, Donald Buczek wrote:
> >>> Dear Guoqing,
> >>>
> >>> On 08.02.21 15:53, Guoqing Jiang wrote:
> >>>>
> >>>> On 2/8/21 12:38, Donald Buczek wrote:
> >>>>>> 5. maybe don't hold reconfig_mutex when try to unregister
> >>>>>> sync_thread, like this.
> >>>>>>
> >>>>>> /* resync has finished, collect result */
> >>>>>> mddev_unlock(mddev);
> >>>>>> md_unregister_thread(&mddev->sync_thread);
> >>>>>> mddev_lock(mddev);
> >>>>> As above: While we wait for the sync thread to terminate, wouldn't it
> >>>>> be a problem, if another user space operation takes the mutex?
> >>>> I don't think other places can be blocked while hold mutex, otherwise
> >>>> these places can cause potential deadlock. Please try above two lines
> >>>> change. And perhaps others have better idea.
> >>> Yes, this works. No deadlock after >11000 seconds,
> >>>
> >>> (Time till deadlock from previous runs/seconds: 1723, 37, 434, 1265,
> >>> 3500, 1136, 109, 1892, 1060, 664, 84, 315, 12, 820 )
> >> Great. I will send a formal patch with your reported-by and tested-by.
> >>
> >> Thanks,
> >> Guoqing
> > I'm still hitting this issue with Linux 5.4.229 -- it looks like 1/2
> > of the patches that supposedly resolve this were applied to the stable
> > kernels, however, one was omitted due to a regression:
> > md: don't unregister sync_thread with reconfig_mutex held (upstream
> > commit 8b48ec23cc51a4e7c8dbaef5f34ebe67e1a80934)
> >
> > I don't see any follow-up on the thread from June 8th 2022 asking for
> > this patch to be dropped from all stable kernels since it caused a
> > regression.
> >
> > The patch doesn't appear to be present in the current mainline kernel
> > (6.3-rc2) either. So I assume this issue is still present there, or it
> > was resolved differently and I just can't find the commit/patch.
>
> It should be fixed by commit 9dfbdafda3b3 "md: unlock mddev before reap
> sync_thread in action_store".
Okay, let me try applying that patch... it does not appear to be
present in my 5.4.229 kernel source. Thanks.
--Marc
>
> Thanks,
> Guoqing
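
For readers following the thread, the pattern quoted above (drop the
mutex, stop the thread, retake the mutex) can be sketched in user space
with pthreads. This is only an illustration under stated assumptions:
reconfig_mutex, sync_worker, reap_worker_unlocked and run_demo are
hypothetical names, not kernel code, and having the worker take the
mutex itself before it exits is a simplification, not a description of
the actual md code paths.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the mddev mutex and the sync thread state. */
static pthread_mutex_t reconfig_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool stop_requested;
static int final_updates;               /* protected by reconfig_mutex */

/* Assumption for illustration: the worker needs the mutex once more
 * before it is able to exit. */
static void *sync_worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_requested))
        sched_yield();                  /* pretend to do resync work */

    pthread_mutex_lock(&reconfig_mutex);
    final_updates++;                    /* last bookkeeping step */
    pthread_mutex_unlock(&reconfig_mutex);
    return NULL;
}

/* Caller holds reconfig_mutex on entry and holds it again on return.
 * Joining while still holding the mutex would block forever here,
 * because the worker could never take it to finish. */
static int reap_worker_unlocked(pthread_t worker)
{
    atomic_store(&stop_requested, true);

    pthread_mutex_unlock(&reconfig_mutex);  /* analogous to mddev_unlock() */
    int rc = pthread_join(worker, NULL);    /* analogous to md_unregister_thread() */
    pthread_mutex_lock(&reconfig_mutex);    /* analogous to mddev_lock() */
    return rc;
}

/* Runs the scenario once and returns how many final updates were seen. */
int run_demo(void)
{
    pthread_t worker;

    final_updates = 0;
    atomic_store(&stop_requested, false);
    if (pthread_create(&worker, NULL, sync_worker, NULL) != 0)
        return -1;

    pthread_mutex_lock(&reconfig_mutex);
    int rc = reap_worker_unlocked(worker);
    assert(rc == 0);                    /* join must succeed in this sketch */
    int seen = final_updates;           /* still under the mutex */
    pthread_mutex_unlock(&reconfig_mutex);
    return seen;
}
```

Built with -pthread and called from a small main(), run_demo() should
return 1. As raised earlier in the quoted discussion, another lock
holder may run while the mutex is dropped, so any state examined before
the unlock would need to be rechecked after relocking.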