linux-kernel - Re: [REGRESSION] 6.7.1: md: raid5 hang and unresponsive system; successfully bisected

Message-ID: <20240131143640.00003296@linux.intel.com>
Date: Wed, 31 Jan 2024 14:36:40 +0100
From: Blazej Kucman <blazej.kucman@...ux.intel.com>
To: Song Liu <song@...nel.org>
Cc: Yu Kuai <yukuai1@...weicloud.com>, Dan Moulding <dan@...m.net>,
 carlos@...ica.ufpr.br, gregkh@...uxfoundation.org, junxiao.bi@...cle.com,
 linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org,
 regressions@...ts.linux.dev, stable@...r.kernel.org, "yukuai (C)"
 <yukuai3@...wei.com>
Subject: Re: [REGRESSION] 6.7.1: md: raid5 hang and unresponsive system;
 successfully bisected

On Tue, 30 Jan 2024 20:55:39 -0800
Song Liu <song@...nel.org> wrote:

> On Tue, Jan 30, 2024 at 6:41 PM Yu Kuai <yukuai1@...weicloud.com>
> wrote:
> >
> > Can you test the following patch?
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index e3a56a958b47..a8db84c200fe 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -578,8 +578,12 @@ static void submit_flushes(struct work_struct *ws)
> >                          rcu_read_lock();
> >                  }
> >          rcu_read_unlock();
> > -       if (atomic_dec_and_test(&mddev->flush_pending))
> > +       if (atomic_dec_and_test(&mddev->flush_pending)) {
> > +               /* The pair is percpu_ref_get() from md_flush_request() */
> > +               percpu_ref_put(&mddev->active_io);
> > +
> >                  queue_work(md_wq, &mddev->flush_work);
> > +       }
> >   }
> >
> >   static void md_submit_flush_data(struct work_struct *ws)  
> 
> This fixes the issue in my tests. Please submit the official patch.
> Also, we should add a test in mdadm/tests to cover this case.
> 
> Thanks,
> Song
> 

Hi Kuai,

On my hardware the issue also stopped reproducing with this fix.

I applied the fix on top of the current HEAD of the master
branch in the kernel/git/torvalds/linux.git repo.

Thanks,
Blazej