linux-kernel - Re: [PATCH -next v2] md: synchronize flush io with array reconfiguration


Message-ID: <ac4470c6-f9a4-ba63-63d7-69b56ef92cc7@huaweicloud.com>
Date:   Tue, 28 Nov 2023 10:12:26 +0800
From:   Yu Kuai <yukuai1@...weicloud.com>
To:     Song Liu <song@...nel.org>, Yu Kuai <yukuai1@...weicloud.com>
Cc:     maan@...temlinux.org, neilb@...e.de, linux-raid@...r.kernel.org,
        linux-kernel@...r.kernel.org, yi.zhang@...wei.com,
        yangerkun@...wei.com, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH -next v2] md: synchronize flush io with array
 reconfiguration

Hi,

On 2023/11/28 7:32, Song Liu wrote:
> On Mon, Nov 27, 2023 at 2:16 PM Song Liu <song@...nel.org> wrote:
>>
>> On Fri, Nov 24, 2023 at 10:54 PM Yu Kuai <yukuai1@...weicloud.com> wrote:
>>>
>>> From: Yu Kuai <yukuai3@...wei.com>
>>>
>>> Currently rcu is used to protect iterating rdev from submit_flushes():
>>>
>>> submit_flushes                  remove_and_add_spares
>>>                                  synchronize_rcu
>>>                                  pers->hot_remove_disk()
>>>   rcu_read_lock()
>>>   rdev_for_each_rcu
>>>    if (rdev->raid_disk >= 0)
>>>                                  rdev->raid_disk = -1;
>>>     atomic_inc(&rdev->nr_pending)
>>>     rcu_read_unlock()
>>>     bi = bio_alloc_bioset()
>>>     bi->bi_end_io = md_end_flush
>>>     bi->private = rdev
>>>     submit_bio
>>>     // issue io for removed rdev
>>>
>>> Fix this problem by grabbing 'active_io' before iterating rdev, making
>>> sure that remove_and_add_spares() can't run concurrently with
>>> submit_flushes().
>>>
>>> Fixes: a2826aa92e2e ("md: support barrier requests on all personalities.")
>>> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
>>> ---
>>> Changes v2:
>>>   - Add WARN_ON in case md_flush_request() is not called from
>>>   md_handle_request() in future.
>>>
>>>   drivers/md/md.c | 22 ++++++++++++++++------
>>>   1 file changed, 16 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> index 86efc9c2ae56..2ffedc39edd6 100644
>>> --- a/drivers/md/md.c
>>> +++ b/drivers/md/md.c
>>> @@ -538,6 +538,9 @@ static void md_end_flush(struct bio *bio)
>>>          rdev_dec_pending(rdev, mddev);
>>>
>>>          if (atomic_dec_and_test(&mddev->flush_pending)) {
>>> +               /* The pair is percpu_ref_tryget() from md_flush_request() */
>>> +               percpu_ref_put(&mddev->active_io);
>>> +
>>>                  /* The pre-request flush has finished */
>>>                  queue_work(md_wq, &mddev->flush_work);
>>>          }
>>> @@ -557,12 +560,8 @@ static void submit_flushes(struct work_struct *ws)
>>>          rdev_for_each_rcu(rdev, mddev)
>>>                  if (rdev->raid_disk >= 0 &&
>>>                      !test_bit(Faulty, &rdev->flags)) {
>>> -                       /* Take two references, one is dropped
>>> -                        * when request finishes, one after
>>> -                        * we reclaim rcu_read_lock
>>> -                        */
>>>                          struct bio *bi;
>>> -                       atomic_inc(&rdev->nr_pending);
>>> +
>>>                          atomic_inc(&rdev->nr_pending);
>>>                          rcu_read_unlock();
>>>                          bi = bio_alloc_bioset(rdev->bdev, 0,
>>> @@ -573,7 +572,6 @@ static void submit_flushes(struct work_struct *ws)
>>>                          atomic_inc(&mddev->flush_pending);
>>>                          submit_bio(bi);
>>>                          rcu_read_lock();
>>> -                       rdev_dec_pending(rdev, mddev);
>>>                  }
>>>          rcu_read_unlock();
>>>          if (atomic_dec_and_test(&mddev->flush_pending))
>>> @@ -626,6 +624,18 @@ bool md_flush_request(struct mddev *mddev, struct bio *bio)
>>>          /* new request after previous flush is completed */
>>>          if (ktime_after(req_start, mddev->prev_flush_start)) {
>>>                  WARN_ON(mddev->flush_bio);
>>> +               /*
>>> +                * Grab a reference to make sure mddev_suspend() will wait for
>>> +                * this flush to be done.
>>> +                *
>>> +                * md_flush_request() is called under md_handle_request() and
>>> +                * 'active_io' is already grabbed, hence percpu_ref_tryget()
>>> +                * won't fail, percpu_ref_tryget_live() can't be used because
>>> +                * percpu_ref_kill() can be called by mddev_suspend()
>>> +                * concurrently.
>>> +                */
>>> +               if (WARN_ON(percpu_ref_tryget(&mddev->active_io)))
>>
>> This should be "if (!WARN_ON(..))", right?

Sorry for the mistake, this should actually be:

if (WARN_ON(!percpu_ref_tryget(...)))
>>
>> Song
>>
>>> +                       percpu_ref_get(&mddev->active_io);
> 
> Actually, we can just use percpu_ref_get(), no?

Yes, we can, but if someone calls md_flush_request() outside of
md_handle_request() in the future, there will be a problem that
percpu_ref_get() can't catch. Do you think it makes sense to guard
against that case?
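
To illustrate the difference, here is a minimal userspace sketch of the
two options. It is not kernel code: struct toy_ref and all toy_*
helpers are hypothetical stand-ins built on C11 atomics, and they only
model the "fails once the count has reached zero" behaviour of
percpu_ref_tryget(), not the percpu fast path or percpu_ref_kill().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a reference count; not percpu_ref. */
struct toy_ref {
	atomic_long count;
};

static int toy_warnings;	/* how many times the toy WARN_ON fired */

/* Toy WARN_ON(): report the condition and hand it back to the caller. */
static bool toy_warn_on(bool cond, const char *what)
{
	if (cond) {
		toy_warnings++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Unconditional get: increments even from zero, so misuse goes unnoticed. */
static void toy_ref_get(struct toy_ref *ref)
{
	atomic_fetch_add(&ref->count, 1);
}

/* Conditional get: succeeds only if the count has not already reached zero. */
static bool toy_ref_tryget(struct toy_ref *ref)
{
	long old = atomic_load(&ref->count);

	while (old > 0) {
		/* On failure 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&ref->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; dropping one that is not held is a bug. */
static void toy_ref_put(struct toy_ref *ref)
{
	long old = atomic_fetch_sub(&ref->count, 1);

	assert(old > 0);
}

/*
 * Models the corrected check: warn if the caller held no reference,
 * then still take one so that the later put stays balanced.
 */
static void toy_flush_grab(struct toy_ref *ref)
{
	if (toy_warn_on(!toy_ref_tryget(ref), "flush without active_io"))
		toy_ref_get(ref);
}
```

In this model, calling toy_flush_grab() on a count of 1 raises it to 2
without a warning, while calling it on a count of 0 warns once and
leaves the count at 1. A plain toy_ref_get() would give the same counts
in both cases but stay silent in the second one.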

Thanks,
Kuai

> 
> Thanks,
> Song
> .
>