linux-kernel - Re: [PATCH v3] md: synchronize flush io with array reconfiguration

Message-ID: <CAPhsuW4P9Zd9nzVW=_4BieEAe4zdQtFEoBc9VLEnWDGTuOo+OA@mail.gmail.com>
Date:   Fri, 1 Dec 2023 16:09:42 -0800
From:   Song Liu <song@...nel.org>
To:     Yu Kuai <yukuai1@...weicloud.com>
Cc:     neilb@...e.de, maan@...temlinux.org, linux-raid@...r.kernel.org,
        linux-kernel@...r.kernel.org, yukuai3@...wei.com,
        yi.zhang@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v3] md: synchronize flush io with array reconfiguration

On Tue, Nov 28, 2023 at 6:03 PM Yu Kuai <yukuai1@...weicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@...wei.com>
>
> Currently, RCU protects the iteration over rdevs in submit_flushes(), but
> the following race with remove_and_add_spares() is still possible:
>
> submit_flushes                  remove_and_add_spares
>                                 synchronize_rcu
>                                 pers->hot_remove_disk()
>  rcu_read_lock()
>  rdev_for_each_rcu
>   if (rdev->raid_disk >= 0)
>                                 rdev->raid_disk = -1;
>    atomic_inc(&rdev->nr_pending)
>    rcu_read_unlock()
>    bi = bio_alloc_bioset()
>    bi->bi_end_io = md_end_flush
>    bi->bi_private = rdev
>    submit_bio
>    // issue io for removed rdev
>
> Fix this problem by grabbing 'active_io' before iterating over rdevs, so
> that remove_and_add_spares() can't run concurrently with submit_flushes().
>
> Fixes: a2826aa92e2e ("md: support barrier requests on all personalities.")
> Signed-off-by: Yu Kuai <yukuai3@...wei.com>

Applied to md-next. Thanks!

Song

> ---
> Changes in v3:
>  - use WARN_ON(percpu_ref_is_zero()) and use percpu_ref_get().
> Changes in v2:
>  - Add WARN_ON in case md_flush_request() is not called from
>  md_handle_request() in the future.
>  drivers/md/md.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 05902e36db66..75ff96d53266 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -529,6 +529,9 @@ static void md_end_flush(struct bio *bio)
>         rdev_dec_pending(rdev, mddev);
>
>         if (atomic_dec_and_test(&mddev->flush_pending)) {
> +               /* The pair is percpu_ref_get() from md_flush_request() */
> +               percpu_ref_put(&mddev->active_io);
> +
>                 /* The pre-request flush has finished */
>                 queue_work(md_wq, &mddev->flush_work);
>         }
> @@ -548,12 +551,8 @@ static void submit_flushes(struct work_struct *ws)
>         rdev_for_each_rcu(rdev, mddev)
>                 if (rdev->raid_disk >= 0 &&
>                     !test_bit(Faulty, &rdev->flags)) {
> -                       /* Take two references, one is dropped
> -                        * when request finishes, one after
> -                        * we reclaim rcu_read_lock
> -                        */
>                         struct bio *bi;
> -                       atomic_inc(&rdev->nr_pending);
> +
>                         atomic_inc(&rdev->nr_pending);
>                         rcu_read_unlock();
>                         bi = bio_alloc_bioset(rdev->bdev, 0,
> @@ -564,7 +563,6 @@ static void submit_flushes(struct work_struct *ws)
>                         atomic_inc(&mddev->flush_pending);
>                         submit_bio(bi);
>                         rcu_read_lock();
> -                       rdev_dec_pending(rdev, mddev);
>                 }
>         rcu_read_unlock();
>         if (atomic_dec_and_test(&mddev->flush_pending))
> @@ -617,6 +615,18 @@ bool md_flush_request(struct mddev *mddev, struct bio *bio)
>         /* new request after previous flush is completed */
>         if (ktime_after(req_start, mddev->prev_flush_start)) {
>                 WARN_ON(mddev->flush_bio);
> +               /*
> +                * Grab a reference to make sure mddev_suspend() will wait for
> +                * this flush to be done.
> +                *
> +               * md_flush_request() is called under md_handle_request() and
> +                * 'active_io' is already grabbed, hence percpu_ref_is_zero()
> +                * won't pass, percpu_ref_tryget_live() can't be used because
> +                * percpu_ref_kill() can be called by mddev_suspend()
> +                * concurrently.
> +                */
> +               WARN_ON(percpu_ref_is_zero(&mddev->active_io));
> +               percpu_ref_get(&mddev->active_io);
>                 mddev->flush_bio = bio;
>                 bio = NULL;
>         }
> --
> 2.39.2
>
>
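
The reference pairing in the patch above (percpu_ref_get() in
md_flush_request(), percpu_ref_put() when the last flush bio completes) can be
illustrated with a small user-space model. This is only a sketch under stated
assumptions: struct model_mddev and every model_* function are hypothetical
names invented for illustration, plain C11 atomics stand in for the kernel's
percpu_ref, and the flush is assumed to target at least one member device. It
is not the md implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct mddev; not the kernel type. */
struct model_mddev {
	atomic_int active_io;     /* models the 'active_io' percpu_ref */
	atomic_int flush_pending; /* models mddev->flush_pending */
};

/*
 * Models the new code in md_flush_request(): the caller already holds one
 * active_io reference, and the flush takes an additional one of its own.
 * Assumes nr_rdevs >= 1.
 */
static void model_flush_start(struct model_mddev *m, int nr_rdevs)
{
	assert(atomic_load(&m->active_io) > 0); /* like WARN_ON(percpu_ref_is_zero()) */
	atomic_fetch_add(&m->active_io, 1);     /* like percpu_ref_get() */
	atomic_store(&m->flush_pending, nr_rdevs);
}

/*
 * Models md_end_flush(): the completion that brings flush_pending to zero
 * drops the reference taken in model_flush_start(). Returns true for that
 * last completion.
 */
static bool model_end_flush(struct model_mddev *m)
{
	if (atomic_fetch_sub(&m->flush_pending, 1) == 1) {
		atomic_fetch_sub(&m->active_io, 1); /* like percpu_ref_put() */
		return true;
	}
	return false;
}

/* Models the condition a suspend waits for: no references remain. */
static bool model_can_reconfigure(struct model_mddev *m)
{
	return atomic_load(&m->active_io) == 0;
}

/* Walks one flush across two member devices; returns 0 if every step holds. */
static int model_demo(void)
{
	struct model_mddev m;

	atomic_init(&m.active_io, 1); /* reference held by the request handler */
	atomic_init(&m.flush_pending, 0);

	model_flush_start(&m, 2);
	atomic_fetch_sub(&m.active_io, 1); /* request handler drops its own reference */

	if (model_can_reconfigure(&m))
		return 1; /* flush still in flight, must not reconfigure */
	if (model_end_flush(&m))
		return 2; /* first completion is not the last one */
	if (model_can_reconfigure(&m))
		return 3; /* one flush bio is still pending */
	if (!model_end_flush(&m))
		return 4; /* second completion must be the last one */
	return model_can_reconfigure(&m) ? 0 : 5;
}
```

In this model, reconfiguration only becomes possible after the last flush
completion, even though the request handler dropped its own reference earlier.
That is the property the patch relies on to keep remove_and_add_spares() from
running while submit_flushes() is iterating over rdevs.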