linux-kernel - Re: [PATCH v2 6/6] dm-raid: delay flushing event_work() after reconfig

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAPhsuW5jDzOo=2NVHO17jfjmpR0LV9+Q1a9FkBA8NBshX9t=ig@mail.gmail.com>
Date:   Mon, 27 Nov 2023 16:00:25 -0800
From:   Song Liu <song@...nel.org>
To:     Yu Kuai <yukuai1@...weicloud.com>, dm-devel@...ts.linux.dev
Cc:     yukuai3@...wei.com, xni@...hat.com, linux-kernel@...r.kernel.org,
        linux-raid@...r.kernel.org, yi.zhang@...wei.com,
        yangerkun@...wei.com
Subject: Re: [PATCH v2 6/6] dm-raid: delay flushing event_work() after
 reconfig_mutex is released

+ dm-devel@...ts.linux.dev

On Fri, Nov 24, 2023 at 12:00 AM Yu Kuai <yukuai1@...weicloud.com> wrote:
>
> From: Yu Kuai <yukuai3@...wei.com>
>
> After commit db5e653d7c9f ("md: delay choosing sync action to
> md_start_sync()"), md_start_sync() will hold 'reconfig_mutex', however,
> in order to make sure event_work is done, __md_stop() will flush
> workqueue with reconfig_mutex grabbed, hence if sync_work is still
> pending, deadlock will be triggered.
>
> Fortunately, former pacthes to fix stopping sync_thread already make sure
> all sync_work is done already, hence such deadlock is not possible
> anymore. However, in order not to cause confusions for people by this
> implicit dependency, delay flushing event_work to dm-raid where
> 'reconfig_mutex' is not held, and add some comments to emphasize that
> the workqueue can't be flushed with 'reconfig_mutex'.
>
> Signed-off-by: Yu Kuai <yukuai3@...wei.com>
> ---
>  drivers/md/dm-raid.c |  3 +++
>  drivers/md/md.c      | 11 ++++++++---
>  2 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index a4692f8f98ee..51f15c20f621 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -3317,6 +3317,9 @@ static void raid_dtr(struct dm_target *ti)
>         mddev_lock_nointr(&rs->md);
>         md_stop(&rs->md);
>         mddev_unlock(&rs->md);
> +
> +       if (work_pending(&rs->md.event_work))
> +               flush_work(&rs->md.event_work);
>         raid_set_free(rs);
>  }
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index fb8a7d1eebee..86efc9c2ae56 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -82,6 +82,14 @@ static struct module *md_cluster_mod;
>
>  static DECLARE_WAIT_QUEUE_HEAD(resync_wait);
>  static struct workqueue_struct *md_wq;
> +
> +/*
> + * This workqueue is used for sync_work to register new sync_thread, and for
> + * del_work to remove rdev, and for event_work that is only set by dm-raid.
> + *
> + * Noted that sync_work will grab reconfig_mutex, hence never flush this
> + * workqueue whith reconfig_mutex grabbed.
> + */
>  static struct workqueue_struct *md_misc_wq;
>  struct workqueue_struct *md_bitmap_wq;
>
> @@ -6328,9 +6336,6 @@ static void __md_stop(struct mddev *mddev)
>         struct md_personality *pers = mddev->pers;
>         md_bitmap_destroy(mddev);
>         mddev_detach(mddev);
> -       /* Ensure ->event_work is done */
> -       if (mddev->event_work.func)
> -               flush_workqueue(md_misc_wq);
>         spin_lock(&mddev->lock);
>         mddev->pers = NULL;
>         spin_unlock(&mddev->lock);
> --
> 2.39.2
>