[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200717210423.GP3151642@magnolia>
Date: Fri, 17 Jul 2020 14:04:23 -0700
From: "Darrick J. Wong" <darrick.wong@...cle.com>
To: Eric Biggers <ebiggers@...nel.org>
Cc: linux-fsdevel@...r.kernel.org, linux-xfs@...r.kernel.org,
linux-ext4@...r.kernel.org
Subject: Re: [PATCH v2] fs/direct-io: fix one-time init of ->s_dio_done_wq
On Thu, Jul 16, 2020 at 10:05:10PM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@...gle.com>
>
> Correctly implement the "one-time" init pattern for ->s_dio_done_wq.
> This fixes the following issues:
>
> - The LKMM doesn't guarantee that the workqueue will be seen initialized
> before being used, if another CPU allocated it. With regards to
> specific CPU architectures, this is true on at least Alpha, but it may
> be true on other architectures too if the internal implementation of
> workqueues causes use of the workqueue to involve a control
> dependency. (There doesn't appear to be a control dependency
> currently, but it's hard to tell and it could change in the future.)
>
> - The preliminary checks for sb->s_dio_done_wq are a data race, since
> they do a plain load of a concurrently modified variable. According
> to the C standard, this undefined behavior. In practice, the kernel
> does sometimes makes assumptions about data races might be okay in
> practice, but these rules are undocumented and not uniformly agreed
> upon, so it's best to avoid cases where they might come into play.
>
> Following the guidance for one-time init I've proposed at
> https://lkml.kernel.org/r/20200717044427.68747-1-ebiggers@kernel.org,
It might be a good idea to combine these two patches into a series so
that we can leave a breadcrumb in sb_init_dio_done_wq explaining why it
does what it does.
> replace it with the simplest implementation that is guaranteed to be
> correct while still achieving the following properties:
>
> - Doesn't make direct I/O users contend on a mutex in the fast path.
>
> - Doesn't allocate the workqueue when it will never be used.
>
> Fixes: 7b7a8665edd8 ("direct-io: Implement generic deferred AIO completions")
> Signed-off-by: Eric Biggers <ebiggers@...gle.com>
> ---
>
> v2: new implementation using smp_load_acquire() + smp_store_release()
> and a mutex.
>
> fs/direct-io.c | 42 ++++++++++++++++++++++++------------------
> fs/iomap/direct-io.c | 3 +--
> 2 files changed, 25 insertions(+), 20 deletions(-)
>
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 6d5370eac2a8..c03c2204aadf 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -592,20 +592,28 @@ static inline int dio_bio_reap(struct dio *dio, struct dio_submit *sdio)
> */
> int sb_init_dio_done_wq(struct super_block *sb)
> {
> - struct workqueue_struct *old;
> - struct workqueue_struct *wq = alloc_workqueue("dio/%s",
> - WQ_MEM_RECLAIM, 0,
> - sb->s_id);
> - if (!wq)
> - return -ENOMEM;
> - /*
> - * This has to be atomic as more DIOs can race to create the workqueue
> - */
> - old = cmpxchg(&sb->s_dio_done_wq, NULL, wq);
> - /* Someone created workqueue before us? Free ours... */
> - if (old)
> - destroy_workqueue(wq);
> - return 0;
> + static DEFINE_MUTEX(sb_init_dio_done_mutex);
> + struct workqueue_struct *wq;
> + int err = 0;
> +
> + /* Pairs with the smp_store_release() below */
> + if (smp_load_acquire(&sb->s_dio_done_wq))
> + return 0;
> +
> + mutex_lock(&sb_init_dio_done_mutex);
> + if (sb->s_dio_done_wq)
> + goto out;
> +
> + wq = alloc_workqueue("dio/%s", WQ_MEM_RECLAIM, 0, sb->s_id);
> + if (!wq) {
> + err = -ENOMEM;
> + goto out;
> + }
> + /* Pairs with the smp_load_acquire() above */
> + smp_store_release(&sb->s_dio_done_wq, wq);
Why not use cmpxchg_release here? Is the mutex actually required here,
or is this merely following the "don't complicate it up" guidelines in
the "One-Time Init" recipe that say not to use cmpxchg_release unless
you have a strong justification for it?
The code changes look ok to me, fwiw.
--D
> +out:
> + mutex_unlock(&sb_init_dio_done_mutex);
> + return err;
> }
>
> static int dio_set_defer_completion(struct dio *dio)
> @@ -615,9 +623,7 @@ static int dio_set_defer_completion(struct dio *dio)
> if (dio->defer_completion)
> return 0;
> dio->defer_completion = true;
> - if (!sb->s_dio_done_wq)
> - return sb_init_dio_done_wq(sb);
> - return 0;
> + return sb_init_dio_done_wq(sb);
> }
>
> /*
> @@ -1250,7 +1256,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
> retval = 0;
> if (iocb->ki_flags & IOCB_DSYNC)
> retval = dio_set_defer_completion(dio);
> - else if (!dio->inode->i_sb->s_dio_done_wq) {
> + else {
> /*
> * In case of AIO write racing with buffered read we
> * need to defer completion. We can't decide this now,
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index ec7b78e6feca..dc7fe898dab8 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -487,8 +487,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> dio_warn_stale_pagecache(iocb->ki_filp);
> ret = 0;
>
> - if (iov_iter_rw(iter) == WRITE && !wait_for_completion &&
> - !inode->i_sb->s_dio_done_wq) {
> + if (iov_iter_rw(iter) == WRITE && !wait_for_completion) {
> ret = sb_init_dio_done_wq(inode->i_sb);
> if (ret < 0)
> goto out_free_dio;
> --
> 2.27.0
>
Powered by blists - more mailing lists