linux-ext4 - Re: [PATCH v2] fs/direct-io: fix one-time init of ->s_dio_done

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200717210423.GP3151642@magnolia>
Date:   Fri, 17 Jul 2020 14:04:23 -0700
From:   "Darrick J. Wong" <darrick.wong@...cle.com>
To:     Eric Biggers <ebiggers@...nel.org>
Cc:     linux-fsdevel@...r.kernel.org, linux-xfs@...r.kernel.org,
        linux-ext4@...r.kernel.org
Subject: Re: [PATCH v2] fs/direct-io: fix one-time init of ->s_dio_done_wq

On Thu, Jul 16, 2020 at 10:05:10PM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@...gle.com>
> 
> Correctly implement the "one-time" init pattern for ->s_dio_done_wq.
> This fixes the following issues:
> 
> - The LKMM doesn't guarantee that the workqueue will be seen initialized
>   before being used, if another CPU allocated it.  With regards to
>   specific CPU architectures, this is true on at least Alpha, but it may
>   be true on other architectures too if the internal implementation of
>   workqueues causes use of the workqueue to involve a control
>   dependency.  (There doesn't appear to be a control dependency
>   currently, but it's hard to tell and it could change in the future.)
> 
> - The preliminary checks for sb->s_dio_done_wq are a data race, since
>   they do a plain load of a concurrently modified variable.  According
>   to the C standard, this undefined behavior.  In practice, the kernel
>   does sometimes makes assumptions about data races might be okay in
>   practice, but these rules are undocumented and not uniformly agreed
>   upon, so it's best to avoid cases where they might come into play.
> 
> Following the guidance for one-time init I've proposed at
> https://lkml.kernel.org/r/20200717044427.68747-1-ebiggers@kernel.org,

It might be a good idea to combine these two patches into a series so
that we can leave a breadcrumb in sb_init_dio_done_wq explaining why it
does what it does.

> replace it with the simplest implementation that is guaranteed to be
> correct while still achieving the following properties:
> 
>     - Doesn't make direct I/O users contend on a mutex in the fast path.
> 
>     - Doesn't allocate the workqueue when it will never be used.
> 
> Fixes: 7b7a8665edd8 ("direct-io: Implement generic deferred AIO completions")
> Signed-off-by: Eric Biggers <ebiggers@...gle.com>
> ---
> 
> v2: new implementation using smp_load_acquire() + smp_store_release()
>     and a mutex.
> 
>  fs/direct-io.c       | 42 ++++++++++++++++++++++++------------------
>  fs/iomap/direct-io.c |  3 +--
>  2 files changed, 25 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 6d5370eac2a8..c03c2204aadf 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -592,20 +592,28 @@ static inline int dio_bio_reap(struct dio *dio, struct dio_submit *sdio)
>   */
>  int sb_init_dio_done_wq(struct super_block *sb)
>  {
> -	struct workqueue_struct *old;
> -	struct workqueue_struct *wq = alloc_workqueue("dio/%s",
> -						      WQ_MEM_RECLAIM, 0,
> -						      sb->s_id);
> -	if (!wq)
> -		return -ENOMEM;
> -	/*
> -	 * This has to be atomic as more DIOs can race to create the workqueue
> -	 */
> -	old = cmpxchg(&sb->s_dio_done_wq, NULL, wq);
> -	/* Someone created workqueue before us? Free ours... */
> -	if (old)
> -		destroy_workqueue(wq);
> -	return 0;
> +	static DEFINE_MUTEX(sb_init_dio_done_mutex);
> +	struct workqueue_struct *wq;
> +	int err = 0;
> +
> +	/* Pairs with the smp_store_release() below */
> +	if (smp_load_acquire(&sb->s_dio_done_wq))
> +		return 0;
> +
> +	mutex_lock(&sb_init_dio_done_mutex);
> +	if (sb->s_dio_done_wq)
> +		goto out;
> +
> +	wq = alloc_workqueue("dio/%s", WQ_MEM_RECLAIM, 0, sb->s_id);
> +	if (!wq) {
> +		err = -ENOMEM;
> +		goto out;
> +	}
> +	/* Pairs with the smp_load_acquire() above */
> +	smp_store_release(&sb->s_dio_done_wq, wq);

Why not use cmpxchg_release here?  Is the mutex actually required here,
or is this merely following the "don't complicate it up" guidelines in
the "One-Time Init" recipe that say not to use cmpxchg_release unless
you have a strong justification for it?

The code changes look ok to me, fwiw.

--D

> +out:
> +	mutex_unlock(&sb_init_dio_done_mutex);
> +	return err;
>  }
>  
>  static int dio_set_defer_completion(struct dio *dio)
> @@ -615,9 +623,7 @@ static int dio_set_defer_completion(struct dio *dio)
>  	if (dio->defer_completion)
>  		return 0;
>  	dio->defer_completion = true;
> -	if (!sb->s_dio_done_wq)
> -		return sb_init_dio_done_wq(sb);
> -	return 0;
> +	return sb_init_dio_done_wq(sb);
>  }
>  
>  /*
> @@ -1250,7 +1256,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
>  		retval = 0;
>  		if (iocb->ki_flags & IOCB_DSYNC)
>  			retval = dio_set_defer_completion(dio);
> -		else if (!dio->inode->i_sb->s_dio_done_wq) {
> +		else {
>  			/*
>  			 * In case of AIO write racing with buffered read we
>  			 * need to defer completion. We can't decide this now,
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index ec7b78e6feca..dc7fe898dab8 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -487,8 +487,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  		dio_warn_stale_pagecache(iocb->ki_filp);
>  	ret = 0;
>  
> -	if (iov_iter_rw(iter) == WRITE && !wait_for_completion &&
> -	    !inode->i_sb->s_dio_done_wq) {
> +	if (iov_iter_rw(iter) == WRITE && !wait_for_completion) {
>  		ret = sb_init_dio_done_wq(inode->i_sb);
>  		if (ret < 0)
>  			goto out_free_dio;
> -- 
> 2.27.0
>