Message-ID: <a5daa75c9bb4c0104c4c09ab04491f51729ade0e.camel@kernel.org>
Date: Tue, 27 Aug 2024 06:50:07 -0400
From: Jeff Layton <jlayton@...nel.org>
To: Mateusz Guzik <mjguzik@...il.com>, brauner@...nel.org
Cc: viro@...iv.linux.org.uk, jack@...e.cz, linux-kernel@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] vfs: elide smp_mb in iversion handling in the common
 case

On Thu, 2024-08-15 at 10:33 +0200, Mateusz Guzik wrote:
> According to bpftrace, most calls to these routines result in a cmpxchg,
> which already provides the same ordering guarantee.
> 
> In inode_maybe_inc_iversion the elision is possible because even if a
> stale value is read due to the now-missing smp_mb fence, the issue
> corrects itself after the cmpxchg. If it appears that no cmpxchg will
> be issued, the fence + reload are there, restoring the previous
> behavior.
> 
> Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
> ---
> 
> Chances are this entire barrier guarantee is of no significance, but
> I'm not signing up to review it.
> 
> I verified the force flag is not *always* set (but it is set in the most common case).
> 
>  fs/libfs.c | 28 ++++++++++++++++++----------
>  1 file changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 8aa34870449f..61ae4811270a 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -1990,13 +1990,19 @@ bool inode_maybe_inc_iversion(struct inode *inode, bool force)
>  	 * information, but the legacy inode_inc_iversion code used a spinlock
>  	 * to serialize increments.
>  	 *
> -	 * Here, we add full memory barriers to ensure that any de-facto
> -	 * ordering with other info is preserved.
> +	 * We add a full memory barrier to ensure that any de facto ordering
> +	 * with other state is preserved (either implicitly coming from cmpxchg
> +	 * or explicitly from smp_mb if we don't know upfront if we will execute
> +	 * the former).
>  	 *
> -	 * This barrier pairs with the barrier in inode_query_iversion()
> +	 * These barriers pair with inode_query_iversion().
>  	 */
> -	smp_mb();
>  	cur = inode_peek_iversion_raw(inode);
> +	if (!force && !(cur & I_VERSION_QUERIED)) {
> +		smp_mb();
> +		cur = inode_peek_iversion_raw(inode);
> +	}
> +
>  	do {
>  		/* If flag is clear then we needn't do anything */
>  		if (!force && !(cur & I_VERSION_QUERIED))
> @@ -2025,20 +2031,22 @@ EXPORT_SYMBOL(inode_maybe_inc_iversion);
>  u64 inode_query_iversion(struct inode *inode)
>  {
>  	u64 cur, new;
> +	bool fenced = false;
>  
> +	/*
> +	 * Memory barriers (implicit in cmpxchg, explicit in smp_mb) pair with
> +	 * inode_maybe_inc_iversion(), see that routine for more details.
> +	 */
>  	cur = inode_peek_iversion_raw(inode);
>  	do {
>  		/* If flag is already set, then no need to swap */
>  		if (cur & I_VERSION_QUERIED) {
> -			/*
> -			 * This barrier (and the implicit barrier in the
> -			 * cmpxchg below) pairs with the barrier in
> -			 * inode_maybe_inc_iversion().
> -			 */
> -			smp_mb();
> +			if (!fenced)
> +				smp_mb();
>  			break;
>  		}
>  
> +		fenced = true;
>  		new = cur | I_VERSION_QUERIED;
>  	} while (!atomic64_try_cmpxchg(&inode->i_version, &cur, new));
>  	return cur >> I_VERSION_QUERIED_SHIFT;

I'll look over the patch in more detail later, but...

Primarily, we set "force" when we know we're going to be doing a
journal transaction anyway. The idea was that since we're already doing
that (usually to update timestamps or something), we might as well bump
the iversion too.

For instance, XFS does this:

    inode_maybe_inc_iversion(inode, flags & XFS_ILOG_CORE)

...and XFS_ILOG_CORE is set when there is already a transaction queued.

That isn't actually required, though. If no one has queried the value,
then we aren't obligated to bump the iversion at all, so we could just
elide some of those "force" cases as well.
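
Something like this (a completely untested sketch, not a real patch,
reusing the helpers and constants from include/linux/iversion.h), where
the force check is simply dropped from the cmpxchg loop:

/* Untested illustration only, not the current kernel code. */
bool inode_maybe_inc_iversion(struct inode *inode, bool force)
{
	u64 cur, new;

	/* "force" is deliberately ignored in this sketch */
	cur = inode_peek_iversion_raw(inode);
	do {
		/*
		 * Nobody has queried the value since the last bump, so
		 * no increment is needed, force or not.
		 */
		if (!(cur & I_VERSION_QUERIED))
			return false;

		/* Clear the flag and bump the counter portion */
		new = (cur & ~I_VERSION_QUERIED) + I_VERSION_INCREMENT;
	} while (!atomic64_try_cmpxchg(&inode->i_version, &cur, new));
	return true;
}

Whether dropping the check still preserves the barrier pairing with
inode_query_iversion() for the old "force" callers would need the same
scrutiny as the patch above.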
-- 
Jeff Layton <jlayton@...nel.org>
