Message-ID: <20240827100045.m3mpko3tvmmjkmvm@quack3>
Date: Tue, 27 Aug 2024 12:00:45 +0200
From: Jan Kara <jack@...e.cz>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: brauner@...nel.org, viro@...iv.linux.org.uk, jack@...e.cz,
linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
Jeff Layton <jlayton@...nel.org>
Subject: Re: [PATCH] vfs: elide smp_mb in iversion handling in the common case
On Thu 15-08-24 10:33:10, Mateusz Guzik wrote:
> According to bpftrace on these routines most calls result in cmpxchg,
> which already provides the same guarantee.
>
> In inode_maybe_inc_iversion the elision is possible because even if a
> stale value is read due to the now-missing smp_mb fence, the problem
> corrects itself once cmpxchg runs. If it appears cmpxchg won't be
> issued, the fence + reload are there, restoring the previous behavior.
>
> Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
> ---
>
> Chances are this entire barrier guarantee is of no significance, but I'm
> not signing up to review it.
Jeff might have a ready answer here - added to CC. I think the barrier is
needed in principle so that you can guarantee that after a data change you
will be able to observe an i_version change.
> I verified the force flag is not *always* set (but it is set in the most
> common case).
Well, I'm not convinced the more complicated code is really worth it.
'force' will be set when we update timestamps, which happens once per tick
(usually 1-4 ms). So that is the common case on a lightly / moderately
loaded system. On a heavily write(2)-loaded system, 'force' should be
mostly false, and unless you also heavily stat(2) the modified files, the
common path is exactly the "if (!force && !(cur & I_VERSION_QUERIED))"
branch. So saving one smp_mb() per couple of ms (per inode) on a
moderately loaded system doesn't seem like a noticeable win...
Honza
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 8aa34870449f..61ae4811270a 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -1990,13 +1990,19 @@ bool inode_maybe_inc_iversion(struct inode *inode, bool force)
> * information, but the legacy inode_inc_iversion code used a spinlock
> * to serialize increments.
> *
> - * Here, we add full memory barriers to ensure that any de-facto
> - * ordering with other info is preserved.
> + * We add a full memory barrier to ensure that any de facto ordering
> + * with other state is preserved (either implicitly coming from cmpxchg
> + * or explicitly from smp_mb if we don't know upfront if we will execute
> + * the former).
> *
> - * This barrier pairs with the barrier in inode_query_iversion()
> + * These barriers pair with inode_query_iversion().
> */
> - smp_mb();
> cur = inode_peek_iversion_raw(inode);
> + if (!force && !(cur & I_VERSION_QUERIED)) {
> + smp_mb();
> + cur = inode_peek_iversion_raw(inode);
> + }
> +
> do {
> /* If flag is clear then we needn't do anything */
> if (!force && !(cur & I_VERSION_QUERIED))
> @@ -2025,20 +2031,22 @@ EXPORT_SYMBOL(inode_maybe_inc_iversion);
> u64 inode_query_iversion(struct inode *inode)
> {
> u64 cur, new;
> + bool fenced = false;
>
> + /*
> + * Memory barriers (implicit in cmpxchg, explicit in smp_mb) pair with
> + * inode_maybe_inc_iversion(), see that routine for more details.
> + */
> cur = inode_peek_iversion_raw(inode);
> do {
> /* If flag is already set, then no need to swap */
> if (cur & I_VERSION_QUERIED) {
> - /*
> - * This barrier (and the implicit barrier in the
> - * cmpxchg below) pairs with the barrier in
> - * inode_maybe_inc_iversion().
> - */
> - smp_mb();
> + if (!fenced)
> + smp_mb();
> break;
> }
>
> + fenced = true;
> new = cur | I_VERSION_QUERIED;
> } while (!atomic64_try_cmpxchg(&inode->i_version, &cur, new));
> return cur >> I_VERSION_QUERIED_SHIFT;
> --
> 2.43.0
>
--
Jan Kara <jack@...e.com>
SUSE Labs, CR