[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGudoHEHzywyxVAdtB_Mg4Wh=g=QmNY=ODGmNPEWmmhc-ugwEw@mail.gmail.com>
Date: Tue, 27 Aug 2024 12:21:21 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Jan Kara <jack@...e.cz>
Cc: brauner@...nel.org, viro@...iv.linux.org.uk, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, Jeff Layton <jlayton@...nel.org>
Subject: Re: [PATCH] vfs: elide smp_mb in iversion handling in the common case
On Tue, Aug 27, 2024 at 12:00 PM Jan Kara <jack@...e.cz> wrote:
>
> On Thu 15-08-24 10:33:10, Mateusz Guzik wrote:
> > According to bpftrace on these routines most calls result in cmpxchg,
> > which already provides the same guarantee.
> >
> > In inode_maybe_inc_iversion elision is possible because even if the
> > wrong value was read due to now missing smp_mb fence, the issue is going
> > to correct itself after cmpxchg. If it appears cmpxchg wont be issued,
> > the fence + reload are there bringing back previous behavior.
> >
> > Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
> > ---
> >
> > chances are this entire barrier guarantee is of no significance, but i'm
> > not signing up to review it
>
> Jeff might have a ready answer here - added to CC. I think the barrier is
> needed in principle so that you can guarantee that after a data change you
> will be able to observe an i_version change.
>
That is the description, I am saying it is unclear if anyone needs it
and that I am not interested in spending time on finding out.
> > I verified the force flag is not *always* set (but it is set in the most
> > common case).
>
> Well, I'm not convinced the more complicated code is really worth it.
> 'force' will be set when we update timestamps which happens once per tick
> (usually 1-4 ms). So that is common case on lightly / moderately loaded
> system. On heavily write(2)-loaded system, 'force' should be mostly false
> and unless you also heavily stat(2) the modified files, the common path is
> exactly the "if (!force && !(cur & I_VERSION_QUERIED))" branch. So saving
> one smp_mb() on moderately loaded system per couple of ms (per inode)
> doesn't seem like a noticeable win...
>
inode_maybe_inc_iversion is used a lot and most commonly *with* the force flag.
You can try it out yourself: bpftrace -e
'kprobe:inode_maybe_inc_iversion { @[kstack(), arg1] = count(); }'
and run your favourite fs-using workload, for example this is a top
backtrace form few seconds of building the kernel:
@[
inode_maybe_inc_iversion+5
inode_update_timestamps+238
generic_update_time+19
file_update_time+125
shmem_file_write_iter+118
vfs_write+599
ksys_write+103
do_syscall_64+82
entry_SYSCALL_64_after_hwframe+118
, 1]: 1670
the '1' at the end indicates 'force' flag set to 1.
This also shows up on unlink et al.
The context here is that vfs is dog slow single-threaded in part due
to spurious barriers sneaked in all over the place, here is another
example:
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs.misc&id=30eb7cc03875b508ce6683c8b3cf6e88442a2f87
--
Mateusz Guzik <mjguzik gmail.com>
Powered by blists - more mailing lists