Message-Id: <20231018-mgtime-v1-0-4a7a97b1f482@kernel.org>
Date: Wed, 18 Oct 2023 13:41:07 -0400
From: Jeff Layton <jlayton@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>,
John Stultz <jstultz@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>,
Stephen Boyd <sboyd@...nel.org>,
Chandan Babu R <chandan.babu@...cle.com>,
"Darrick J. Wong" <djwong@...nel.org>,
Dave Chinner <david@...morbit.com>,
Theodore Ts'o <tytso@....edu>,
Andreas Dilger <adilger.kernel@...ger.ca>,
Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>,
David Sterba <dsterba@...e.com>,
Hugh Dickins <hughd@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Amir Goldstein <amir73il@...il.com>, Jan Kara <jack@...e.de>,
David Howells <dhowells@...hat.com>
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-xfs@...r.kernel.org, linux-ext4@...r.kernel.org,
linux-btrfs@...r.kernel.org, linux-mm@...ck.org,
linux-nfs@...r.kernel.org, Jeff Layton <jlayton@...nel.org>
Subject: [PATCH RFC 0/9] fs: multigrain timestamps (redux)
The VFS always uses coarse-grained timestamps when updating the
ctime and mtime after a change. This has the benefit of allowing
filesystems to optimize away a lot of metadata updates, down to around 1
per jiffy, even when a file is under heavy writes.
Unfortunately, this coarseness has always been an issue when we're
exporting via NFSv3, which relies on timestamps to validate caches. A
lot of changes can happen in a jiffy, so timestamps aren't sufficient to
help the client decide whether to invalidate the cache.
Even with NFSv4, a lot of exported filesystems don't properly support a
change attribute and are subject to the same problems with timestamp
granularity. Other applications have similar issues with timestamps (e.g.
backup applications).
If we were to always use fine-grained timestamps, that would improve the
situation, but that becomes rather expensive, as the underlying
filesystem would have to log a lot more metadata updates.
What we need is a way to only use fine-grained timestamps when they are
being actively queried. The idea is to use an unused bit in the ctime's
tv_nsec field to mark when the mtime or ctime has been queried via
getattr. Once that has been marked, the next m/ctime update will use a
fine-grained timestamp.
The original merge of multigrain timestamps for v6.6 had to be reverted,
as a file with a coarse-grained timestamp could incorrectly appear to be
modified before a file with a fine-grained timestamp, when that wasn't
the case.
This revision solves that problem by ensuring that, when a
fine-grained timespec64 is handed out, that value becomes the floor
for further coarse-grained timespec64 fetches. This requires new
timekeeper interfaces with a potential downside: when a file is
stamped with a fine-grained timestamp, it has to (briefly) take the
global timekeeper spinlock.
Because of that, this set takes greater pains to avoid issuing new
fine-grained timestamps when possible. A fine-grained timestamp is now
only required if the current mtime or ctime has been fetched for a
getattr and the next coarse-grained tick has not happened yet. In every
other case a coarse-grained timestamp is fine, and that is fetched
locklessly using the timekeeper's seqcount.
In order to get some hard numbers about how often the lock would be
taken, I've added a couple of percpu counters and a debugfs file for
tracking both types of multigrain timekeeper fetches.
With this, I did a kdevops fstests run on xfs (CRC mode). I ran "make
fstests-baseline", then immediately grabbed the counter values and
calculated the percentage:
$ time make fstests-baseline
real 324m17.337s
user 27m23.213s
sys 2m40.313s
fine 3059498
coarse 383848171
pct fine .79075661
Next I did a kdevops fstests run with NFS: one server serving three
clients (v4.2, v4.0 and v3). Again, I timed "make fstests-baseline" and
then grabbed the multigrain counters from the NFS server:
$ time make fstests-baseline
real 181m57.585s
user 16m8.266s
sys 1m45.864s
fine 8137657
coarse 44726007
pct fine 15.393668
We can't run as many tests on NFS as on xfs, so the run is shorter. nfsd
is a very getattr-heavy workload, and the clients aggressively coalesce
writes, so this is probably something of a pessimal case for the number
of fine-grained timestamps over time.
At this point I'm mainly wondering whether (briefly) taking the
timekeeper spinlock in this codepath is unreasonable. It does very
little work under it, so I'm hoping the impact would be unmeasurable for
most workloads.
Side Q: what's the best tool for measuring spinlock contention? It'd be
interesting to see how often (and how long) we end up spinning on this
lock under different workloads.
Note that some of the patches in the series are virtually identical to
the ones in the previous posting. I stripped the prior Reviewed-by/Acked-by
tags, though, since the underlying infrastructure has changed a bit.
Comments and suggestions welcome.
Signed-off-by: Jeff Layton <jlayton@...nel.org>
---
Jeff Layton (9):
fs: switch timespec64 fields in inode to discrete integers
timekeeping: new interfaces for multigrain timestamp handing
timekeeping: add new debugfs file to count multigrain timestamps
fs: add infrastructure for multigrain timestamps
fs: have setattr_copy handle multigrain timestamps appropriately
xfs: switch to multigrain timestamps
ext4: switch to multigrain timestamps
btrfs: convert to multigrain timestamps
tmpfs: add support for multigrain timestamps
fs/attr.c | 52 ++++++++++++++--
fs/btrfs/file.c | 25 ++------
fs/btrfs/super.c | 5 +-
fs/ext4/super.c | 2 +-
fs/inode.c | 70 ++++++++++++++++++++-
fs/stat.c | 41 ++++++++++++-
fs/xfs/libxfs/xfs_trans_inode.c | 6 +-
fs/xfs/xfs_iops.c | 10 +--
fs/xfs/xfs_super.c | 2 +-
include/linux/fs.h | 85 ++++++++++++++++++--------
include/linux/timekeeper_internal.h | 2 +
include/linux/timekeeping.h | 4 ++
kernel/time/timekeeping.c | 117 ++++++++++++++++++++++++++++++++++++
mm/shmem.c | 2 +-
14 files changed, 352 insertions(+), 71 deletions(-)
---
base-commit: 12cd44023651666bd44baa36a5c999698890debb
change-id: 20231016-mgtime-fe3ea75c6f59
Best regards,
--
Jeff Layton <jlayton@...nel.org>