[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20240912-mgtime-v2-1-54db84afb7a7@kernel.org>
Date: Thu, 12 Sep 2024 14:02:52 -0400
From: Jeff Layton <jlayton@...nel.org>
To: Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
John Stultz <jstultz@...gle.com>, Thomas Gleixner <tglx@...utronix.de>,
Stephen Boyd <sboyd@...nel.org>, Arnd Bergmann <arnd@...nel.org>
Cc: Vadim Fedorenko <vadim.fedorenko@...ux.dev>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
kernel test robot <oliver.sang@...el.com>, Jeff Layton <jlayton@...nel.org>
Subject: [PATCH v2] timekeeping: move multigrain timestamp floor handling
into timekeeper
The kernel test robot reported a performance hit in some will-it-scale
tests due to the multigrain timestamp patches. My own testing showed
about a 7% drop in performance on the pipe1_threads test, and the data
showed that coarse_ctime() was slowing down current_time().
Move the multigrain timestamp floor tracking word into timekeeper.c. Add
two new public interfaces: The first fills a timespec64 with the later
of the coarse-grained clock and the floor time, and the second gets a
fine-grained time and tries to swap it into the floor and fills a
timespec64 with the result.
The first function returns an opaque cookie that is suitable for passing
to the second, which will use it as the "old" value in the cmpxchg.
With this patch on top of the multigrain series, the will-it-scale
pipe1_threads microbenchmark shows these averages on my test rig:
v6.11-rc7: 103561295 (baseline)
v6.11-rc7 + mgtime + this: 101357203 (~2% performance drop)
Reported-by: kernel test robot <oliver.sang@...el.com>
Closes: https://lore.kernel.org/oe-lkp/202409091303.31b2b713-oliver.sang@intel.com
Suggested-by: Arnd Bergmann <arnd@...nel.org>
Signed-off-by: Jeff Layton <jlayton@...nel.org>
---
This version moves more of the floor handling into the timekeeper code.
Some of the earlier concern was about mixing different time value types
in the same interface. This sort of still does that, but it does it
using an opaque cookie value instead, which I'm hoping will make the
interfaces a little cleaner. Using an opaque cookie also means that we
can change the underlying implementation at will, without breaking the
callers.
If this approach looks OK, it's probably best for me to just respin the
entire series and have Christian drop the old and pick up the new. That
should reduce some of the unnecessary churn in fs/inode.c.
---
Changes in v2:
- move floor handling completely into timekeeper code
- add new interfaces for multigrain timestamp handling
- Link to v1: https://lore.kernel.org/r/20240911-mgtime-v1-1-e4aedf1d0d15@kernel.org
---
fs/inode.c | 105 +++++++++++++-------------------------------
include/linux/timekeeping.h | 4 ++
kernel/time/timekeeping.c | 77 ++++++++++++++++++++++++++++++++
3 files changed, 111 insertions(+), 75 deletions(-)
diff --git a/fs/inode.c b/fs/inode.c
index f0fbfd470d8e..3c7e16935bac 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -65,13 +65,6 @@ static unsigned int i_hash_shift __ro_after_init;
static struct hlist_head *inode_hashtable __ro_after_init;
static __cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_hash_lock);
-/*
- * This represents the latest fine-grained time that we have handed out as a
- * timestamp on the system. Tracked as a monotonic value, and converted to the
- * realtime clock on an as-needed basis.
- */
-static __cacheline_aligned_in_smp atomic64_t ctime_floor;
-
/*
* Empty aops. Can be used for the cases where the user does not
* define any of the address_space operations.
@@ -2255,25 +2248,6 @@ int file_remove_privs(struct file *file)
}
EXPORT_SYMBOL(file_remove_privs);
-/**
- * coarse_ctime - return the current coarse-grained time
- * @floor: current (monotonic) ctime_floor value
- *
- * Get the coarse-grained time, and then determine whether to
- * return it or the current floor value. Returns the later of the
- * floor and coarse grained timestamps, converted to realtime
- * clock value.
- */
-static ktime_t coarse_ctime(ktime_t floor)
-{
- ktime_t coarse = ktime_get_coarse();
-
- /* If coarse time is already newer, return that */
- if (!ktime_after(floor, coarse))
- return ktime_get_coarse_real();
- return ktime_mono_to_real(floor);
-}
-
/**
* current_time - Return FS time (possibly fine-grained)
* @inode: inode.
@@ -2284,11 +2258,11 @@ static ktime_t coarse_ctime(ktime_t floor)
*/
struct timespec64 current_time(struct inode *inode)
{
- ktime_t floor = atomic64_read(&ctime_floor);
- ktime_t now = coarse_ctime(floor);
- struct timespec64 now_ts = ktime_to_timespec64(now);
+ struct timespec64 now;
u32 cns;
+ ktime_get_coarse_real_ts64_mg(&now);
+
if (!is_mgtime(inode))
goto out;
@@ -2299,11 +2273,11 @@ struct timespec64 current_time(struct inode *inode)
* If there is no apparent change, then
* get a fine-grained timestamp.
*/
- if (now_ts.tv_nsec == (cns & ~I_CTIME_QUERIED))
- ktime_get_real_ts64(&now_ts);
+ if (now.tv_nsec == (cns & ~I_CTIME_QUERIED))
+ ktime_get_real_ts64(&now);
}
out:
- return timestamp_truncate(now_ts, inode);
+ return timestamp_truncate(now, inode);
}
EXPORT_SYMBOL(current_time);
@@ -2745,7 +2719,7 @@ EXPORT_SYMBOL(timestamp_truncate);
*
* Set the inode's ctime to the current value for the inode. Returns the
* current value that was assigned. If this is not a multigrain inode, then we
- * just set it to whatever the coarse_ctime is.
+ * set it to the later of the coarse time and floor value.
*
* If it is multigrain, then we first see if the coarse-grained timestamp is
* distinct from what we have. If so, then we'll just use that. If we have to
@@ -2756,16 +2730,16 @@ EXPORT_SYMBOL(timestamp_truncate);
*/
struct timespec64 inode_set_ctime_current(struct inode *inode)
{
- ktime_t now, floor = atomic64_read(&ctime_floor);
- struct timespec64 now_ts;
+ struct timespec64 now;
u32 cns, cur;
+ u64 cookie;
- now = coarse_ctime(floor);
+ cookie = ktime_get_coarse_real_ts64_mg(&now);
/* Just return that if this is not a multigrain fs */
if (!is_mgtime(inode)) {
- now_ts = timestamp_truncate(ktime_to_timespec64(now), inode);
- inode_set_ctime_to_ts(inode, now_ts);
+ now = timestamp_truncate(now, inode);
+ inode_set_ctime_to_ts(inode, now);
goto out;
}
@@ -2776,44 +2750,27 @@ struct timespec64 inode_set_ctime_current(struct inode *inode)
*/
cns = smp_load_acquire(&inode->i_ctime_nsec);
if (cns & I_CTIME_QUERIED) {
- ktime_t ctime = ktime_set(inode->i_ctime_sec, cns & ~I_CTIME_QUERIED);
-
- if (!ktime_after(now, ctime)) {
- ktime_t old, fine;
+ struct timespec64 ctime = { .tv_sec = inode->i_ctime_sec,
+ .tv_nsec = cns & ~I_CTIME_QUERIED };
- /* Get a fine-grained time */
- fine = ktime_get();
- mgtime_counter_inc(mg_fine_stamps);
-
- /*
- * If the cmpxchg works, we take the new floor value. If
- * not, then that means that someone else changed it after we
- * fetched it but before we got here. That value is just
- * as good, so keep it.
- */
- old = floor;
- if (atomic64_try_cmpxchg(&ctime_floor, &old, fine))
- mgtime_counter_inc(mg_floor_swaps);
- else
- fine = old;
- now = ktime_mono_to_real(fine);
- }
+ if (timespec64_compare(&now, &ctime) <= 0)
+ ktime_get_real_ts64_mg(&now, cookie);
}
mgtime_counter_inc(mg_ctime_updates);
- now_ts = timestamp_truncate(ktime_to_timespec64(now), inode);
- cur = cns;
+ now = timestamp_truncate(now, inode);
/* No need to cmpxchg if it's exactly the same */
- if (cns == now_ts.tv_nsec && inode->i_ctime_sec == now_ts.tv_sec) {
- trace_ctime_xchg_skip(inode, &now_ts);
+ if (cns == now.tv_nsec && inode->i_ctime_sec == now.tv_sec) {
+ trace_ctime_xchg_skip(inode, &now);
goto out;
}
+ cur = cns;
retry:
/* Try to swap the nsec value into place. */
- if (try_cmpxchg(&inode->i_ctime_nsec, &cur, now_ts.tv_nsec)) {
+ if (try_cmpxchg(&inode->i_ctime_nsec, &cur, now.tv_nsec)) {
/* If swap occurred, then we're (mostly) done */
- inode->i_ctime_sec = now_ts.tv_sec;
- trace_ctime_ns_xchg(inode, cns, now_ts.tv_nsec, cur);
+ inode->i_ctime_sec = now.tv_sec;
+ trace_ctime_ns_xchg(inode, cns, now.tv_nsec, cur);
mgtime_counter_inc(mg_ctime_swaps);
} else {
/*
@@ -2827,11 +2784,11 @@ struct timespec64 inode_set_ctime_current(struct inode *inode)
goto retry;
}
/* Otherwise, keep the existing ctime */
- now_ts.tv_sec = inode->i_ctime_sec;
- now_ts.tv_nsec = cur & ~I_CTIME_QUERIED;
+ now.tv_sec = inode->i_ctime_sec;
+ now.tv_nsec = cur & ~I_CTIME_QUERIED;
}
out:
- return now_ts;
+ return now;
}
EXPORT_SYMBOL(inode_set_ctime_current);
@@ -2854,8 +2811,7 @@ EXPORT_SYMBOL(inode_set_ctime_current);
*/
struct timespec64 inode_set_ctime_deleg(struct inode *inode, struct timespec64 update)
{
- ktime_t now, floor = atomic64_read(&ctime_floor);
- struct timespec64 now_ts, cur_ts;
+ struct timespec64 now, cur_ts;
u32 cur, old;
/* pairs with try_cmpxchg below */
@@ -2867,12 +2823,11 @@ struct timespec64 inode_set_ctime_deleg(struct inode *inode, struct timespec64 u
if (timespec64_compare(&update, &cur_ts) <= 0)
return cur_ts;
- now = coarse_ctime(floor);
- now_ts = ktime_to_timespec64(now);
+ ktime_get_coarse_real_ts64_mg(&now);
/* Clamp the update to "now" if it's in the future */
- if (timespec64_compare(&update, &now_ts) > 0)
- update = now_ts;
+ if (timespec64_compare(&update, &now) > 0)
+ update = now;
update = timestamp_truncate(update, inode);
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index fc12a9ba2c88..cf2293158c65 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -45,6 +45,10 @@ extern void ktime_get_real_ts64(struct timespec64 *tv);
extern void ktime_get_coarse_ts64(struct timespec64 *ts);
extern void ktime_get_coarse_real_ts64(struct timespec64 *ts);
+/* Multigrain timestamp interfaces */
+extern u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts);
+extern void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie);
+
void getboottime64(struct timespec64 *ts);
/*
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 5391e4167d60..bb039c9d525e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -114,6 +114,13 @@ static struct tk_fast tk_fast_raw ____cacheline_aligned = {
.base[1] = FAST_TK_INIT,
};
+/*
+ * This represents the latest fine-grained time that we have handed out as a
+ * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the
+ * realtime clock on an as-needed basis.
+ */
+static __cacheline_aligned_in_smp atomic64_t mg_floor;
+
static inline void tk_normalize_xtime(struct timekeeper *tk)
{
while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
@@ -2394,6 +2401,76 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
}
EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
+/**
+ * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor
+ * @ts: timespec64 to be filled
+ *
+ * Adjust floor to realtime and compare it to the coarse time. Fill
+ * @ts with the latest one. Returns opaque cookie suitable to pass
+ * to ktime_get_real_ts64_mg.
+ */
+u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
+{
+ struct timekeeper *tk = &tk_core.timekeeper;
+ u64 floor = atomic64_read(&mg_floor);
+ ktime_t f_real, offset, coarse;
+ unsigned int seq;
+
+ WARN_ON(timekeeping_suspended);
+
+ do {
+ seq = read_seqcount_begin(&tk_core.seq);
+ *ts = tk_xtime(tk);
+ offset = *offsets[TK_OFFS_REAL];
+ } while (read_seqcount_retry(&tk_core.seq, seq));
+
+ coarse = timespec64_to_ktime(*ts);
+ f_real = ktime_add(floor, offset);
+ if (ktime_after(f_real, coarse))
+ *ts = ktime_to_timespec64(f_real);
+ return floor;
+}
+EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);
+
+/**
+ * ktime_get_real_ts64_mg - attempt to update floor value and return result
+ * @ts: pointer to the timespec to be set
+ * @cookie: opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg()
+ *
+ * Get a current monotonic fine-grained time value and attempt to swap
+ * it into the floor using @cookie as the "old" value. @ts will be
+ * filled with the resulting floor value, regardless of the outcome of
+ * the swap.
+ */
+void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
+{
+ struct timekeeper *tk = &tk_core.timekeeper;
+ ktime_t offset, mono, old = (ktime_t)cookie;
+ unsigned int seq;
+ u64 nsecs;
+
+ WARN_ON(timekeeping_suspended);
+
+ do {
+ seq = read_seqcount_begin(&tk_core.seq);
+
+ ts->tv_sec = tk->xtime_sec;
+ mono = tk->tkr_mono.base;
+ nsecs = timekeeping_get_ns(&tk->tkr_mono);
+ offset = *offsets[TK_OFFS_REAL];
+ } while (read_seqcount_retry(&tk_core.seq, seq));
+
+ mono = ktime_add_ns(mono, nsecs);
+ if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
+ ts->tv_nsec = 0;
+ timespec64_add_ns(ts, nsecs);
+ } else {
+ *ts = ktime_to_timespec64(ktime_add(old, offset));
+ }
+
+}
+EXPORT_SYMBOL(ktime_get_real_ts64_mg);
+
void ktime_get_coarse_ts64(struct timespec64 *ts)
{
struct timekeeper *tk = &tk_core.timekeeper;
---
base-commit: 18348a38102a4efca57186740afb33f08e5f4609
change-id: 20240912-mgtime-c1280600a9a3
Best regards,
--
Jeff Layton <jlayton@...nel.org>
Powered by blists - more mailing lists