Message-ID: <20240913112602.xrfdn7hinz32bhso@quack3>
Date: Fri, 13 Sep 2024 13:26:02 +0200
From: Jan Kara <jack@...e.cz>
To: Jeff Layton <jlayton@...nel.org>
Cc: Alexander Viro <viro@...iv.linux.org.uk>,
	Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
	John Stultz <jstultz@...gle.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Stephen Boyd <sboyd@...nel.org>, Arnd Bergmann <arnd@...nel.org>,
	Vadim Fedorenko <vadim.fedorenko@...ux.dev>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	kernel test robot <oliver.sang@...el.com>
Subject: Re: [PATCH v2] timekeeping: move multigrain timestamp floor handling
 into timekeeper

On Thu 12-09-24 14:02:52, Jeff Layton wrote:
> The kernel test robot reported a performance hit in some will-it-scale
> tests due to the multigrain timestamp patches.  My own testing showed
> about a 7% drop in performance on the pipe1_threads test, and the data
> showed that coarse_ctime() was slowing down current_time().
> 
> Move the multigrain timestamp floor tracking word into timekeeper.c. Add
> two new public interfaces: The first fills a timespec64 with the later
> of the coarse-grained clock and the floor time, and the second gets a
> fine-grained time and tries to swap it into the floor and fills a
> timespec64 with the result.
> 
> The first function returns an opaque cookie that is suitable for passing
> to the second, which will use it as the "old" value in the cmpxchg.
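
So IIUC the intended calling pattern is roughly the sketch below (a
hypothetical caller; fill_mg_timestamp() and want_fine_grained are made-up
names just to spell out the cookie handoff, not anything from the series):

	/* Hypothetical caller: coarse timestamp first, fine-grained only if needed. */
	static void fill_mg_timestamp(struct timespec64 *ts, bool want_fine_grained)
	{
		/* Later of the coarse clock and the floor; the cookie is the floor we saw. */
		u64 cookie = ktime_get_coarse_real_ts64_mg(ts);

		if (!want_fine_grained)
			return;

		/*
		 * Grab a fine-grained time and try to install it as the new
		 * floor, with @cookie as the expected old value.  @ts ends up
		 * holding the resulting floor either way.
		 */
		ktime_get_real_ts64_mg(ts, cookie);
	}
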
> 
> With this patch on top of the multigrain series, the will-it-scale
> pipe1_threads microbenchmark shows these averages on my test rig:
> 
> 	v6.11-rc7:			103561295 (baseline)
> 	v6.11-rc7 + mgtime + this:	101357203 (~2% performance drop)
> 
> Reported-by: kernel test robot <oliver.sang@...el.com>
> Closes: https://lore.kernel.org/oe-lkp/202409091303.31b2b713-oliver.sang@intel.com
> Suggested-by: Arnd Bergmann <arnd@...nel.org>
> Signed-off-by: Jeff Layton <jlayton@...nel.org>

One question regarding the cookie handling as well :)

> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 5391e4167d60..bb039c9d525e 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -114,6 +114,13 @@ static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
>  	.base[1] = FAST_TK_INIT,
>  };
>  
> +/*
> + * This represents the latest fine-grained time that we have handed out as a
> + * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the
> + * realtime clock on an as-needed basis.
> + */
> +static __cacheline_aligned_in_smp atomic64_t mg_floor;
> +
>  static inline void tk_normalize_xtime(struct timekeeper *tk)
>  {
>  	while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
> @@ -2394,6 +2401,76 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
>  }
>  EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
>  
> +/**
> + * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor
> + * @ts: timespec64 to be filled
> + *
> + * Adjust floor to realtime and compare it to the coarse time. Fill
> + * @ts with the latest one. Returns opaque cookie suitable to pass
> + * to ktime_get_real_ts64_mg.
> + */
> +u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
> +{
> +	struct timekeeper *tk = &tk_core.timekeeper;
> +	u64 floor = atomic64_read(&mg_floor);
> +	ktime_t f_real, offset, coarse;
> +	unsigned int seq;
> +
> +	WARN_ON(timekeeping_suspended);
> +
> +	do {
> +		seq = read_seqcount_begin(&tk_core.seq);
> +		*ts = tk_xtime(tk);
> +		offset = *offsets[TK_OFFS_REAL];
> +	} while (read_seqcount_retry(&tk_core.seq, seq));
> +
> +	coarse = timespec64_to_ktime(*ts);
> +	f_real = ktime_add(floor, offset);
> +	if (ktime_after(f_real, coarse))
> +		*ts = ktime_to_timespec64(f_real);
> +	return floor;
> +}
> +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);
> +
> +/**
> + * ktime_get_real_ts64_mg - attempt to update floor value and return result
> + * @ts:		pointer to the timespec to be set
> + * @cookie:	opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg()
> + *
> + * Get a current monotonic fine-grained time value and attempt to swap
> + * it into the floor using @cookie as the "old" value. @ts will be
> + * filled with the resulting floor value, regardless of the outcome of
> + * the swap.
> + */
> +void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
> +{
> +	struct timekeeper *tk = &tk_core.timekeeper;
> +	ktime_t offset, mono, old = (ktime_t)cookie;
> +	unsigned int seq;
> +	u64 nsecs;

So what would be the difference if we did instead:

	old = atomic64_read(&mg_floor);

and did not bother with the cookie? AFAIU this could result in somewhat more
updates to mg_floor (the contention on the mg_floor cacheline would be the
same, but there would be more invalidations of the cacheline). OTOH these
updates can happen only if max(current_coarse_time, mg_floor) ==
inode->i_ctime, which is presumably rare? What is your concern that I'm
missing?
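
To make the alternative concrete, the only change I have in mind is where
"old" comes from; reusing the locals from your function (sketch only, not
tested):

	/* No-cookie variant: read the floor right before attempting the swap. */
	old = atomic64_read(&mg_floor);
	if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
		/*
		 * A freshly read "old" can only go stale in the short window
		 * between the read above and the cmpxchg, so the swap succeeds
		 * (and writes the mg_floor cacheline) more often than with the
		 * caller-supplied cookie from ktime_get_coarse_real_ts64_mg().
		 */
		ts->tv_nsec = 0;
		timespec64_add_ns(ts, nsecs);
	} else {
		*ts = ktime_to_timespec64(ktime_add(old, offset));
	}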

								Honza	
> +
> +	WARN_ON(timekeeping_suspended);
> +
> +	do {
> +		seq = read_seqcount_begin(&tk_core.seq);
> +
> +		ts->tv_sec = tk->xtime_sec;
> +		mono = tk->tkr_mono.base;
> +		nsecs = timekeeping_get_ns(&tk->tkr_mono);
> +		offset = *offsets[TK_OFFS_REAL];
> +	} while (read_seqcount_retry(&tk_core.seq, seq));
> +
> +	mono = ktime_add_ns(mono, nsecs);
> +	if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
> +		ts->tv_nsec = 0;
> +		timespec64_add_ns(ts, nsecs);
> +	} else {
> +		*ts = ktime_to_timespec64(ktime_add(old, offset));
> +	}
> +
> +}
> +EXPORT_SYMBOL(ktime_get_real_ts64_mg);
> +
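
For completeness, the floor semantics in isolation (ignoring the
monotonic-to-realtime offset handling) boil down to a compare-and-swap that
only moves the floor forward when the caller's view of it was still current.
A userspace toy with plain C11 atomics, purely illustrative and obviously
not kernel code:

	#include <stdatomic.h>
	#include <stdint.h>
	#include <stdio.h>

	/* Toy model of mg_floor: the floor only ever moves forward. */
	static _Atomic int64_t floor_ns;

	/* Return the later of the "coarse" time and the floor; report the floor seen. */
	static int64_t coarse_or_floor(int64_t coarse_ns, int64_t *out)
	{
		int64_t f = atomic_load(&floor_ns);

		*out = f > coarse_ns ? f : coarse_ns;
		return f;
	}

	/* Try to install a fine-grained time as the new floor, expecting @cookie. */
	static void fine_grained(int64_t fine_ns, int64_t cookie, int64_t *out)
	{
		if (atomic_compare_exchange_strong(&floor_ns, &cookie, fine_ns))
			*out = fine_ns;	/* we won: the fine-grained time is the new floor */
		else
			*out = cookie;	/* lost a race: cookie now holds the winner's floor */
	}

	int main(void)
	{
		int64_t ts, cookie;

		cookie = coarse_or_floor(1000, &ts);	/* coarse step, ts = 1000 */
		fine_grained(1500, cookie, &ts);	/* fine step, floor becomes 1500 */
		printf("ts=%lld floor=%lld\n", (long long)ts,
		       (long long)atomic_load(&floor_ns));
		return 0;
	}
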
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
