lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bfc8fc016aa16a757f264010fdb8e525513379ce.camel@kernel.org>
Date: Fri, 13 Sep 2024 08:01:28 -0400
From: Jeff Layton <jlayton@...nel.org>
To: Jan Kara <jack@...e.cz>
Cc: Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner
 <brauner@...nel.org>, John Stultz <jstultz@...gle.com>, Thomas Gleixner
 <tglx@...utronix.de>, Stephen Boyd <sboyd@...nel.org>, Arnd Bergmann
 <arnd@...nel.org>, Vadim Fedorenko <vadim.fedorenko@...ux.dev>, 
 linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, kernel test
 robot <oliver.sang@...el.com>
Subject: Re: [PATCH v2] timekeeping: move multigrain timestamp floor
 handling into timekeeper

On Fri, 2024-09-13 at 13:26 +0200, Jan Kara wrote:
> On Thu 12-09-24 14:02:52, Jeff Layton wrote:
> > The kernel test robot reported a performance hit in some will-it-scale
> > tests due to the multigrain timestamp patches.  My own testing showed
> > about a 7% drop in performance on the pipe1_threads test, and the data
> > showed that coarse_ctime() was slowing down current_time().
> > 
> > Move the multigrain timestamp floor tracking word into timekeeper.c. Add
> > two new public interfaces: The first fills a timespec64 with the later
> > of the coarse-grained clock and the floor time, and the second gets a
> > fine-grained time and tries to swap it into the floor and fills a
> > timespec64 with the result.
> > 
> > The first function returns an opaque cookie that is suitable for passing
> > to the second, which will use it as the "old" value in the cmpxchg.
> > 
> > With this patch on top of the multigrain series, the will-it-scale
> > pipe1_threads microbenchmark shows these averages on my test rig:
> > 
> > 	v6.11-rc7:			103561295 (baseline)
> > 	v6.11-rc7 + mgtime + this:	101357203 (~2% performance drop)
> > 
> > Reported-by: kernel test robot <oliver.sang@...el.com>
> > Closes: https://lore.kernel.org/oe-lkp/202409091303.31b2b713-oliver.sang@intel.com
> > Suggested-by: Arnd Bergmann <arnd@...nel.org>
> > Signed-off-by: Jeff Layton <jlayton@...nel.org>
> 
> One question regarding the cookie handling as well :)
> 
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 5391e4167d60..bb039c9d525e 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -114,6 +114,13 @@ static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
> >  	.base[1] = FAST_TK_INIT,
> >  };
> >  
> > +/*
> > + * This represents the latest fine-grained time that we have handed out as a
> > + * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the
> > + * realtime clock on an as-needed basis.
> > + */
> > +static __cacheline_aligned_in_smp atomic64_t mg_floor;
> > +
> >  static inline void tk_normalize_xtime(struct timekeeper *tk)
> >  {
> >  	while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
> > @@ -2394,6 +2401,76 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
> >  }
> >  EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
> >  
> > +/**
> > + * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor
> > + * @ts: timespec64 to be filled
> > + *
> > + * Adjust floor to realtime and compare it to the coarse time. Fill
> > + * @ts with the latest one. Returns opaque cookie suitable to pass
> > + * to ktime_get_real_ts64_mg.
> > + */
> > +u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
> > +{
> > +	struct timekeeper *tk = &tk_core.timekeeper;
> > +	u64 floor = atomic64_read(&mg_floor);
> > +	ktime_t f_real, offset, coarse;
> > +	unsigned int seq;
> > +
> > +	WARN_ON(timekeeping_suspended);
> > +
> > +	do {
> > +		seq = read_seqcount_begin(&tk_core.seq);
> > +		*ts = tk_xtime(tk);
> > +		offset = *offsets[TK_OFFS_REAL];
> > +	} while (read_seqcount_retry(&tk_core.seq, seq));
> > +
> > +	coarse = timespec64_to_ktime(*ts);
> > +	f_real = ktime_add(floor, offset);
> > +	if (ktime_after(f_real, coarse))
> > +		*ts = ktime_to_timespec64(f_real);
> > +	return floor;
> > +}
> > +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);
> > +
> > +/**
> > + * ktime_get_real_ts64_mg - attempt to update floor value and return result
> > + * @ts:		pointer to the timespec to be set
> > + * @cookie:	opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg()
> > + *
> > + * Get a current monotonic fine-grained time value and attempt to swap
> > + * it into the floor using @cookie as the "old" value. @ts will be
> > + * filled with the resulting floor value, regardless of the outcome of
> > + * the swap.
> > + */
> > +void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
> > +{
> > +	struct timekeeper *tk = &tk_core.timekeeper;
> > +	ktime_t offset, mono, old = (ktime_t)cookie;
> > +	unsigned int seq;
> > +	u64 nsecs;
> 
> So what would be the difference if we did instead:
> 
> 	old = atomic64_read(&mg_floor);
> 
> and not bother with the cookie? AFAIU this could result in somewhat more
> updates to mg_floor (the contention on the mg_floor cacheline would be the
> same but there would be more invalidates of the cacheline). OTOH these
> updates can happen only if max(current_coarse_time, mg_floor) ==
> inode->i_ctime which is presumably rare? What is your concern that I'm
> missing?
> 

My main concern is the "somewhat more updates to mg_floor". mg_floor is
a global variable, so one of my main goals is to minimize the updates
to it. There is no correctness issue in doing what you're saying above
(AFAICT anyway), but the window of time between when we fetch the
current floor and try to do the swap will be smaller, and we'll end up
doing more swaps as a result.

Do you have any objection to adding the cookie to this API?

> 
> > +
> > +	WARN_ON(timekeeping_suspended);
> > +
> > +	do {
> > +		seq = read_seqcount_begin(&tk_core.seq);
> > +
> > +		ts->tv_sec = tk->xtime_sec;
> > +		mono = tk->tkr_mono.base;
> > +		nsecs = timekeeping_get_ns(&tk->tkr_mono);
> > +		offset = *offsets[TK_OFFS_REAL];
> > +	} while (read_seqcount_retry(&tk_core.seq, seq));
> > +
> > +	mono = ktime_add_ns(mono, nsecs);
> > +	if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
> > +		ts->tv_nsec = 0;
> > +		timespec64_add_ns(ts, nsecs);
> > +	} else {
> > +		*ts = ktime_to_timespec64(ktime_add(old, offset));
> > +	}
> > +
> > +}
> > +EXPORT_SYMBOL(ktime_get_real_ts64_mg);
> > +

-- 
Jeff Layton <jlayton@...nel.org>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ