lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANDhNCrpkTfe6BRVNf1ihhGALbPBBhOs1PCPxA4MDHa1+=sEbQ@mail.gmail.com>
Date: Thu, 12 Sep 2024 13:11:31 -0700
From: John Stultz <jstultz@...gle.com>
To: Jeff Layton <jlayton@...nel.org>
Cc: Alexander Viro <viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>, 
	Thomas Gleixner <tglx@...utronix.de>, Stephen Boyd <sboyd@...nel.org>, Arnd Bergmann <arnd@...nel.org>, 
	Vadim Fedorenko <vadim.fedorenko@...ux.dev>, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org, kernel test robot <oliver.sang@...el.com>
Subject: Re: [PATCH v2] timekeeping: move multigrain timestamp floor handling
 into timekeeper

On Thu, Sep 12, 2024 at 11:02 AM Jeff Layton <jlayton@...nel.org> wrote:
>
> The kernel test robot reported a performance hit in some will-it-scale
> tests due to the multigrain timestamp patches.  My own testing showed
> about a 7% drop in performance on the pipe1_threads test, and the data
> showed that coarse_ctime() was slowing down current_time().

So, you provided some useful detail about why coarse_ctime() was slow
in your reply earlier, but it would be good to preserve that in the
commit message here.


> Move the multigrain timestamp floor tracking word into timekeeper.c. Add
> two new public interfaces: The first fills a timespec64 with the later
> of the coarse-grained clock and the floor time, and the second gets a
> fine-grained time and tries to swap it into the floor and fills a
> timespec64 with the result.
>
> The first function returns an opaque cookie that is suitable for passing
> to the second, which will use it as the "old" value in the cmpxchg.

The cookie usage isn't totally clear to me right off.  It feels a bit
more subtle then I'd expect.


> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 5391e4167d60..bb039c9d525e 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -114,6 +114,13 @@ static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
>         .base[1] = FAST_TK_INIT,
>  };
>
> +/*
> + * This represents the latest fine-grained time that we have handed out as a
> + * timestamp on the system. Tracked as a monotonic ktime_t, and converted to the
> + * realtime clock on an as-needed basis.
> + */
> +static __cacheline_aligned_in_smp atomic64_t mg_floor;
> +

So I do really like this general approach of having an internal floor
value combined with special coarse/fine grained assessors that work
with the floor, so we're not impacting the normal hotpath logic
(basically I was writing up a suggestion to this effect to the thread
with Arnd when I realized you had follow up patch in my inbox).


>  static inline void tk_normalize_xtime(struct timekeeper *tk)
>  {
>         while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {
> @@ -2394,6 +2401,76 @@ void ktime_get_coarse_real_ts64(struct timespec64 *ts)
>  }
>  EXPORT_SYMBOL(ktime_get_coarse_real_ts64);
>
> +/**
> + * ktime_get_coarse_real_ts64_mg - get later of coarse grained time or floor
> + * @ts: timespec64 to be filled
> + *
> + * Adjust floor to realtime and compare it to the coarse time. Fill
> + * @ts with the latest one. Returns opaque cookie suitable to pass
> + * to ktime_get_real_ts64_mg.
> + */
> +u64 ktime_get_coarse_real_ts64_mg(struct timespec64 *ts)
> +{
> +       struct timekeeper *tk = &tk_core.timekeeper;
> +       u64 floor = atomic64_read(&mg_floor);
> +       ktime_t f_real, offset, coarse;
> +       unsigned int seq;
> +
> +       WARN_ON(timekeeping_suspended);
> +
> +       do {
> +               seq = read_seqcount_begin(&tk_core.seq);
> +               *ts = tk_xtime(tk);
> +               offset = *offsets[TK_OFFS_REAL];
> +       } while (read_seqcount_retry(&tk_core.seq, seq));
> +
> +       coarse = timespec64_to_ktime(*ts);
> +       f_real = ktime_add(floor, offset);
> +       if (ktime_after(f_real, coarse))
> +               *ts = ktime_to_timespec64(f_real);
> +       return floor;
> +}
> +EXPORT_SYMBOL_GPL(ktime_get_coarse_real_ts64_mg);

Generally this looks ok to me.


> +/**
> + * ktime_get_real_ts64_mg - attempt to update floor value and return result
> + * @ts:                pointer to the timespec to be set
> + * @cookie:    opaque cookie from earlier call to ktime_get_coarse_real_ts64_mg()
> + *
> + * Get a current monotonic fine-grained time value and attempt to swap
> + * it into the floor using @cookie as the "old" value. @ts will be
> + * filled with the resulting floor value, regardless of the outcome of
> + * the swap.
> + */

Again this cookie argument usage and the behavior of this function
isn't very clear to me.

> +void ktime_get_real_ts64_mg(struct timespec64 *ts, u64 cookie)
> +{
> +       struct timekeeper *tk = &tk_core.timekeeper;
> +       ktime_t offset, mono, old = (ktime_t)cookie;
> +       unsigned int seq;
> +       u64 nsecs;
> +
> +       WARN_ON(timekeeping_suspended);
> +
> +       do {
> +               seq = read_seqcount_begin(&tk_core.seq);
> +
> +               ts->tv_sec = tk->xtime_sec;
> +               mono = tk->tkr_mono.base;
> +               nsecs = timekeeping_get_ns(&tk->tkr_mono);
> +               offset = *offsets[TK_OFFS_REAL];
> +       } while (read_seqcount_retry(&tk_core.seq, seq));
> +
> +       mono = ktime_add_ns(mono, nsecs);
> +       if (atomic64_try_cmpxchg(&mg_floor, &old, mono)) {
> +               ts->tv_nsec = 0;
> +               timespec64_add_ns(ts, nsecs);
> +       } else {
> +               *ts = ktime_to_timespec64(ktime_add(old, offset));
> +       }
> +
> +}
> +EXPORT_SYMBOL(ktime_get_real_ts64_mg);


So initially I was expecting this to look something like (sorry for
the whitespace damage here):
{
    do {
        seq = read_seqcount_begin(&tk_core.seq);
        ts->tv_sec = tk->xtime_sec;
        mono = tk->tkr_mono.base;
        nsecs = timekeeping_get_ns(&tk->tkr_mono);
        offset = *offsets[TK_OFFS_REAL];
    } while (read_seqcount_retry(&tk_core.seq, seq));

    mono = ktime_add_ns(mono, nsecs);
    do {
        old = atomic64_read(&mg_floor);
        if (floor >= mono)
            break;
    } while(!atomic64_try_cmpxchg(&mg_floor, old, mono);
    ts->tv_nsec = 0;
    timespec64_add_ns(ts, nsecs);
}

Where you read the tk data, atomically update the floor (assuming it's
not in the future) and then return the finegrained value, not needing
to manage a cookie value.

But instead, it seems like if something has happened since the cookie
value was saved (another cpu getting a fine grained timestamp), your
ktime_get_real_ts64_mg() will fall back to returning the same coarse
grained time saved to the cookie, as if no time had past?

It seems like that could cause problems:

cpu1                                     cpu2
--------------------------------------------------------------------------
                                         t2a = ktime_get_coarse_real_ts64_mg
t1a = ktime_get_coarse_real_ts64_mg()
t1b = ktime_get_real_ts64_mg(t1a)

                                         t2b = ktime_get_real_ts64_mg(t2a)

Where t2b will seem to be before t1b, even though it happened afterwards.


thanks
-john

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ