Message-ID: <CAGM2reaEGOjVEFu3pDHdRfMFjRNvaLZxTncao3rAgpEijKLb7g@mail.gmail.com>
Date:   Tue, 19 Jun 2018 17:25:56 -0400
From:   Pavel Tatashin <pasha.tatashin@...cle.com>
To:     tglx@...utronix.de
Cc:     Steven Sistare <steven.sistare@...cle.com>,
        Daniel Jordan <daniel.m.jordan@...cle.com>,
        linux@...linux.org.uk, schwidefsky@...ibm.com,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        John Stultz <john.stultz@...aro.org>, sboyd@...eaurora.org,
        x86@...nel.org, LKML <linux-kernel@...r.kernel.org>,
        mingo@...hat.com, hpa@...or.com, douly.fnst@...fujitsu.com,
        peterz@...radead.org, prarit@...hat.com, feng.tang@...el.com,
        Petr Mladek <pmladek@...e.com>, gnomes@...rguk.ukuu.org.uk
Subject: Re: [PATCH v10 2/7] time: sync read_boot_clock64() with persistent clock

On Tue, Jun 19, 2018 at 5:17 PM Thomas Gleixner <tglx@...utronix.de> wrote:
>
> On Fri, 15 Jun 2018, Pavel Tatashin wrote:
>
> > read_boot_clock64() returns a boot start timestamp from epoch. Some arches
> > may need to access the persistent clock interface in order to calculate the
> > epoch offset. The resolution of the persistent clock, however, might be
> > low.  Therefore, in order to avoid time discrepancies a new argument 'now'
> > is added to read_boot_clock64() parameters. Arch may decide to use it
> > instead of accessing persistent clock again.
>
> I kinda know what you are trying to say, but it's anything but obvious.
>
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 4786df904c22..fb6898fab374 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -1502,9 +1502,13 @@ void __weak read_persistent_clock64(struct timespec64 *ts64)
> >   * Function to read the exact time the system has been started.
> >   * Returns a timespec64 with tv_sec=0 and tv_nsec=0 if unsupported.
> >   *
> > + * Argument 'now' contains time from persistent clock to calculate offset from
> > + * epoch. May contain zeros if persist ant clock is not available.
>
> What's a 'persist ant' ?

persistent, but I think the spell checker decided I was writing about some
ants. :)

>
> > + *
> >   *  XXX - Do be sure to remove it once all arches implement it.
> >   */
> > -void __weak read_boot_clock64(struct timespec64 *ts)
> > +void __weak __init read_boot_clock64(struct timespec64 *now,
> > +                                  struct timespec64 *ts)
> >  {
> >       ts->tv_sec = 0;
> >       ts->tv_nsec = 0;
> > @@ -1535,7 +1539,7 @@ void __init timekeeping_init(void)
> >       } else if (now.tv_sec || now.tv_nsec)
> >               persistent_clock_exists = true;
> >
> > -     read_boot_clock64(&boot);
> > +     read_boot_clock64(&now, &boot);
>
> This interface was already bolted on and this extension just makes it
> worse. If 'now' is invalid, then you need to replicate the same checks as
> after read_persistent_clock(), along with conditionals of some sort.
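
For illustration of the checks in question, a minimal, hypothetical sketch (not
part of the patch) of what an arch override of the proposed
read_boot_clock64(now, ts) would have to repeat before trusting 'now';
timespec64_valid_strict() is the same helper timekeeping_init() applies to the
persistent clock value:

void __init read_boot_clock64(struct timespec64 *now, struct timespec64 *ts)
{
        /* Same sanity checks timekeeping_init() already does for 'now' */
        if (!timespec64_valid_strict(now) || (!now->tv_sec && !now->tv_nsec)) {
                /* No usable persistent clock value: report "unsupported" */
                ts->tv_sec = 0;
                ts->tv_nsec = 0;
                return;
        }

        /* ... arch-specific math using *now to derive the boot timestamp ... */
}
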
>
> 'boot' time is used to adjust the wall to monotonic offset. So looking at
> the math behind all of that:
>
>    read_persistent_clock(&now);
>    read_boot_clock(&boot);
>
>    tk_set_xtime(tk, now)
>      tk->xtime_sec = now.sec;
>      tk->xtime_nsec = now.nsec;
>
>   tk_set_wall_to_mono(tk, -boot);
>      tk->wall_to_mono = -boot;
>      tk->offs_real = boot;
>
>   timekeeping_update(tk)
>      tk->tkr_mono.base = tk->xtime + tk->wall_to_mono;
>
> tkr_mono.base is guaranteed to be >= 0. So you need to make sure that
>
>      tk->xtime + tk->wall_to_mono >= 0
>
> Yet another check which you need to do in that magic function. That's just
> wrong. If this grows more users then they all have to copy the same sanity
> checks over and over and half of them will be wrong.
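
To make the constraint concrete, a worked example with invented numbers:

    xtime         = 1000s   (persistent clock at timekeeping_init())
    boot          = 1200s   (bogus value from an arch's read_boot_clock())
    wall_to_mono  = -boot = -1200s
    tkr_mono.base = xtime + wall_to_mono = 1000s - 1200s = -200s < 0

which violates the tkr_mono.base >= 0 guarantee, so 'boot' would have to be
clamped against 'now' by every implementation of that interface.
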
>
> Fortunately there is only a single real user of read_boot_clock() in the
> tree: S390. By virtue of being S390 there is no worry about any sanity
> checks. It just works.
>
> ARM has the function, but not a single subarch actually implements the
> thing, so this is easy to fix by removing it.
>
> So the right thing to do is the following:
>
> Provide a new function:
>
> void __weak read_persistent_wall_and_boot_offset(struct timespec64 *wall_time,
>                                                  ktime_t *boot_offset)
> {
>         read_persistent_clock64(wall_time);
> }
>
> Then add the new function to S390:
>
> void read_persistent_wall_and_boot_offset(struct timespec64 *wall_time,
>                                           ktime_t *boot_offset)
> {
>         unsigned char clk[STORE_CLOCK_EXT_SIZE];
>         struct timespec64 boot_time, uptime;
>         __u64 delta;
>
>         read_persistent_clock64(wall_time);
>
>         /* Convert the TOD clock base into a boot timestamp since the epoch */
>         delta = initial_leap_seconds + TOD_UNIX_EPOCH;
>         memcpy(clk, tod_clock_base, 16);
>         *(__u64 *) &clk[1] -= delta;
>         if (*(__u64 *) &clk[1] > delta)
>                 clk[0]--;
>         ext_to_timespec64(clk, &boot_time);
>
>         uptime = timespec64_sub(*wall_time, boot_time);
>         *boot_offset = timespec64_to_ns(&uptime);
> }
>
> Then rework timekeeping_init():
>
> timekeeping_init()
>    struct timespec64 wall_time, wall_to_mono, offs;
>    ktime_t boot_offset = 0;
>
>    read_persistent_wall_and_boot_offset(&wall_time, &boot_offset);
>
>    if (!valid(wall_time))
>         boot_offset = wall_time.tv_sec = wall_time.tv_nsec = 0;
>    else
>         persistent_clock = true;
>
>    if (boot_offset > timespec64_to_ns(&wall_time))
>        offs.tv_sec = offs.tv_nsec = 0;
>    else
>        offs = ns_to_timespec64(boot_offset);
>
>    wall_to_mono = timespec64_sub(offs, wall_time);
>    tk_set_wall_to_mono(tk, wall_to_mono);
>
>
> After that remove the read_boot_clock() leftovers all over the place. And
> then the x86 implementation boils down to:
>
> void read_persistent_wall_and_boot_offset(struct timespec64 *wall_time,
>                                           ktime_t *boot_offset)
> {
>          x86_platform.get_wallclock(wall_time);
>          *boot_offset = sched_clock();
> }
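
Rough numbers, purely for illustration, of what the reworked timekeeping_init()
then computes with such an implementation:

    wall_time    = 1000000s  (from get_wallclock())
    boot_offset  = 2s        (sched_clock() at timekeeping_init())
    offs         = 2s
    wall_to_mono = offs - wall_time = -999998s

so CLOCK_MONOTONIC already reads about 2s at timekeeping_init(), accounting for
the time spent before timekeeping was set up, instead of starting from zero.
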
>
> And doing it this way - by adding the extra math to the only architecture
> on the planet which has sane timers - is the right thing to do because all
> other architectures will have to fall back to tricks similar to x86 by
> doing clocksource/sched_clock() based guesstimation of the time when the
> machine actually reached the kernel.
>
> Hmm?

Sounds good, I will redo this and the next patch with your suggestions.

Thank you,
Pavel
