lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.20.1711142044110.2221@nanos>
Date:   Tue, 14 Nov 2017 21:21:48 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
cc:     Petr Mladek <pmladek@...e.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mark Salyzyn <salyzyn@...roid.com>,
        Prarit Bhargava <prarit@...hat.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [GIT pull] printk updates for 4.15

On Tue, 14 Nov 2017, Linus Torvalds wrote:
>
> But that really means that you can't just change what "ts_nsec" means
> - it's exported as "microseconds since boot".

That's true if sched clock is stable and high resolution and does not stop
during suspend.

So in reality this clock has random meanings depending on the hardware
properties. Really well "designed".

> Now, the important thing to keep in mind is that the kernel is not the
> _only_ source of these "_SOURCE_MONOTONIC_TIMESTAMP" things.
> 
> In other places, systemd generates them itself, with
> 
>         ts->monotonic = now(CLOCK_MONOTONIC);
> 
> where "now()" ends up using clock_gettime().
> 
> End result? The kernel /dev/kmsg interface has to be at least
> "similar" to returning usec's comparable to CLOCK_MONOTONIC.

First of all sched_clock() and CLOCK_MONOTONIC can drift apart frequency
wise pretty bad.

Secondly, on a machine with non stop TSC sched_clock() advances by the time
the system spent in suspend, while CLOCK_MONOTONIC does not for historical
reasons. CLOCK_BOOTTIME does because that accounts for the time spent in
suspend, but then again using BOOTTIME for the user space time stamps does
not work on systems where sched_clock() stops in suspend.

So in terms of local machine correlation the only reasonable choice is
CLOCK_MONOTONIC especially when existing user space already tries to
correlate time via clock_gettime(CLOCK_MONOTONIC) based time stamps.

This works on all systems no matter which underlying hardware is used to
retrieve the timestamp. dmesg is not really a performance critical path, so
the few extra cycles which this might take more than using
local/sched_clock() are not an issue.

> This is why I would suggest just keeping the existing clock, and then
> adding clock events at _significant_ points. Somebody brought up the
> complete red herring of "every five minutes will fiill up the system
> log". That's complete BS. You do it when people change the system
> clock (and perhaps when NTP synchronizes), or after a suspend event,
> or maybe regularly every few _hours_ (and make that last part
> configurable).

I'm not worried about the system log spam, I'm more worried that we lose
exactly that last synchronization line due to buffer wrap around.

One possible solution which was discussed before is to have that extra
information in every dmesg timestamp. That means storing CLOCK_MONOTONIC as
the default timestamp and auxillary offsets for CLOCK_BOOTTIME and
CLOCK_REALTIME. Thats simple enough to achieve as we can access that
information via the NMI safe time accessors. Needs a new function, but
thats trivial.

The interesting question is how to print out that information. We can do it
unconditionally, but that might confuse existing tools. A command line
option and a sysfs knob to enable the printout of these extra bits might
make this work.

Thanks,

	tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ