linux-kernel - Re: [PATCH] printk: add early_counter

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aSh5cCg5TQOh-qUw@pathway.suse.cz>
Date: Thu, 27 Nov 2025 17:16:48 +0100
From: Petr Mladek <pmladek@...e.com>
To: "Bird, Tim" <Tim.Bird@...y.com>
Cc: Geert Uytterhoeven <geert@...ux-m68k.org>,
	Francesco Valla <francesco@...la.it>, Tim Bird <tim@...d.org>,
	Steve Rostedt <rostedt@...dmis.org>,
	"john.ogness@...utronix.de" <john.ogness@...utronix.de>,
	"senozhatsky@...omium.org" <senozhatsky@...omium.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Linux Embedded <linux-embedded@...r.kernel.org>
Subject: Re: [PATCH] printk: add early_counter_ns routine for printk blind
 spot

On Thu 2025-11-27 00:16:23, Bird, Tim wrote:
> 
> 
> > -----Original Message-----
> > From: Geert Uytterhoeven <geert@...ux-m68k.org>
> > Hi all,
> > 
> > On Wed, 26 Nov 2025 at 03:24, Francesco Valla <francesco@...la.it> wrote:
> > > On Mon, Nov 24, 2025 at 10:30:52PM -0700, Tim Bird wrote:
> > > > From: Tim Bird <tim.bird@...y.com>
> > > >
> > > > During early boot, printk timestamps are reported as zero,
> > > > which creates a blind spot in early boot timings.  This blind
> > > > spot hinders timing and optimization efforts for code that
> > > > executes before time_init(), which is when local_clock() is
> > > > initialized sufficiently to start returning non-zero timestamps.
> > > > This period is about 400 milliseconds for many current desktop
> > > > and embedded machines running Linux.
> > > >
> > > > Add an early_counter_ns function that returns nanosecond
> > > > timestamps based on get_cycles().  get_cycles() is operational
> > > > on arm64 and x86_64 from kernel start.  Add some calibration
> > > > printks to allow setting configuration variables that are used
> > > > to convert cycle counts to nanoseconds (which are then used
> > > > in early printks).  Add CONFIG_EARLY_COUNTER_NS, as well as
> > > > some associated conversion variables, as new kernel config
> > > > variables.
> > > >
> > > > After proper configuration, this yields non-zero timestamps for
> > > > printks from the very start of kernel execution.  The timestamps
> > > > are relative to the start of the architecture-specific counter
> > > > used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > > > This means that the time reported reflects time-from-power-on for
> > > > most embedded products.  This is also a useful data point for
> > > > boot-time optimization work.
> > > >
> > > > Note that there is a discontinuity in the timestamp sequencing
> > > > when standard clocks are finally initialized in time_init().
> > > > The printk timestamps are thus not monotonically increasing
> > > > through the entire boot.
> > >
> > > This is... not going to work, IMO, and might lead to breakages in
> > > userspace tools (are printk timings a userspace API?).
> > 
> > I think they are.
> > 
> > Another approach would be to defer the calibration/conversion to
> > userspace, and make sure the early part stands out.
> > I.e. when real timekeeping is available, kernel messages are prefixed by
> > "[%5lu.%06lu]".  Early messages could be prefixed by a plain integer
> > "[%12u]", containing the raw cycle counter value.
> > The presence of the decimal point would make the difference obvious.
> 
> I thought about this while I was creating this.
> It wouldn't require the extra configuration for MULT and SHIFT (which would be nice),
> and it would be, as you say, very obvious that this was not a regular timestamp. 
> This means it could be enabled on a generic kernel (making more likely it could be
> enabled by default). And really only boot-time optimizers would care enough to
> decode the data, so the onus would be on them to run the tool.  Everyone else
> could ignore them.
> 
> I'm not sure if it would break existing printk-processing tools.  I suspect it would.

I guess that it might break even basic tools, like dmesg, journalctl,
or crash.

A solution might be to pass it as an extra information to the official
timestamp, for example:

  + on console:

      <level>[timestamp][callerid][cl cycles] message
      <6>[    0.000000][    T0][cl  345678] BIOS-provided physical RAM map:
      <6>[    0.000000][    T0][cl 1036890] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
      <6>[    0.000000][    T0][cl 1129452] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved

  + via /dev/kmsg

     <level>,<sequnum>,<timestamp>,<contflag>[,additional_values, ... ];<message text>
     6,2,0,-,caller=T0,cycle=345678;BIOS-provided physical RAM map:
     6,3,0,-,caller=T0,cycle=1036890;BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
     6,4,0,-,caller=T0,cycle=1129452;BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved

The extra field would disappear after time_init().

The value might be stored in struct printk_info in the same field .ts_nsec.
It might be distinguished from a real timestamp using a flag in
enum printk_info_flags. The official timestamp would be zero when
this flag is set.

It will not require the two CONFIG_ values for calibrating the
computation.

The output on the console is a bit messy. But I guess that this
feature is rather for tuning and it won't be enabled on production
systems. So it might be acceptable.

time_init() might even print a message with the cycle value
and the official timestamp on the same line. It can be used
for post-processing and translating cycles back to ns.

> Also,  I find that post-processing tools often get overlooked.
> I asked at ELC this year how many people are using show_delta, which 
> has been upstream for years, and can do a few neat things with printk timestamps,
> and not a single person had even heard of it.

Yeah. We need to make sure that the post processing tool won't get mad,
for example, crash or show garbage.

Best Regards,
Petr