[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aSh5cCg5TQOh-qUw@pathway.suse.cz>
Date: Thu, 27 Nov 2025 17:16:48 +0100
From: Petr Mladek <pmladek@...e.com>
To: "Bird, Tim" <Tim.Bird@...y.com>
Cc: Geert Uytterhoeven <geert@...ux-m68k.org>,
Francesco Valla <francesco@...la.it>, Tim Bird <tim@...d.org>,
Steve Rostedt <rostedt@...dmis.org>,
"john.ogness@...utronix.de" <john.ogness@...utronix.de>,
"senozhatsky@...omium.org" <senozhatsky@...omium.org>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>,
Linux Embedded <linux-embedded@...r.kernel.org>
Subject: Re: [PATCH] printk: add early_counter_ns routine for printk blind
spot
On Thu 2025-11-27 00:16:23, Bird, Tim wrote:
>
>
> > -----Original Message-----
> > From: Geert Uytterhoeven <geert@...ux-m68k.org>
> > Hi all,
> >
> > On Wed, 26 Nov 2025 at 03:24, Francesco Valla <francesco@...la.it> wrote:
> > > On Mon, Nov 24, 2025 at 10:30:52PM -0700, Tim Bird wrote:
> > > > From: Tim Bird <tim.bird@...y.com>
> > > >
> > > > During early boot, printk timestamps are reported as zero,
> > > > which creates a blind spot in early boot timings. This blind
> > > > spot hinders timing and optimization efforts for code that
> > > > executes before time_init(), which is when local_clock() is
> > > > initialized sufficiently to start returning non-zero timestamps.
> > > > This period is about 400 milliseconds for many current desktop
> > > > and embedded machines running Linux.
> > > >
> > > > Add an early_counter_ns function that returns nanosecond
> > > > timestamps based on get_cycles(). get_cycles() is operational
> > > > on arm64 and x86_64 from kernel start. Add some calibration
> > > > printks to allow setting configuration variables that are used
> > > > to convert cycle counts to nanoseconds (which are then used
> > > > in early printks). Add CONFIG_EARLY_COUNTER_NS, as well as
> > > > some associated conversion variables, as new kernel config
> > > > variables.
> > > >
> > > > After proper configuration, this yields non-zero timestamps for
> > > > printks from the very start of kernel execution. The timestamps
> > > > are relative to the start of the architecture-specific counter
> > > > used in get_cycles (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> > > > This means that the time reported reflects time-from-power-on for
> > > > most embedded products. This is also a useful data point for
> > > > boot-time optimization work.
> > > >
> > > > Note that there is a discontinuity in the timestamp sequencing
> > > > when standard clocks are finally initialized in time_init().
> > > > The printk timestamps are thus not monotonically increasing
> > > > through the entire boot.
> > >
> > > This is... not going to work, IMO, and might lead to breakages in
> > > userspace tools (are printk timings a userspace API?).
> >
> > I think they are.
> >
> > Another approach would be to defer the calibration/conversion to
> > userspace, and make sure the early part stands out.
> > I.e. when real timekeeping is available, kernel messages are prefixed by
> > "[%5lu.%06lu]". Early messages could be prefixed by a plain integer
> > "[%12u]", containing the raw cycle counter value.
> > The presence of the decimal point would make the difference obvious.
>
> I thought about this while I was creating this.
> It wouldn't require the extra configuration for MULT and SHIFT (which would be nice),
> and it would be, as you say, very obvious that this was not a regular timestamp.
> This means it could be enabled on a generic kernel (making more likely it could be
> enabled by default). And really only boot-time optimizers would care enough to
> decode the data, so the onus would be on them to run the tool. Everyone else
> could ignore them.
>
> I'm not sure if it would break existing printk-processing tools. I suspect it would.
I guess that it might break even basic tools, like dmesg, journalctl,
or crash.
A solution might be to pass it as an extra information to the official
timestamp, for example:
+ on console:
<level>[timestamp][callerid][cl cycles] message
<6>[ 0.000000][ T0][cl 345678] BIOS-provided physical RAM map:
<6>[ 0.000000][ T0][cl 1036890] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
<6>[ 0.000000][ T0][cl 1129452] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
+ via /dev/kmsg
<level>,<sequnum>,<timestamp>,<contflag>[,additional_values, ... ];<message text>
6,2,0,-,caller=T0,cycle=345678;BIOS-provided physical RAM map:
6,3,0,-,caller=T0,cycle=1036890;BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
6,4,0,-,caller=T0,cycle=1129452;BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
The extra field would disappear after time_init().
The value might be stored in struct printk_info in the same field .ts_nsec.
It might be distinguished from a real timestamp using a flag in
enum printk_info_flags. The official timestamp would be zero when
this flag is set.
It will not require the two CONFIG_ values for calibrating the
computation.
The output on the console is a bit messy. But I guess that this
feature is rather for tuning and it won't be enabled on production
systems. So it might be acceptable.
time_init() might even print a message with the cycle value
and the official timestamp on the same line. It can be used
for post-processing and translating cycles back to ns.
> Also, I find that post-processing tools often get overlooked.
> I asked at ELC this year how many people are using show_delta, which
> has been upstream for years, and can do a few neat things with printk timestamps,
> and not a single person had even heard of it.
Yeah. We need to make sure that the post processing tool won't get mad,
for example, crash or show garbage.
Best Regards,
Petr
Powered by blists - more mailing lists