lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
 <MW5PR13MB5632FE304C7FA3D86309D6F1FD93A@MW5PR13MB5632.namprd13.prod.outlook.com>
Date: Mon, 26 Jan 2026 16:52:57 +0000
From: "Bird, Tim" <Tim.Bird@...y.com>
To: Francesco Valla <francesco@...la.it>
CC: "pmladek@...e.com" <pmladek@...e.com>,
        "rostedt@...dmis.org"
	<rostedt@...dmis.org>,
        "john.ogness@...uxtronix.de"
	<john.ogness@...uxtronix.de>,
        "senozhatsky@...omium.org"
	<senozhatsky@...omium.org>,
        "linux-embedded@...r.kernel.org"
	<linux-embedded@...r.kernel.org>,
        "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v2] printk: fix zero-valued printk timestamps in early
 boot



> -----Original Message-----
> From: Francesco Valla <francesco@...la.it>
> 
> Hi Tim,
> 
> I tested this both on X86_64 QEMU and on a i.MX93 (ARM64) and can
> confirm it is working as expected. Auto-calc of calibration data is far
> better than the configuration parameters in v1.
> 
> It is slightly confusing to see a time value printed to serial output
> and another one inside kmsg, but that's a human thing and should not
> confuse any tool.
Agreed.  I wasn't too worried about it, because most serious developers working
on boot-time will not be watching early messages over serial console.  (Usually they
use 'quiet' or some lower log level).  But on qemu, it does look strange to see 0s
on the first output sequence, and then non-zeroes when using dmesg later in the same
boot.

I just realized though, that I should go back and see if there's a discontinuity on the output via serial
(before and after calibration), and possibly put a note about that in the config description.

I'll think about what I can do here to reduce the confusion.

> 
> Some notes follow.
> 
> On Sat, Jan 24, 2026 at 12:40:27PM -0700, Tim Bird wrote:
> > During early boot, printk timestamps are reported as zero before
> > kernel timekeeping starts (e.g. before time_init()).  This
> > hinders boot-time optimization efforts.  This period is about 400
> > milliseconds for many current desktop and embedded machines
> > running Linux.
> >
> > Add support to save cycles during early boot, and output correct
> > timestamp values after timekeeping is initialized.  get_cycles()
> > is operational on arm64 and x86_64 from kernel start.  Add code
> > and variables to save calibration values used to later convert
> > cycle counts to time values in the early printks.  Add a config
> > to control the feature.
> >
> > This yields non-zero timestamps for printks from the very start
> > of kernel execution.  The timestamps are relative to the start of
> > the architecture-specified counter used in get_cycles
> > (e.g. the TSC on x86_64 and cntvct_el0 on arm64).
> >
> > All timestamps reflect time from power-on instead of time from
> > the kernel's timekeeping initialization.
> >
> > Signed-off-by: Tim Bird <tim.bird@...y.com>
> > ---
> > V1 -> V2
> >   Remove calibration CONFIG vars
> >   Add 'depends on' to restrict arches (to handle ppc bug)
> >   Add early_ts_offset to avoid discontinuity
> >   Save cycles in ts_nsec, and convert on output
> >   Move conditional code to include file (early_times.h)
> > ---
> >  include/linux/early_times.h | 48 +++++++++++++++++++++++++++++++++++++
> >  init/Kconfig                | 12 ++++++++++
> >  init/main.c                 | 26 ++++++++++++++++++++
> >  kernel/printk/printk.c      | 16 +++++++++++--
> >  4 files changed, 100 insertions(+), 2 deletions(-)
> >  create mode 100644 include/linux/early_times.h
> >
> > diff --git a/include/linux/early_times.h b/include/linux/early_times.h
> > new file mode 100644
> > index 000000000000..9dc31eb442c2
> > --- /dev/null
> > +++ b/include/linux/early_times.h
> > @@ -0,0 +1,48 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _KERNEL_PRINTK_EARLY_TIMES_H
> > +#define _KERNEL_PRINTK_EARLY_TIMES_H
> > +
> > +#include <linux/timex.h>
> > +
> > +#if defined(CONFIG_EARLY_PRINTK_TIMES)
> > +extern u32 early_mult, early_shift;
> > +extern u64 early_ts_offset;
> > +
> > +static inline u64 early_cycles(void)
> > +{
> > +	return ((u64)get_cycles() | (1ULL << 63));
> > +}
> > +
> > +static inline u64 adjust_early_ts(u64 ts)
> > +{
> > +	/* High bit means ts is a cycle count */
> > +	if (unlikely(ts & (1ULL << 63)))
> > +		/*
> > +		 * mask high bit and convert to ns
> > +		 * Note that early_mult may be 0, but that's OK because
> > +		 * we'll just multiply by 0 and return 0. This will
> > +		 * only occur if we're outputting a printk message
> > +		 * before the calibration of the early timestamp.
> > +		 * Any output after user space start (eg. from dmesg or
> > +		 * journalctl) will show correct values.
> > +		 */
> > +		return (((ts & ~(1ULL << 63)) * early_mult) >> early_shift);
> > +
> > +	/* If timestamp is already in ns, just add offset */
> > +	return ts + early_ts_offset;
> > +}
> > +#else
> > +static inline u64 early_cycles(void)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline u64 adjust_early_ts(u64 ts)
> > +{
> > +	return ts;
> > +}
> > +#endif /* CONFIG_EARLY_PRINTK_TIMES */
> > +
> > +#endif /* _KERNEL_PRINTK_EARLY_TIMES_H */
> > +
> > diff --git a/init/Kconfig b/init/Kconfig
> > index fa79feb8fe57..060a22cddd17 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -777,6 +777,18 @@ config IKHEADERS
> >  	  or similar programs.  If you build the headers as a module, a module called
> >  	  kheaders.ko is built which can be loaded on-demand to get access to headers.
> >
> > +config EARLY_PRINTK_TIMES
> > +	bool "Show non-zero printk timestamps early in boot"
> > +	default y
> 
> Considering that this might have a significant impact on monitoring
> mechanisms already in place (that e.g. expect a specific dmesg print to
> have a maximum associated time value), please consider a N default here.

Oops!  Sorry, that was supposed to be 'default n'.  You're right.  I know I had
this as default N, and I think I switched it temporarily for testing, and forgot
to switch it back (and never caught it the numerous times I reviewed the
patch before sending it out again, ugh).  Thanks for catching this.

If people like this, and we don't see any problems with tooling or virtualization, I
could see it switching to default Y in the future.  But for now this should definitely
be 'default n'.

> 
> > +	depends on PRINTK
> > +	depends on ARM64 || X86_64
> > +	help
> > +	  Use a cycle-counter to provide printk timestamps during
> > +	  early boot.  This allows seeing timestamps for printks that
> > +	  would otherwise show as 0.  Note that this will shift the
> > +	  printk timestamps to be relative to machine power on, instead
> > +	  of relative to the start of kernel timekeeping.
> > +
> 
> To be precise, the timestamps will be relative to processor power on;
> the machine might have some other processors that run before the Linux
> one (this is the case for example of i.MX9 or AM62 SoCs) and will be
> unaccounted for even by this mechanism.

Good point.   Even more precisely, it will be relative to
cycle-counter value initialization or reset, which often (but not always)
corresponds to processor power on.

I'll adjust the wording.

I'm still a bit unsure what happens in the virtualization case.  qemu seems to initialize
the TSC at qemu start, but I'm not sure what happens for e.g. client VMs on cloud servers.

> 
> >  config LOG_BUF_SHIFT
> >  	int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
> >  	range 12 25
> > diff --git a/init/main.c b/init/main.c
> > index b84818ad9685..cc1af26933f7 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -104,6 +104,9 @@
> >  #include <linux/pidfs.h>
> >  #include <linux/ptdump.h>
> >  #include <linux/time_namespace.h>
> > +#include <linux/timex.h>
> > +#include <linux/sched/clock.h>
> > +#include <linux/early_times.h>
> >  #include <net/net_namespace.h>
> >
> >  #include <asm/io.h>
> > @@ -160,6 +163,10 @@ static size_t initargs_offs;
> >  # define initargs_offs 0
> >  #endif
> >
> > +#ifdef CONFIG_EARLY_PRINTK_TIMES
> > +static u64 start_cycles, start_ns;
> > +#endif
> > +
> >  static char *execute_command;
> >  static char *ramdisk_execute_command = "/init";
> >
> > @@ -1118,6 +1125,11 @@ void start_kernel(void)
> >  	timekeeping_init();
> >  	time_init();
> >
> > +#ifdef CONFIG_EARLY_PRINTK_TIMES
> > +	start_cycles = get_cycles();
> > +	start_ns = local_clock();
> > +#endif
> > +
> 
> I was wondering it it wouldn't make more sense to move this logic to its
> own file, and have a plain call e.g. to early_times_init() here
> (continue...)
> 
> >  	/* This must be after timekeeping is initialized */
> >  	random_init();
> >
> > @@ -1600,6 +1612,20 @@ static int __ref kernel_init(void *unused)
> >
> >  	do_sysctl_args();
> >
> > +#ifdef CONFIG_EARLY_PRINTK_TIMES
> > +	u64 end_cycles, end_ns;
> > +
> > +	/* set calibration data for early_printk_times */
> > +	end_cycles = get_cycles();
> > +	end_ns = local_clock();
> > +	clocks_calc_mult_shift(&early_mult, &early_shift,
> > +		((end_cycles - start_cycles) * NSEC_PER_SEC)/(end_ns - start_ns),
> > +		NSEC_PER_SEC, 50);
> > +	early_ts_offset = ((start_cycles * early_mult) >> early_shift) - start_ns;
> > +	pr_debug("Early printk times: mult=%u, shift=%u, offset=%llu ns\n",
> > +		early_mult, early_shift, early_ts_offset);
> > +#endif
> > +
> 
> (...continue) and to early_times_calc() or something like that here.
> 
> In this way, all related variables (i.e.: start_cycles, start_ns from
> this file, but also early_mult, early_shift, and early_ts_offset from
> kernel/printk/printk.c) can be confined to their own file and not add
> noise here.

I thought a lot about this, and wasn't sure whether to use that approach or
not.  I would need two routines: early_times_start_calibration, and
early_times_finish_calibration().  I agree with the idea of keeping
all variables in the printk module, so this is more self-contained.  It also
reduces the number of visible #ifdefs.

I think I'll go ahead and do this, and see what the code looks like. I suspect
that what you recommend is the better way to go.

> 
> >  	if (ramdisk_execute_command) {
> >  		ret = run_init_process(ramdisk_execute_command);
> >  		if (!ret)
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 1d765ad242b8..f17877337735 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -46,6 +46,7 @@
> >  #include <linux/ctype.h>
> >  #include <linux/uio.h>
> >  #include <linux/sched/clock.h>
> > +#include <linux/early_times.h>
> >  #include <linux/sched/debug.h>
> >  #include <linux/sched/task_stack.h>
> >  #include <linux/panic.h>
> > @@ -75,6 +76,11 @@ EXPORT_SYMBOL(ignore_console_lock_warning);
> >
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(console);
> >
> > +#ifdef CONFIG_EARLY_PRINTK_TIMES
> > +u32 early_mult, early_shift;
> > +u64 early_ts_offset;
> > +#endif
> > +
> >  /*
> >   * Low level drivers may need that to know if they can schedule in
> >   * their unblank() callback or not. So let's export it.
> > @@ -639,7 +645,7 @@ static void append_char(char **pp, char *e, char c)
> >  static ssize_t info_print_ext_header(char *buf, size_t size,
> >  				     struct printk_info *info)
> >  {
> > -	u64 ts_usec = info->ts_nsec;
> > +	u64 ts_usec = adjust_early_ts(info->ts_nsec);
> >  	char caller[20];
> >  #ifdef CONFIG_PRINTK_CALLER
> >  	u32 id = info->caller_id;
> > @@ -1352,7 +1358,11 @@ static size_t print_syslog(unsigned int level, char *buf)
> >
> >  static size_t print_time(u64 ts, char *buf)
> >  {
> > -	unsigned long rem_nsec = do_div(ts, 1000000000);
> > +	unsigned long rem_nsec;
> > +
> > +	ts = adjust_early_ts(ts);
> > +
> > +	rem_nsec = do_div(ts, 1000000000);
> >
> >  	return sprintf(buf, "[%5lu.%06lu]",
> >  		       (unsigned long)ts, rem_nsec / 1000);
> > @@ -2242,6 +2252,8 @@ int vprintk_store(int facility, int level,
> >  	 * timestamp with respect to the caller.
> >  	 */
> >  	ts_nsec = local_clock();
> > +	if (!ts_nsec)
> > +		ts_nsec = early_cycles();
> >
> >  	caller_id = printk_caller_id();
> >
> > --
> > 2.43.0
> >
> 
> Regards,
> Francesco

Thanks very much for the review and test!!
 -- Tim

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ