linux-kernel - Re: [REGRESSION] ? system is stuck in clocksource, >60s delay at boot time without tsc=unstable

Message-ID: <CANDhNCqdpbi=r81NyXVWBbB5POj5nmrc7qo3r2bi1yYqYBgiAg@mail.gmail.com>
Date: Thu, 2 Jan 2025 13:49:21 -0800
From: John Stultz <jstultz@...gle.com>
To: Fab Stz <fabstz-it@...oo.fr>
Cc: Thomas Gleixner <tglx@...utronix.de>, Daniel Lezcano <daniel.lezcano@...aro.org>, 
	Anna-Maria Behnsen <anna-maria@...utronix.de>, Frederic Weisbecker <frederic@...nel.org>, 
	linux-kernel@...r.kernel.org
Subject: Re: [REGRESSION] ? system is stuck in clocksource, >60s delay at boot
 time without tsc=unstable

On Fri, Dec 27, 2024 at 4:39 AM Fab Stz <fabstz-it@...oo.fr> wrote:
>
> Hello,
>
> It's been a month since I sent this email. Do you have any idea what is going on?

Apologies you didn't get a quick response, but you didn't really cc
many people on the first one.


> On Wednesday, 27 November 2024, 08:18:41 CET, Fab Stz wrote:
> > Hi,
> >
> > While upgrading from Debian bullseye (kernel 5.10) to bookworm (6.1), I
> > noticed that the newer kernel gets stuck for more than 60 seconds early
> > in the boot.
> >
> > This is apparently related to the clocksource module. If I boot with
> > tsc=unstable there is no more delay.
> >
> > In the kernel logs, I have:
> >
> > clocksource: Long readout interval, skipping watchdog check: cs_nsec: 512010551 wd_nsec: 39243763320
> > clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
> > clocksource:                       'hpet' wd_nsec: 537773520 wd_now: 3f0f7632 wd_last: 3e425140 mask: ffffffff
> > clocksource:                       'tsc' cs_nsec: 511996079 cs_now: 18b0866e6a cs_last: 185f8d68ba mask: ffffffffffffffff
> > clocksource:                       'tsc' is current clocksource.
> > tsc: Marking TSC unstable due to clocksource watchdog
> > TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
> > sched_clock: Marking unstable (3765559657, 1276001)<-(3775071370, -8235646)
> > clocksource: Checking clocksource tsc synchronization from CPU 1 to CPUs 0.
> > clocksource: Switched to clocksource hpet
> >
> >
> > I already had this warning with 5.10, but there was no >60 s freeze
> > like the one with 6.1.
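For reference, the watchdog lines quoted above can be decoded with a little arithmetic. What follows is a minimal Python sketch, not kernel code. It copies the raw values from the log, takes each counter delta as `(now - last) & mask`, which is the same masked subtraction the kernel's clocksource code uses to tolerate counter wraparound, and infers each counter's frequency from counts per nanosecond. The log does not print the HPET or TSC frequencies. They are derived here, so treat them as estimates.

```python
"""Decode the clocksource watchdog lines from the quoted boot log.

All raw numbers are copied from the log. Frequencies are NOT in the log;
they are inferred here as counts / elapsed nanoseconds.
"""

NSEC_PER_SEC = 1_000_000_000


def masked_delta(now: int, last: int, mask: int) -> int:
    """Counter delta that survives wraparound: (now - last) & mask."""
    return (now - last) & mask


# 'hpet' watchdog line: wd_nsec, wd_now, wd_last, mask (32-bit counter)
hpet_nsec = 537773520
hpet_cycles = masked_delta(0x3F0F7632, 0x3E425140, 0xFFFFFFFF)

# 'tsc' clocksource line: cs_nsec, cs_now, cs_last, mask (64-bit counter)
tsc_nsec = 511996079
tsc_cycles = masked_delta(0x18B0866E6A, 0x185F8D68BA, 0xFFFFFFFFFFFFFFFF)

# Inferred frequencies in Hz: counts per second over the measured interval.
hpet_hz = hpet_cycles * NSEC_PER_SEC / hpet_nsec
tsc_hz = tsc_cycles * NSEC_PER_SEC / tsc_nsec

# Disagreement between the two clocks over the same watchdog interval.
skew_ns = hpet_nsec - tsc_nsec

# Earlier "Long readout interval" line: TSC-measured vs HPET-measured time.
long_cs_nsec = 512010551
long_wd_nsec = 39243763320

if __name__ == "__main__":
    print(f"hpet: {hpet_cycles} counts -> ~{hpet_hz / 1e6:.3f} MHz")
    print(f"tsc:  {tsc_cycles} counts -> ~{tsc_hz / 1e9:.4f} GHz")
    print(f"skew: {skew_ns} ns (~{skew_ns / 1e6:.1f} ms)")
    print(f"long interval: tsc {long_cs_nsec / 1e9:.2f} s vs hpet {long_wd_nsec / 1e9:.2f} s")
```

With these values the HPET delta is 13444338 counts over 537773520 ns, exactly 40 ns per count (25 MHz). The TSC delta implies roughly 2.65 GHz, and the two clocks disagree by 25777441 ns (about 25.8 ms) over a roughly 0.5 s interval. The "Long readout interval" line tells more. There, the TSC accounted for about 0.51 s while the HPET advanced about 39.2 s, which is consistent with the TSC not counting for most of that window.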

So, it sounds like your TSC stalls in idle (likely missing
X86_FEATURE_NONSTOP_TSC), and probably something between 5.10 and 6.1
added a sleep which causes the stall before the clocksource watchdog
can check and disable the TSC on its own.
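One way to test the missing-NONSTOP_TSC theory is to look at the CPU flags. The kernel reports X86_FEATURE_CONSTANT_TSC and X86_FEATURE_NONSTOP_TSC as the `constant_tsc` and `nonstop_tsc` words on the `flags` line of /proc/cpuinfo. The Python sketch below only parses that line, and its function names are made up for this example. A missing `nonstop_tsc` would fit the idle-stall explanation, but it is not proof on its own.

```python
"""Report the TSC-related CPU flags from /proc/cpuinfo (sketch)."""

from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the words on the first 'flags' line (per-CPU lines repeat)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def tsc_report(flags: set[str]) -> dict[str, bool]:
    """Which of the TSC-related flags are present."""
    return {name: name in flags for name in ("tsc", "constant_tsc", "nonstop_tsc")}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(tsc_report(cpu_flags(path.read_text())))
    else:
        print("/proc/cpuinfo not found (not running on Linux?)")
```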

The kernel is telling you tsc=unstable is the way to go here, and it
seems that is working for you.  From my first glance, I'd not call
this a regression, as the kernel was warning you about the problematic
hardware before, and it was most likely just luck that it was able to
auto-detect the problem before there were any negative results.
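To confirm that the workaround is in effect after a reboot, the kernel command line is available in /proc/cmdline. The clocksource state is exposed in sysfs under /sys/devices/system/clocksource/clocksource0/ as `current_clocksource` and `available_clocksource`. Below is a minimal Python sketch that parses those three files (the helper name `parse_state` is hypothetical). With tsc=unstable on the command line, one would expect `current_clocksource` to name something other than tsc, presumably hpet judging from the log above.

```python
"""Summarize the active clocksource and whether tsc=unstable was passed (sketch)."""

from pathlib import Path

CS_DIR = Path("/sys/devices/system/clocksource/clocksource0")
CMDLINE = Path("/proc/cmdline")


def parse_state(current_text: str, available_text: str, cmdline_text: str) -> dict:
    """Build a summary from the raw contents of the three files."""
    params = cmdline_text.split()
    return {
        "current": current_text.strip(),
        "available": available_text.split(),
        # Simple exact-token match; does not handle every way the option could be given.
        "tsc_unstable_on_cmdline": "tsc=unstable" in params,
    }


if __name__ == "__main__":
    state = parse_state(
        (CS_DIR / "current_clocksource").read_text(),
        (CS_DIR / "available_clocksource").read_text(),
        CMDLINE.read_text(),
    )
    print(state)
    if state["tsc_unstable_on_cmdline"] and state["current"] == "tsc":
        print("tsc=unstable is set, yet tsc is still the current clocksource")
```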

That said, if you're still curious, you might try bisecting kernel
versions between 5.10 and 6.1 to see which commit might have caused
the change in behavior.
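`git bisect` does this search for you, for example starting with `git bisect start v6.1 v5.10`. That assumes the upstream v5.10 tag boots without the stall, as the Debian 5.10 kernel did. Underneath is a binary search over history: with N candidate commits, about log2(N) build-and-boot cycles are needed. The Python sketch below illustrates only that search. It runs over a hypothetical linear list of commits and a hypothetical `is_bad` check (in practice: build, boot, and see whether the >60 s stall occurs). Real git history contains merges, which `git bisect` handles and this sketch does not.

```python
"""Binary search for the first bad commit in a linear history (sketch)."""

from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> tuple[str, int]:
    """Return (first bad commit, number of is_bad() calls made).

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once a commit is bad every later commit is bad too.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    good, bad = 0, len(commits) - 1  # invariant: good index is good, bad index is bad
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], tests
```

Each test halves the remaining range, so 1000 commits need at most 10 build-and-boot cycles.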

thanks
-john