Message-ID: <CANDhNCqdpbi=r81NyXVWBbB5POj5nmrc7qo3r2bi1yYqYBgiAg@mail.gmail.com>
Date: Thu, 2 Jan 2025 13:49:21 -0800
From: John Stultz <jstultz@...gle.com>
To: Fab Stz <fabstz-it@...oo.fr>
Cc: Thomas Gleixner <tglx@...utronix.de>, Daniel Lezcano <daniel.lezcano@...aro.org>, 
	Anna-Maria Behnsen <anna-maria@...utronix.de>, Frederic Weisbecker <frederic@...nel.org>, 
	linux-kernel@...r.kernel.org
Subject: Re: [REGRESSION] ? system is stuck in clocksource, >60s delay at boot
 time without tsc=unstable

On Fri, Dec 27, 2024 at 4:39 AM Fab Stz <fabstz-it@...oo.fr> wrote:
>
> Hello,
>
> It has been a month since I sent this email. Do you have any clue about this?

Apologies you didn't get a quick response, but you didn't really cc
many people on the first one.


> On Wednesday, 27 November 2024, 08:18:41 CET Fab Stz wrote:
> > Hi,
> >
> > While upgrading from Debian bullseye (kernel 5.10) to bookworm (6.1), I
> > noticed that the newer kernel gets stuck for more than 60 seconds early
> > in the boot.
> >
> > This is apparently related to the clocksource module. If I boot with
> > tsc=unstable there is no more delay.
> >
> > In the kernel logs, I have:
> >
> > clocksource: Long readout interval, skipping watchdog check: cs_nsec: 512010551 wd_nsec: 39243763320
> > clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
> > clocksource:                       'hpet' wd_nsec: 537773520 wd_now: 3f0f7632 wd_last: 3e425140 mask: ffffffff
> > clocksource:                       'tsc' cs_nsec: 511996079 cs_now: 18b0866e6a cs_last: 185f8d68ba mask: ffffffffffffffff
> > clocksource:                       'tsc' is current clocksource.
> > tsc: Marking TSC unstable due to clocksource watchdog
> > TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
> > sched_clock: Marking unstable (3765559657, 1276001)<-(3775071370, -8235646)
> > clocksource: Checking clocksource tsc synchronization from CPU 1 to CPUs 0.
> > clocksource: Switched to clocksource hpet
> >
> >
> > I already got this warning with 5.10, but there was no >60 second
> > freeze like there is with 6.1.

So, it sounds like your TSC stalls in idle (likely missing
X86_FEATURE_NONSTOP_TSC), and probably something between 5.10 and 6.1
added a sleep which causes the stall before the clocksource watchdog
can check and disable the TSC on its own.
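
If you want to confirm the missing-feature guess on your machine, something
like the following rough sketch should do. It assumes the feature shows up as
the "nonstop_tsc" flag in /proc/cpuinfo and that the usual clocksource sysfs
file is present; the helper names (cpu_flags, read_or_none) are just made up
for this example.

```python
from pathlib import Path

# Standard sysfs location of the active clocksource on Linux.
CLOCKSOURCE = "/sys/devices/system/clocksource/clocksource0/current_clocksource"


def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def read_or_none(path):
    """Read a small text file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    flags = cpu_flags(read_or_none("/proc/cpuinfo") or "")
    # constant_tsc: the TSC ticks at a fixed rate regardless of CPU frequency.
    # nonstop_tsc: the TSC keeps counting in deep idle states.
    for name in ("constant_tsc", "nonstop_tsc"):
        print(f"{name}: {'yes' if name in flags else 'no'}")
    print("current clocksource:", read_or_none(CLOCKSOURCE))
```

If nonstop_tsc is reported as "no", that would be consistent with the TSC
stopping while the CPU idles.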

The kernel is telling you tsc=unstable is the way to go here, and it
seems that is working for you.  At first glance, I'd not call
this a regression, as the kernel was warning you about the problematic
hardware before, and it was most likely just luck that it was able to
auto-detect the problem before it caused any visible trouble.
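
For reference, here is the arithmetic behind the watchdog message in your
log, as a small sketch. All the numbers are copied from the quoted lines;
the only thing I'm assuming is that the elapsed counter value is
(now - last) & mask, which is how I read those fields.

```python
def counter_delta(now_hex, last_hex, mask_hex):
    """Elapsed counter ticks between two readouts, wrapped to the counter width."""
    return (int(now_hex, 16) - int(last_hex, 16)) & int(mask_hex, 16)


# Values copied from the 'hpet' (watchdog) and 'tsc' lines of the log.
hpet_cycles = counter_delta("3f0f7632", "3e425140", "ffffffff")
tsc_cycles = counter_delta("18b0866e6a", "185f8d68ba", "ffffffffffffffff")
wd_nsec = 537773520  # interval as measured by the HPET
cs_nsec = 511996079  # same interval as measured by the TSC

# How far apart the two clocks are over the same interval.
skew_ns = wd_nsec - cs_nsec

print("HPET ticks:", hpet_cycles, "ns per tick:", wd_nsec / hpet_cycles)
print("TSC ticks:", tsc_cycles, "cycles per ns:", tsc_cycles / cs_nsec)
print("skew: %d ns (%.1f%% of the interval)" % (skew_ns, 100.0 * skew_ns / wd_nsec))
```

So over a roughly half-second interval the TSC fell about 25.8 ms behind the
HPET. The earlier "Long readout interval" line is more striking: there the
HPET saw about 39 s pass while the TSC saw only about 0.5 s.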

That said, if you're still curious, you might try bisecting kernel
versions between 5.10 and 6.1 to see which commit might have caused
the change in behavior.
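
To give a feel for how few boots that takes, here is a toy sketch of the
search over release versions. Real bisection would use git bisect over
commits; the is_bad() check and the "first bad" release below are purely
hypothetical stand-ins for "boot this kernel and see if it stalls".

```python
def first_bad(versions, is_bad):
    """Binary search for the first bad version.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    every version after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    tested = []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested.append(versions[mid])
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi], tested


# Mainline releases from 5.10 through 6.1 (5.19 was followed by 6.0).
versions = ["5.%d" % n for n in range(10, 20)] + ["6.0", "6.1"]

# Hypothetical: pretend the stall first appears in 5.16.
PRETEND_FIRST_BAD = "5.16"


def is_bad(version):
    return versions.index(version) >= versions.index(PRETEND_FIRST_BAD)


culprit, tested = first_bad(versions, is_bad)
print("first bad release:", culprit)
print("kernels booted:", tested)
```

With twelve releases in the range, that is three or four test boots to find
the release, and then git bisect can narrow it down to a commit within it.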

thanks
-john
