Message-ID: <c1d1b79c-bb2e-4a69-888d-a3301bcbfeb2@yahoo.fr>
Date: Fri, 3 Jan 2025 16:38:24 +0100
From: Fab Stz <fabstz-it@...oo.fr>
To: John Stultz <jstultz@...gle.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Daniel Lezcano <daniel.lezcano@...aro.org>,
Anna-Maria Behnsen <anna-maria@...utronix.de>,
Frederic Weisbecker <frederic@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [REGRESSION] ? system is stuck in clocksource, >60s delay at boot
time without tsc=unstable
Hello John,
On 02/01/2025 at 22:56, John Stultz wrote:
> On Thu, Jan 2, 2025 at 1:49 PM John Stultz <jstultz@...gle.com> wrote:
>>
>> On Fri, Dec 27, 2024 at 4:39 AM Fab Stz <fabstz-it@...oo.fr> wrote:
>>>
>>> Hello,
>>>
>>> It's been one month now that I sent this email. Do you have any clue on this?
>>
>> Apologies you didn't get a quick response, but you didn't really cc
>> many people on the first one.
No problem. I thought it better not to cc too many people on the first
message, given that it was also sent to the mailing list.
>>> On Wednesday, 27 November 2024, 08:18:41 CET, Fab Stz wrote:
>>>> Hi,
>>>>
>>>> While upgrading from Debian bullseye (kernel 5.10) to bookworm (6.1) I
>>>> noticed that the newer kernel is at the beginning of the boot stuck for
>>>> more than 60 seconds.
>>>>
>>>> This is apparently related to the clocksource module. If I boot with
>>>> tsc=unstable there is no more delay.
>>>>
>>>> In the kernel logs, I have:
>>>>
>>>> clocksource: Long readout interval, skipping watchdog check: cs_nsec:
>>>> 512010551 wd_nsec: 39243763320
>>>> clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as
>>>> unstable because the skew is too large:
>>>> clocksource: 'hpet' wd_nsec: 537773520 wd_now:
>>>> 3f0f7632 wd_last: 3e425140 mask: ffffffff
>>>> clocksource: 'tsc' cs_nsec: 511996079 cs_now:
>>>> 18b0866e6a cs_last: 185f8d68ba mask: ffffffffffffffff
>>>> clocksource: 'tsc' is current clocksource.
>>>> tsc: Marking TSC unstable due to clocksource watchdog
>>>> TSC found unstable after boot, most likely due to broken BIOS. Use
>>>> 'tsc=unstable'.
>>>> sched_clock: Marking unstable (3765559657, 1276001)<-(3775071370, -8235646)
>>>> clocksource: Checking clocksource tsc synchronization from CPU 1 to CPUs 0.
>>>> clocksource: Switched to clocksource hpet
>>>>
>>>>
>>>> I already had such a warning with 5.10, but there was no >60sec freeze
>>>> with it like with 6.1
>>
>> So, it sounds like your TSC stalls in idle (likely missing
>> X86_FEATURE_NONSTOP_TSC), and probably something between 5.10 and 6.1
>> added a sleep which causes the stall before the clocksource watchdog
>> can check and disable the TSC on its own.
>>
>> The kernel is telling you tsc=unstable is the way to go here, and it
>> seems that is working for you. From my first glance, I'd not call
>> this a regression, as the kernel was warning you about the problematic
>> hardware before, and it was most likely just luck that it was able to
>> auto-detect the problem before there were any negative results.
>
> Debian even suggests this for the iMac9,1 hardware you're using:
> https://wiki.debian.org/InstallingDebianOn/Apple/iMac/9-1#Boot_on_installer
>
> And highlights the exact behavior you describe (maybe this is your efforts?):
> https://wiki.debian.org/InstallingDebianOn/Apple/iMac/9-1#Kernel_configuration
I am indeed the author of that page on the Debian wiki.
My findings are as follows:

* No delay with the following kernel versions shipped by Debian (run on
  an up-to-date bookworm as of today):
  5.10.226, 5.19.11, 6.0.10, 6.1.4, 6.1.27, 6.1.38, 6.1.66, 6.1.76, 6.1.82
* Delay with the following kernel versions:
  5.15.15, 6.1.85, 6.1.119

So something probably changed between 6.1.82 and 6.1.85 (Debian doesn't
ship packages for the versions in between). Why 5.15.15 also shows the
delay is not clear.
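
The way these results narrow the suspect range can be sketched as
follows. This is only an illustration: it assumes the two version lists
above are complete, and the helper name `regression_window` is
hypothetical, not part of any existing tool.

```python
# Versions taken from the two lists above.
GOOD = ["5.10.226", "5.19.11", "6.0.10", "6.1.4", "6.1.27",
        "6.1.38", "6.1.66", "6.1.76", "6.1.82"]
BAD = ["5.15.15", "6.1.85", "6.1.119"]


def parse(version):
    """Turn '6.1.85' into (6, 1, 85) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def regression_window(good, bad, series):
    """Return (last good, first bad) inside one stable series, e.g. '6.1'.

    Returns None when the series lacks a good version below the first
    bad one, because no window can be derived in that case.
    """
    prefix = parse(series)
    in_series = lambda v: parse(v)[:len(prefix)] == prefix
    good_v = sorted(parse(v) for v in good if in_series(v))
    bad_v = sorted(parse(v) for v in bad if in_series(v))
    if not bad_v:
        return None
    first_bad = bad_v[0]
    older_good = [v for v in good_v if v < first_bad]
    if not older_good:
        return None
    to_str = lambda v: ".".join(str(n) for n in v)
    return to_str(older_good[-1]), to_str(first_bad)


print(regression_window(GOOD, BAD, "6.1"))
```

For the 5.15 series the function returns None, which matches the fact
that no working 5.15.x kernel was tested here.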

For the versions that show a delay, the clocksource warning about the
unstable clock always comes after the first line that mentions USB,
"ACPI: bus type USB registered".

For the versions without a boot delay, the clocksource warning always
comes before that line.

However, with 6.1.82 the clocksource warning sometimes comes after the
USB line. When this happens, the two messages are very close in time
(less than 50 ms?), so the subsequent USB messages still appear after
the clocksource message. The clocksource check may therefore return
early enough to avoid the freeze.

Actually, the freeze usually occurs a bit later than "ACPI: bus type USB
registered", and the message logged at the time of the freeze is related
to USB.
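
The ordering check described above could be automated roughly like
this. It is a minimal sketch: the two marker strings come from the logs
in this thread, but the function names and the sample timestamps are
made up for illustration. Note also that the printk timestamps may
themselves be affected while the TSC is misbehaving, so the computed
delta should be treated as indicative only.

```python
import re

# Matches kernel log lines of the form "[    3.765559] message".
TS_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

USB_MARKER = "ACPI: bus type USB registered"
WATCHDOG_MARKER = "Marking clocksource 'tsc' as unstable"


def first_timestamp(log_text, marker):
    """Return the timestamp (seconds) of the first line containing marker."""
    for line in log_text.splitlines():
        match = TS_LINE.match(line)
        if match and marker in match.group(2):
            return float(match.group(1))
    return None


def watchdog_vs_usb(log_text):
    """Seconds from the USB line to the watchdog warning.

    Positive: the warning came after the USB line (the case seen with
    the delayed kernels). Negative: it came before. None: a marker is
    missing from the log.
    """
    usb = first_timestamp(log_text, USB_MARKER)
    watchdog = first_timestamp(log_text, WATCHDOG_MARKER)
    if usb is None or watchdog is None:
        return None
    return watchdog - usb


# Hypothetical sample, NOT taken from a real boot log.
SAMPLE = """\
[    3.100000] ACPI: bus type USB registered
[    3.765000] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
"""

print(watchdog_vs_usb(SAMPLE))
```

To use it on a real machine, pass the saved output of `dmesg` as
`log_text` instead of `SAMPLE`.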
Moreover, whether there is a boot delay or not:
- the line "ACPI: bus type USB registered" always comes after "Run /init
as init process"
- the warning from clocksource mentioning an unstable clock may or may
not be after "Run /init as init process"
Could it be that USB should not be registered/loaded before it has been
determined whether the clocksource is unstable?
Regards
Fab