linux-kernel - Re: Unstable tsc caused soft lockup in kdump kernel

Message-ID: <YtDD2WHLiFbceXuE@MiWiFi-R3L-srv>
Date:   Fri, 15 Jul 2022 09:33:13 +0800
From:   Baoquan He <bhe@...hat.com>
To:     "Guilherme G. Piccoli" <gpiccoli@...lia.com>
Cc:     jstultz@...gle.com, tglx@...utronix.de, sboyd@...nel.org,
        linux-kernel@...r.kernel.org, x86@...nel.org,
        kexec@...ts.infradead.org
Subject: Re: Unstable tsc caused soft lockup in kdump kernel

On 07/14/22 at 04:34pm, Guilherme G. Piccoli wrote:
> On 29/06/2022 07:25, Baoquan He wrote:
> > Hi,
> > 
> > On a HP machine, after crash triggered via sysrq-c, kdump kernel will
> > boot and get soft lockup as below. And this can be always reproduced.
> > 
> > From log, it seems that unreliable tsc was marked as unstable in
> > clocksource_watchdog, then worker sched_clock_work was scheduled. And
> > this tsc unstable marking always happened after sysrq-c is triggered.
> > And the cpu where worker smp_call_function_many_cond is running won't
> > be stopped and hang there and keep locks, even though the cpu should be
> > stopped. While kdump kernel is running in a different cpu and boot, then
> > soft lockup happened because other workers or the relevant threads are
> > waiting for locks taken by the hang sched_clock_work worker.
> > 
> > Any idea or suggestion?
> > 
> > The normal kernel boot log and kdump kernel log, kernel config, are all
> > attached, please check.
> > 
> 
> Hi Baoquan, interesting issue! Do you happen to have a full kdump boot
> log with the issue? Maybe collected through serial console, etc.
> It seems the one attached is from a succeeding kdump by passing
> "tsc=unstable" to the kdump kernel right?

The attached kdump boot log is the one in which the kdump kernel hangs.
'tsc=unstable' needs to be added to the 1st kernel to work around it.
Adding 'tsc=unstable' only to the kdump kernel doesn't change anything,
since the clocksource_watchdog work is already stuck in a loop because of
the unstable tsc in the 1st kernel.

> 
> Also, did you try to "forbid" tsc to get marked as unstable in the first
> kernel, before kdump? I mean like a hack code change, just prevent
> kernel doing that and seeing if it works. If that still fails, then it
> seems the cause of the issue is the same as the cause of TSC getting
> unstable - in other words, something would be causing both the kdump
> kernel lockup AND the TSC unstable marking in the first kernel...

As I added later, adding 'tsc=unstable' to the 1st kernel's cmdline works
around the issue. kdump works well with that.
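
For anyone trying to reproduce this, a minimal sketch of how one might confirm
that the 1st kernel was actually booted with the workaround before triggering
the crash. It only reads /proc/cmdline and the sysfs current_clocksource file;
the helper names (has_tsc_unstable, read_text_or_none, describe_kernel) are
made up for illustration and are not part of any kernel or kexec tooling.

```python
from pathlib import Path

# Standard procfs/sysfs locations on Linux. The helper names below are
# hypothetical and exist only for this illustration.
CMDLINE_PATH = Path("/proc/cmdline")
CLOCKSOURCE_PATH = Path(
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)


def has_tsc_unstable(cmdline: str) -> bool:
    """Return True if 'tsc=unstable' appears as its own boot parameter."""
    return "tsc=unstable" in cmdline.split()


def read_text_or_none(path: Path):
    """Read a small procfs/sysfs file, or return None if it is unavailable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def describe_kernel() -> str:
    """Summarize the running kernel's tsc parameter and active clocksource."""
    cmdline = read_text_or_none(CMDLINE_PATH)
    clocksource = read_text_or_none(CLOCKSOURCE_PATH)
    flag = "unknown" if cmdline is None else str(has_tsc_unstable(cmdline))
    return f"tsc=unstable on cmdline: {flag}; current clocksource: {clocksource}"
```

Calling print(describe_kernel()) in the 1st kernel before sysrq-c shows
whether the parameter is present. The check says nothing about why the tsc
gets marked unstable in the first place.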