Message-ID: <Y6PYYUKXR2OCH3WG@MiWiFi-R3L-srv>
Date: Thu, 22 Dec 2022 12:09:05 +0800
From: Baoquan He <bhe@...hat.com>
To: "Guilherme G. Piccoli" <gpiccoli@...lia.com>
Cc: x86@...nel.org, kexec@...ts.infradead.org,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org
Subject: Re: kdump kernel randomly hang with tick_periodic call trace on bare
metal system
On 12/21/22 at 12:46pm, Guilherme G. Piccoli wrote:
> On 20/12/2022 02:51, Baoquan He wrote:
> > On 12/20/22 at 01:41pm, Baoquan He wrote:
> >> On one intel bare metal system, I can randomly reproduce the kdump hang
> >> as below with tick_periodic call trace. Attach the kernel config for
> >> reference.
> >
> > Forgot mentioning this random hang is also caused by adding
> > 'nr_cpus=2' into normal kernel's cmdline, then triggering crash will get
> > kdump kernel hang as below kdump log shown.
> >
>
> The weird thing is that you seem to be using "nr_cpus=1" instead - this
> is the cmdline from the log:
>
> "nr_cpus=2 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off
> numa=off udev.children-max=2 panic=10 acpi_no_memhotplug
> transparent_hugepage=never nokaslr hest_disable novmcoredd cma=0
> hugetlb_cma=0 disable_cpu_apicid=16 [...]"
>
> You seems to pass twice the "nr_cpus" thing, and I guess kernel pick the
> last one?
From the kdump kernel boot log, yes, nr_cpus=1 is the one taken.
parse_early_param() parses the kernel parameters one by one, so the
last one takes effect. The problem here is not whether nr_cpus is 2 or
1. The bare metal system has 16 CPUs and only 2 CPUs are present; it
seems the 14 halted CPUs get a wrong message and behave incorrectly,
which causes the issue.
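To illustrate the "last one takes effect" point, here is a minimal
sketch in Python. It is only a simplified model and not the kernel's
code: parse_params() is a hypothetical helper, and it assumes each
parameter simply overwrites any earlier value of the same name. Real
kernel handlers may apply extra checks of their own.

```python
def parse_params(cmdline):
    """Walk the parameters left to right (hypothetical helper).

    Assumption: a later key=value simply overwrites an earlier one,
    which models the "last one takes effect" behaviour described above.
    """
    seen = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Flags without "=" (for example irqpoll) are stored as None.
        seen[key] = value if sep else None
    return seen


# Shortened version of the command line quoted from the kdump log.
cmdline = "nr_cpus=2 irqpoll nr_cpus=1 reset_devices disable_cpu_apicid=16"
params = parse_params(cmdline)
print(params["nr_cpus"])  # prints 1
```

With the duplicated option from the log, the sketch ends up with
nr_cpus set to 1, matching what the kdump kernel boot log shows.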
>
> Also, what is "disable_cpu_apicid=16"? Could this be related?
Not really. Please check disable_cpu_apicid in
Documentation/admin-guide/kdump/kdump.rst; it is the BSP's APIC ID.