Message-ID: <alpine.DEB.2.21.1802131533160.1130@nanos.tec.linutronix.de>
Date:   Tue, 13 Feb 2018 15:39:02 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Tvrtko Ursulin <tursulin@...ulin.net>
cc:     Ingo Molnar <mingo@...hat.com>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: smpboot: do_boot_cpu failed(-1) to wakeup CPU#0

On Tue, 13 Feb 2018, Tvrtko Ursulin wrote:
> On 07/02/18 12:48, Tvrtko Ursulin wrote:
> > We are seeing failures to online the CPU0 on Apollo Lake in the form of:
> > 
> >   <6>[  126.508783] smpboot: CPU 0 is now offline
> >   <6>[  127.520746] smpboot: Booting Node 0 Processor 0 APIC 0x0
> >   <3>[  137.521036] smpboot: do_boot_cpu failed(-1) to wakeup CPU#0
> > 
> > I unfortunately cannot say with which kernel version this started since
> > we added a test which does this only recently. I also have no local
> > access to this machine. (It is part of a test farm for i915 driver
> > development testing.) But we recently added a test which off-lines, and
> > on-lines back, CPUs and started seeing this. Small reproducer looks like
> > this (without boilerplate):
>
> Any hints on how to debug this? Could it be firmware? Try some boot options or
> something?

There are issues with CPU0 hotplug on commodity hardware. I have systems
where it does not work, but TBH I never bothered to investigate it. Some
years ago we had issues with suspend/resume when it was not running on
CPU0.  These were related to firmware assumptions about CPU0. So I wouldn't
be too surprised if there are general issues with unplugging CPU0.

CPU0 unplug is really only relevant for systems which support physical
hotplug, so testing it on commodity hardware does not have much
value. Testing in VMs for increasing the test coverage works well enough.

Thanks,

	tglx
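
The reproducer quoted above is not included in this message. For readers who
want to see what an offline/online cycle of the kind under discussion looks
like, here is a minimal sketch, not the original test. It assumes the standard
Linux sysfs interface (/sys/devices/system/cpu/cpuN/online); the helper names
and the `root` parameter are hypothetical and exist only so the logic can be
exercised against a scratch directory. On a real system it needs root, and the
cpu0/online file may be absent when CPU0 hotplug is not enabled.

```python
import os

# Standard sysfs location for per-CPU control files on Linux.
SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def online_path(cpu, root=SYSFS_CPU_ROOT):
    """Return the path of the 'online' control file for the given CPU."""
    return os.path.join(root, f"cpu{cpu}", "online")


def set_cpu_online(cpu, online, root=SYSFS_CPU_ROOT):
    """Write 1 (online) or 0 (offline) to the CPU's control file.

    Returns False if the control file does not exist, which is how a
    CPU that cannot be hotplugged shows up. If the kernel rejects the
    request, the write is expected to surface as an OSError.
    """
    path = online_path(cpu, root)
    if not os.path.exists(path):
        return False
    with open(path, "w") as f:
        f.write("1" if online else "0")
    return True


def is_cpu_online(cpu, root=SYSFS_CPU_ROOT):
    """Read the control file back and report whether the CPU is online."""
    with open(online_path(cpu, root)) as f:
        return f.read().strip() == "1"


def cycle_cpu(cpu, root=SYSFS_CPU_ROOT):
    """Offline the CPU, bring it back, and report the final state."""
    if not set_cpu_online(cpu, False, root):
        raise FileNotFoundError(online_path(cpu, root))
    set_cpu_online(cpu, True, root)
    return is_cpu_online(cpu, root)
```

Calling cycle_cpu(0) as root would attempt the same sequence as in the log
above; a failure to bring the CPU back would be reported either as an
exception from the second write or as a False return value.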
