Message-ID: <fb486edb-2da9-6ef3-8eb4-59c725995689@ursulin.net>
Date:   Tue, 13 Feb 2018 14:51:57 +0000
From:   Tvrtko Ursulin <tursulin@...ulin.net>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Ingo Molnar <mingo@...hat.com>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: smpboot: do_boot_cpu failed(-1) to wakeup CPU#0


Hi,

On 13/02/18 14:39, Thomas Gleixner wrote:
> On Tue, 13 Feb 2018, Tvrtko Ursulin wrote:
>> On 07/02/18 12:48, Tvrtko Ursulin wrote:
>>> We are seeing failures to online the CPU0 on Apollo Lake in the form of:
>>>
>>>    <6>[  126.508783] smpboot: CPU 0 is now offline
>>>    <6>[  127.520746] smpboot: Booting Node 0 Processor 0 APIC 0x0
>>>    <3>[  137.521036] smpboot: do_boot_cpu failed(-1) to wakeup CPU#0
>>>
>>> I unfortunately cannot say with which kernel version this started since
>>> we added a test which does this only recently. I also have no local
>>> access to this machine. (It is part of a test farm for i915 driver
>>> development testing.) But we recently added a test which off-lines, and
>>> on-lines back, CPUs and started seeing this. Small reproducer looks like
>>> this (without boilerplate):
>>
>> Any hints on how to debug this? Could it be firmware? Try some boot options or
>> something?
> 
> There are issues with CPU0 hotplug on commodity hardware. I have systems
> where it does not work, but TBH I never bothered to investigate it. Some
> years ago we had issues with suspend/resume when it was not running on
> CPU0.  These were related to firmware assumptions about CPU0. So I wouldn't
> be too surprised if there are general issues with unplugging CPU0.
> 
> CPU0 unplug is really only relevant for systems which support physical
> hotplug, so testing it on commodity hardware does not have much
> value. Testing in VMs for increasing the test coverage works well enough.

Thanks, that explains it.

We actually use CPU hotplug only to test whether PMU event migration and
accounting work as expected in the i915 PMU. And since, luckily, the issue
with CPU0 hotplug manifests on only one of the test systems, I think we
will just skip this test on that machine.
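
For reference, an offline/online cycle with a skip path can be sketched
roughly as below. This is not the actual i915 test. It is a minimal Python
sketch that assumes the standard sysfs control file
/sys/devices/system/cpu/cpuN/online; the helper names (cpu_online_path,
set_cpu_online, cycle_cpu) and the "skip"/"ok"/"fail" results are invented
for illustration only.

```python
import os

SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def cpu_online_path(cpu, sysfs_root=SYSFS_CPU_ROOT):
    """Path of the hotplug control file for a given CPU number."""
    return os.path.join(sysfs_root, f"cpu{cpu}", "online")


def set_cpu_online(cpu, online, sysfs_root=SYSFS_CPU_ROOT):
    """Write '1' or '0' to the control file; return True on success."""
    try:
        # O_WRONLY without O_CREAT: never create the file if it is absent.
        fd = os.open(cpu_online_path(cpu, sysfs_root), os.O_WRONLY)
    except OSError:
        return False
    try:
        os.write(fd, b"1" if online else b"0")
        return True
    except OSError:
        return False
    finally:
        os.close(fd)


def cycle_cpu(cpu, sysfs_root=SYSFS_CPU_ROOT):
    """Offline a CPU and bring it back.

    Returns "skip" if the CPU cannot be taken offline here (no control
    file, or the offline write is refused), "fail" if it went offline but
    did not come back, and "ok" otherwise.
    """
    if not os.path.exists(cpu_online_path(cpu, sysfs_root)):
        return "skip"
    if not set_cpu_online(cpu, False, sysfs_root):
        return "skip"
    if not set_cpu_online(cpu, True, sysfs_root):
        return "fail"
    return "ok"
```

Running this against the real sysfs tree needs root. A test harness could
treat "skip" for CPU0 as expected on machines where CPU0 hotplug is not
supported, and only report "fail" as an error.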

Thanks again!

Tvrtko
