Date:   Thu, 8 Mar 2018 13:11:56 -0500
From:   Brian Rak <brak@...eservers.com>
To:     Randy Dunlap <rdunlap@...radead.org>, linux-kernel@...r.kernel.org,
        linux-acpi@...r.kernel.org
Subject: Re: Hang while booting 4.15.7



On 3/8/2018 1:02 PM, Brian Rak wrote:
>
>
> On 3/8/2018 12:49 PM, Randy Dunlap wrote:
>> On 03/08/2018 08:21 AM, Brian Rak wrote:
>>> We have some Dell servers running Intel Gold 6126 processors. Some 
>>> of them hang on boot under 4.15.7, but work fine on 4.14.14.  When 
>>> they hang, we see the following on console:
>>>
>>> Error parsing PCC subspaces from PCCT
>>> watchdog: BUG: soft lockup - CPU #16 stuck for 23s! [swapper/0:1]
>>>
>>> We see that PCC subspaces error under 4.14 as well, but it doesn't 
>>> cause the machine to hang.
>>>
>>> So far we haven't been able to correlate these hangs with anything 
>>> in particular.  Some machines will hang, some machines will boot.  
>>> They're otherwise identical as far as hardware and firmware goes.
>>>
>>> I've tried pcie_aspm=off, since that seems to be the next bit of 
>>> code that's being executed.  This resulted in the machine booting a 
>>> little further, but then oopsing somewhere in acpi_os_purge_cache. 
>>> I'm not able to get a full trace there, as I don't have serial 
>>> access easily available.
>>>
>>> Any suggestions?
>>>
>> Hi,
>>
>> The first thing that I would do is boot with:
>>    ignore_loglevel initcall_debug
>> on the kernel boot command line.
>>
>> That will add lots of messages and maybe give us a stronger hint
>> about where the hang is actually happening.
>>
>> And then worst case (without a boot log via serial console or
>> netconsole) is to take a photo of the screen with the oops messages.
>>
>> And if you are fairly certain that it's an ACPI issue, also write to the
>> linux-acpi@...r.kernel.org mailing list.
>>
> Thanks!
>
> I booted with those parameters, and this certainly seems like an ACPI 
> issue.  During bootup, the machine paused here for about 20s:
> https://www.dropbox.com/s/39us0tlhbzuay7t/2018-03-08%2012_52_35.png?dl=0
>
> then it started printing this trace:
> https://www.dropbox.com/s/nxdhm19wcitrgm0/2018-03-08%2012_53_04.png?dl=0
>
> (I can't seem to figure out a decent way to capture the beginning of 
> the trace here, I'll have to see if I can get the serial console working)
I got the serial console working. What I captured above is just the end
of the CPU stuck message:

https://gist.githubusercontent.com/devicenull/abe9022877d0a7354fa2ffc8b8a8f042/raw/e497624f90037eb272760f7c5c3d2a0f21e5ea83/gistfile1.txt
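
For anyone reading a capture taken with initcall_debug, here is a rough sketch of one way to find the initcall that was started but never returned, which is often where a boot hang sits. It assumes the usual "calling  <fn> @ <pid>" and "initcall <fn> returned <rc> after <n> usecs" line formats, which may differ between kernel versions. The function name pending_initcalls and the sample log (including example_hang_init) are made up for illustration and are not taken from the capture linked above.

```python
import re

# Assumed initcall_debug line formats (may vary by kernel version):
#   calling  foo_init+0x0/0x10 @ 1
#   initcall foo_init+0x0/0x10 returned 0 after 12 usecs
CALLING = re.compile(r"calling\s+(\S+)\s+@\s+\d+")
RETURNED = re.compile(r"initcall\s+(\S+)\s+returned\s+-?\d+")


def pending_initcalls(log_text):
    """Return initcalls that were started but never logged a return, in order."""
    pending = []
    for line in log_text.splitlines():
        started = CALLING.search(line)
        if started:
            pending.append(started.group(1))
            continue
        finished = RETURNED.search(line)
        if finished and finished.group(1) in pending:
            pending.remove(finished.group(1))
    return pending


# Hypothetical log excerpt, for illustration only.
sample = """\
[    1.000000] calling  acpi_init+0x0/0x30 @ 1
[    1.100000] initcall acpi_init+0x0/0x30 returned 0 after 100 usecs
[    1.200000] calling  example_hang_init+0x0/0x20 @ 1
[   24.000000] watchdog: BUG: soft lockup - CPU#16 stuck for 23s! [swapper/0:1]
"""

if __name__ == "__main__":
    for name in pending_initcalls(sample):
        print("never returned:", name)
```

Feeding it a saved serial console log (read from a file into a string) should list whatever initcall was still running when the soft lockup fired.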
