[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <473a34a9-c844-6fe3-08cd-f7c5bf3f6c34@gameservers.com>
Date: Thu, 8 Mar 2018 13:11:56 -0500
From: Brian Rak <brak@...eservers.com>
To: Randy Dunlap <rdunlap@...radead.org>, linux-kernel@...r.kernel.org,
linux-acpi@...r.kernel.org
Subject: Re: Hang while booting 4.15.7
On 3/8/2018 1:02 PM, Brian Rak wrote:
>
>
> On 3/8/2018 12:49 PM, Randy Dunlap wrote:
>> On 03/08/2018 08:21 AM, Brian Rak wrote:
>>> We have some Dell servers running Intel Gold 6126 processors. Some
>>> of them hang on boot under 4.15.7, but work fine on 4.14.14. When
>>> they hang, we see the following on console:
>>>
>>> Error parsing PCC subspaces from PCCT
>>> watchdog: BUG: soft lockup - CPU #16 stuck for 23s! [swapper/0:1]
>>>
>>> We see that PCC subspaces error under 4.14 as well, but it doesn't
>>> cause the machine to hang.
>>>
>>> So far we haven't been able to correlate these hangs with anything
>>> in particular. Some machines will hang, some machines will boot.
>>> They're otherwise identical as far as hardware and firmware goes.
>>>
>>> I've tried pcie_aspm=off, since that seems to be the next bit of
>>> code that's being executed. This resulted in the machine booting a
>>> little further, but then oopsing somewhere in acpi_os_purge_cache.
>>> I'm not able to get a full trace there, as I don't have serial
>>> access easily available.
>>>
>>> Any suggestions?
>>>
>> Hi,
>>
>> The first thing that I would do is boot with:
>> ignore_loglevel initcall_debug
>> on the kernel boot command line.
>>
>> That will add lots of messages and maybe give us a stronger hint
>> about where
>> the hang is actually happening.
>>
>> And then worst case (without a boot log via serial console or
>> netconsole) is
>> to take a photo of the screen with the oops messages.
>>
>> And if you are fairly certain that it's an ACPI issue, also write to the
>> linux-acpi@...r.kernel.org mailing list.
>>
> Thanks!
>
> I booted with those parameters, and this certainly seems like an ACPI
> issue. During bootup, the machine paused here for about 20s:
> https://www.dropbox.com/s/39us0tlhbzuay7t/2018-03-08%2012_52_35.png?dl=0
>
> then it started printing this trace:
> https://www.dropbox.com/s/nxdhm19wcitrgm0/2018-03-08%2012_53_04.png?dl=0
>
> (I can't see to figure out a decent way to capture the beginning of
> the trace here, I'll have to see if I can get the serial console working)
I got the serial console working, that's just the end of the CPU stuck
message:
https://gist.githubusercontent.com/devicenull/abe9022877d0a7354fa2ffc8b8a8f042/raw/e497624f90037eb272760f7c5c3d2a0f21e5ea83/gistfile1.txt
Powered by blists - more mailing lists