lists.openwall.net
Open Source and information security mailing list archives
Message-ID: <20161125140356.GB11256@mobilestation>
Date:   Fri, 25 Nov 2016 17:03:56 +0300
From:   Serge Semin <fancer.lancer@...il.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     bhelgaas@...gle.com, shawn.lin@...k-chips.com, luto@...nel.org,
        Sergey.Semin@...latforms.ru, linux-pci@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC] PCI: Fix kernel panic of root-port-less PCIe enum due to
 ASPM

On Tue, Nov 08, 2016 at 05:29:57PM -0600, Bjorn Helgaas <helgaas@...nel.org> wrote:

Hello Bjorn,
Here are my answers to your questions, inline.

> Hi Serge,
> 
> On Thu, Oct 06, 2016 at 12:34:15PM +0300, Serge Semin wrote:
> > Hello linux folks,
> > 
> >     Some time ago I discovered a kernel panic that pops up when the PCI subsystem
> > tries to enumerate a PCI Express bus with the ASPM service enabled. Here it is:
> > 
> > [    5.089667] CPU 0 Unable to handle kernel paging request at virtual
> > address 00000060, epc == 80317004, ra == 80316ac8
> > [    5.120952] Oops[#1]:
> >           ...
> > [    5.528438] Call Trace:
> > [    5.535640] [<80317004>] pcie_aspm_init_link_state+0x6c0/0x814
> > [    5.552843] [<80300c44>] pci_scan_slot+0x140/0x148
> > [    5.566957] [<80301dcc>] pci_scan_child_bus+0x50/0x1b0
> > [    5.582096] [<80301944>] pci_scan_bridge+0x25c/0x694
> > [    5.596724] [<80301e78>] pci_scan_child_bus+0xfc/0x1b0
> > [    5.611862] [<80301944>] pci_scan_bridge+0x25c/0x694
> > [    5.626488] [<80301e78>] pci_scan_child_bus+0xfc/0x1b0
> > [    5.641628] [<8030215c>] pci_scan_root_bus+0x64/0x124
> > [    5.656528] [<804ca298>] pcibios_scanbus+0xa8/0x188
> > 
> >     I am quite sure you are familiar with the issue, since I found the
> > mailing list discussion: "PCI: avoid NULL deref in alloc_pcie_link_state"
> > https://patchwork.kernel.org/patch/2751651/
> > https://bugzilla.kernel.org/show_bug.cgi?id=60111
> 
> I'm trying to puzzle out a few things here.  Maybe you can help me out?
> 
> - Does this issue exist in current upstream kernels?  Your dmesg shows a
>   v3.19-based kernel.  c8fc9339409d ("PCI/ASPM: Use dev->has_secondary_link
>   to find downstream links"), which appeared in v4.2, fixes a problem very
>   similar to what you're reporting.
> 

I saw that fix, but unfortunately it doesn't resolve the issue. I tested kernel
4.4.24 without my patch applied, and the ASPM-related kernel panic still occurs
(see the stack trace above).

> - When we dereference the NULL pointer, which device did we call
>   pcie_aspm_init_link_state() for?
> 

My guess is that the problem arose during enumeration of bus 2. Since there is
no root port on my platform, the pcie_link_state structure for the upstream link
was never created. So when enumeration reached bus 2, it needed the
pcie_link_state structure of the parent bus, which did not exist. That is how
the NULL dereference happened.
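To make the failure mode concrete, here is a minimal, hypothetical C model of it. The types struct link_state and struct dev and the function alloc_link_guarded() are invented for illustration; they are not the kernel's struct pcie_link_state or alloc_pcie_link_state(), only a sketch of the same idea. A non-root link takes its root from its parent's state, so if no root port ever created that parent state, reading parent->root dereferences NULL. The guard returns NULL (skipping ASPM for that link) instead of crashing, which is the approach suggested in the "avoid NULL deref in alloc_pcie_link_state" discussion.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified per-link ASPM state (NOT the kernel struct). */
struct link_state {
	struct link_state *parent; /* state of the upstream link, if any */
	struct link_state *root;   /* top of this PCIe hierarchy */
};

/* Hypothetical device model. upstream_link is the link state of the bridge
 * above this device; it stays NULL when no root port created one. */
struct dev {
	int is_root_port;
	struct link_state *upstream_link;
};

/* Allocate link state for a device, refusing to proceed without a parent.
 * Without the NULL check, "parent->root" below would fault on a
 * root-port-less topology, much like the oops in this thread. */
static struct link_state *alloc_link_guarded(const struct dev *d)
{
	struct link_state *link = calloc(1, sizeof(*link));

	if (!link)
		return NULL;

	if (d->is_root_port) {
		/* A root port starts a new hierarchy. */
		link->root = link;
	} else {
		struct link_state *parent = d->upstream_link;

		if (!parent) {
			/* No upstream state: skip ASPM for this link. */
			free(link);
			return NULL;
		}
		link->parent = parent;
		link->root = parent->root;
	}
	return link;
}
```

With a root port present, a downstream link inherits the root's state; with no root port, the guarded allocator simply returns NULL and the caller can leave ASPM disabled for that link.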

> - https://bugzilla.kernel.org/attachment.cgi?id=240981 is the failing dmesg
>   log, and it shows "vgaarb: device added: PCI:0000:04:00.0".
>   
>   Your lspci output (https://bugzilla.kernel.org/attachment.cgi?id=241001)
>   shows 04:00.0 is a downstream port, but vga_arbiter_add_pci_device() only
>   prints that message for VGA class devices.
> 
>   https://bugzilla.kernel.org/attachment.cgi?id=240991, the successful
>   dmesg log, shows "vgaarb: device added: PCI:0000:06:00.0".  That makes
>   more sense because 06:00.0 is class 0300, which is a VGA device.
> 
> Bjorn

I can't be sure what causes that strange enumeration, but I can assure you that
the bus confusion is not the cause of the ASPM panic. I can only guess that the
misleading BDF is a side effect of SMP (my processor has two cores) combined with
the ASPM panic: VGA driver initialization may run concurrently with PCI bus
enumeration.

Regards,
-Sergey
