lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZjkxTGaAc48jPzqC@wunner.de>
Date: Mon, 6 May 2024 21:36:44 +0200
From: Lukas Wunner <lukas@...ner.de>
To: Nam Cao <namcao@...utronix.de>
Cc: Bjorn Helgaas <bhelgaas@...gle.com>, Yinghai Lu <yinghai@...nel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
	Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
	Mika Westerberg <mika.westerberg@...ux.intel.com>
Subject: Re: [PATCH v2 2/2] PCI: pciehp: Abort hot-plug if
 pci_hp_add_bridge() fails

On Mon, May 06, 2024 at 10:37:01AM +0200, Nam Cao wrote:
> I just discovered a new crash scenario with shpchp:
> 
> First, hot-add a bridge:
> (qemu) device_add pci-bridge,id=br2,bus=br1,chassis_nr=1,addr=1
> 
> But with the current patch, the hot-plug is still successful, just that the
> bridge is not properly configured:
> $ lspci -t
> -[0000:00]-+-00.0
>            +-01.0
>            +-02.0
>            +-03.0-[01-02]----00.0-[02]----01.0--  <-- new hot-added bridge
>            +-1f.0
>            +-1f.2
>            \-1f.3
> 
> But! Now I leave the hot-added bridge as is, and hot-add an ethernet card:
> (qemu) device_add e1000,bus=br1,id=eth0,addr=2
> 
> this command crashes the kernel (full crash log below).
> 
> The problem is because during hot-plugging of this ethernet card,
> pci_bus_add_devices() is invoked and the previously hot-plugged bridge hasn't
> been marked as "added" yet, so the driver attempts to add this bridge
> again, leading to the crash.
> 
> Now for pciehp, this scenario cannot happen, because there is only a single
> slot, so we cannot hot-plug another device. But still, this makes me think
> perhaps we shouldn't leave the hot-plugged bridge in a improperly-configured
> state as this patch is doing. I cannot think of a scenario that would crash pciehp
> similarly to shpchp. But that's just me, maybe there is a rare scenario
> that can crash pciehp, but I just haven't seen yet.
> 
> My point is: better safe than sorry. I propose going back to the original
> solution: calling pci_stop_and_remove_bus_device() and return a negative
> error, and get rid of this improperly configured bridge. It is useless
> anyways without a subordinate bus number.
> 
> What do you think?

When the kernel runs out of BAR address space for devices downstream of
a bridge, it doesn't de-enumerate the bridge.  The devices are just
unusable.

So my first intuition is that running out of bus numbers shouldn't result
in de-enumeration either.

Remind me, how exactly does the NULL pointer deref occur?  I think it's
because no struct pci_bus was allocated for the subordinate bus of the
hot-plugged bridge, right?  Because AFAICS that would happen in

pci_hp_add_bridge()
  pci_can_bridge_extend()
    pci_add_new_bus()
      pci_alloc_child_bus()

but we never get that far because pci_hp_add_bridge() bails out with -1.
So the subordinate pointer remains a NULL pointer.

Perhaps we should allocate an empty struct pci_bus instead.
Or check for a NULL subordinate pointer instead of crashing.

I can't really say off the top of my head which solution is appropriate
here, it requires going through the code paths and identifying the
complexity associated with each solution.

Thanks,

Lukas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ