[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230807172852.GA250768@bhelgaas>
Date:   Mon, 7 Aug 2023 12:28:52 -0500
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     Igor Mammedov <imammedo@...hat.com>
Cc:     linux-kernel@...r.kernel.org, terraluna977@...il.com,
        bhelgaas@...gle.com, linux-pci@...r.kernel.org, mst@...hat.com,
        rafael@...nel.org, linux-acpi@...r.kernel.org
Subject: Re: [PATCH 1/1] PCI: acpiphp:: use
 pci_assign_unassigned_bridge_resources() only if bus->self not NULL
On Mon, Aug 07, 2023 at 03:07:46PM +0200, Igor Mammedov wrote:
> On Fri, 4 Aug 2023 18:27:09 -0500
> Bjorn Helgaas <helgaas@...nel.org> wrote:
> 
> > On Wed, Jul 26, 2023 at 02:35:18PM +0200, Igor Mammedov wrote:
> > > Commit [1] switched acpiphp hotplug to use
> > >    pci_assign_unassigned_bridge_resources()
> > > which depends on bridge being available, however in some cases
> > > when acpiphp is in use, enable_slot() can get a slot without
> > > bridge associated.  
> > 
> > acpiphp is *always* in use if we get to enable_slot(), so that doesn't
> > really add information here.
> > 
> > >   1. legitimate case of hotplug on root bus
> > >       (likely not exiting on real hw, but widely used in virt world)
> > >   2. broken firmware, that sends 'Bus check' events to non
> > >      existing root ports (Dell Inspiron 7352/0W6WV0), which somehow
> > >      endup at acpiphp:enable_slot(..., bridge = 0) and with bus
> > >      without bridge assigned to it.  
> 
> how about following commit message (incorporating all feed back in this thread):
I incorporated this commit log and put the patch on for-linus for
v6.5.  I think the patch is fine, and we can amend the commit log
again if necessary.
> -- cut --
> Commit [1] switched acpiphp hotplug to use
>    pci_assign_unassigned_bridge_resources()
> which depends on bridge being available, however in case
> when acpiphp is enabled [2], enable_slot() can be called without
> bridge associated.
>   1. legitimate case of hotplug on root bus
>       (widely used in virt world)
>   2. a (misbehaving) firmware, that sends 'Bus check' events
>      to non existing root ports (Dell Inspiron 7352/0W6WV0),
>      which endup at acpiphp:enable_slot(..., bridge = 0)
>      where bus has not bridge assigned to it.
>      acpihp doesn't know that it's a bridge, and bus specific
>      'PCI subsystem' can't argument ACPI context with bridge
>      information since the PCI device to get this data from
>      is/was not available.
> 
> Issue is easy to reproduce with QEMU's 'pc' machine, which supports
> PCI hotplug on hostbridge slots. To reproduce boot kernel at
> commit [1] in VM started with following CLI (assuming guest root fs
> is installed on sda1 partition):
> 
>    # qemu-system-x86_64 -M pc -m 1G -enable-kvm -cpu host \
>          -monitor stdio -serial file:serial.log           \
>          -kernel arch/x86/boot/bzImage                    \
>          -append "root=/dev/sda1 console=ttyS0"           \
>          guest_disk.img
> 
> once guest OS is fully booted at qemu prompt:
> 
> (qemu) device_add e1000
> 
> (check serial.log) it will cause NULL pointer dereference at:
> 
>     void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
>     {
>         struct pci_bus *parent = bridge->subordinate;
> 
> [  612.277651] BUG: kernel NULL pointer dereference, address: 0000000000000018
> [...]
> [  612.277798]  ? pci_assign_unassigned_bridge_resources+0x1f/0x260
> [  612.277804]  ? pcibios_allocate_dev_resources+0x3c/0x2a0
> [  612.277809]  enable_slot+0x21f/0x3e0
> [  612.277816]  acpiphp_hotplug_notify+0x13d/0x260
> [  612.277822]  ? __pfx_acpiphp_hotplug_notify+0x10/0x10
> [  612.277827]  acpi_device_hotplug+0xbc/0x540
> [  612.277834]  acpi_hotplug_work_fn+0x15/0x20
> [  612.277839]  process_one_work+0x1f7/0x370
> [  612.277845]  worker_thread+0x45/0x3b0
> [  612.277850]  ? __pfx_worker_thread+0x10/0x10
> [  612.277854]  kthread+0xdc/0x110
> [  612.277860]  ? __pfx_kthread+0x10/0x10
> [  612.277866]  ret_from_fork+0x28/0x40
> [  612.277871]  ? __pfx_kthread+0x10/0x10
> [  612.277876]  ret_from_fork_asm+0x1b/0x30
> 
> The issue was discovered on Dell Inspiron 7352/0W6WV0 laptop with
> following sequence:
>    1. suspend to RAM
>    2. wake up with the same backtrace being observed:
>    3. 2nd suspend to RAM attempt makes laptop freeze
> 
> Fix it by using __pci_bus_assign_resources() instead of
> pci_assign_unassigned_bridge_resources() as we used to do
> but only in case when bus doesn't have a bridge associated
> (to cover for the case of ACPI event on hostbridge or
> non existing root port).
> 
> That let us keep hotplug on root bus working like it used to be
> and at the same time keeps resource reassignment usable on
> root ports (and other 1st level bridges) that was fixed by [1].
> 
> 1)
> Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
> 2) CONFIG_HOTPLUG_PCI_ACPI=y
> 
> -- cut --
> 
> if commit message is looking acceptable to you, I can respin
> amended patch with your suggestions taken in account.
> 
> > IIUC, the Inspiron problem happens when:
> > 
> >   - acpiphp_context->bridge is NULL, so hotplug_event() calls
> >     enable_slot() instead of acpiphp_check_bridge(), AND
> > 
> >   - acpiphp_slot->bus->self is also NULL, because enable_slot() calls
> >     pci_assign_unassigned_bridge_resources() with that NULL pointer,
> >     which dereferences "bridge->subordinate"
> > 
> > But I can't figure out why acpiphp_context->bridge is NULL for RP07
> > and RP08 (which don't exist), but not for RP03 (which does).
> > 
> > I guess all the acpiphp_contexts (RP03, RP07, RP08) must be allocated in
> > acpiphp_add_context() by acpiphp_init_context().
> > 
> > Woody's lspci from [1] shows only one Root Port:
> > 
> >   00:1c.0 Wildcat Point-LP PCI Express Root Port #3
> > 
> > The DSDT.DSL includes:
> > 
> >   Device (RP01) _ADR 0x001C0000		# 1c.0
> >   Device (RP02) _ADR 0x001C0001		# 1c.1
> >   Device (RP03) _ADR 0x001C0002		# 1c.2
> >   Device (RP04) _ADR 0x001C0003		# 1c.3
> >   Device (RP05) _ADR 0x001C0004		# 1c.4
> >   Device (RP06) _ADR 0x001C0005		# 1c.5
> >   Device (RP07) _ADR 0x001C0006		# 1c.6
> >   Device (RP08) _ADR 0x001C0007		# 1c.7
> > 
> > I can see why we might need a Bus Check after resume to see if
> > something got added while we were suspended.  But I don't see why we
> > handle RP03 differently from RP07 and RP08.
> > 
> > Can you help me out?  I'm lost in a maze of twisty passages, all
> > alike.
> 
> I'll try to trace call path and see where it leads
> (based on a guess in updated commit message: OS/nor ACPI
> has info if it's bridge when the device didn't exists
> during boot).
> 
> (though, I don't think it should hold this patch.
> while it would be good to understand where bridge
> gets added in acpi context, it's not directly relevant
> to the fixing hotplug on hostbridge and buscheck events 
> on non-existing root-ports)
> 
> > Bjorn
> > 
> > [1] https://lore.kernel.org/r/92150d8d-8a3a-d600-a996-f60a8e4c876c@gmail.com
> > 
> 
> 
Powered by blists - more mailing lists
 
