Message-ID: <20230419193432.GA220432@bhelgaas>
Date: Wed, 19 Apr 2023 14:34:32 -0500
From: Bjorn Helgaas <helgaas@...nel.org>
To: Andy Shevchenko <andriy.shevchenko@...ux.intel.com>
Cc: Rob Herring <robh@...nel.org>,
Donald Hunter <donald.hunter@...il.com>,
"Rafael J. Wysocki" <rafael@...nel.org>,
linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
Bjorn Helgaas <bhelgaas@...gle.com>, netdev@...r.kernel.org,
Jesse Brandeburg <jesse.brandeburg@...el.com>,
Tony Nguyen <anthony.l.nguyen@...el.com>
Subject: Re: [BUG] net, pci: 6.3-rc1-4 hangs during boot on PowerEdge R620 with igb

On Wed, Apr 12, 2023 at 04:20:33PM +0300, Andy Shevchenko wrote:
> On Tue, Apr 11, 2023 at 02:02:03PM -0500, Rob Herring wrote:
> > On Tue, Apr 11, 2023 at 7:53 AM Donald Hunter <donald.hunter@...il.com> wrote:
> > > Bjorn Helgaas <helgaas@...nel.org> writes:
> > > > On Mon, Apr 10, 2023 at 04:10:54PM +0100, Donald Hunter wrote:
> > > >> On Sun, 2 Apr 2023 at 23:55, Bjorn Helgaas <helgaas@...nel.org> wrote:
> > > >> > On Sat, Apr 01, 2023 at 01:52:25PM +0100, Donald Hunter wrote:
> > > >> > > On Fri, 31 Mar 2023 at 20:42, Bjorn Helgaas <helgaas@...nel.org> wrote:
> > > >> > > >
> > > >> > > > I assume this igb NIC (07:00.0) must be built-in (not a plug-in card)
> > > >> > > > because it apparently has an ACPI firmware node, and there's something
> > > >> > > > we don't expect about its status?
> > > >> > >
> > > >> > > Yes they are built-in, to my knowledge.
> > > >> > >
> > > >> > > > Hopefully Rob will look at this. If I were looking, I would be
> > > >> > > > interested in acpidump to see what's in the DSDT.
> > > >> > >
> > > >> > > I can get an acpidump. Is there a preferred way to share the files, or just
> > > >> > > an email attachment?
> > > >> >
> > > >> > I think by default acpidump produces ASCII that can be directly
> > > >> > included in email. http://vger.kernel.org/majordomo-info.html says
> > > >> > 100K is the limit for vger mailing lists. Or you could open a report
> > > >> > at https://bugzilla.kernel.org and attach it there, maybe along with a
> > > >> > complete dmesg log and "sudo lspci -vv" output.
> > > >>
> > > >> Apologies for the delay, I was unable to access the machine while travelling.
> > > >>
> > > >> https://bugzilla.kernel.org/show_bug.cgi?id=217317
> > > >
> > > > Thanks for that! Can you boot a kernel with 6fffbc7ae137 reverted
> > > > with this in the kernel parameters:
> > > >
> > > > dyndbg="file drivers/acpi/* +p"
> > > >
> > > > and collect the entire dmesg log?
> > >
> > > Added to the bugzilla report.
> >
> > Rafael, Andy, Any ideas why fwnode_device_is_available() would return
> > false for a built-in PCI device with an ACPI device entry? The only
> > thing I see in the log is it looks like the parent PCI bridge/bus
> > doesn't have ACPI device entry (based on "[ 0.913389] pci_bus
> > 0000:07: No ACPI support"). For DT, if the parent doesn't have a node,
> > then the child can't. Not sure on ACPI.
>
> Thanks for the Cc'ing. I haven't checked anything yet, but from the above it
> sounds like a BIOS issue. If PCI has no ACPI companion tree, then why the heck
> one of the devices has the entry? I'm not even sure this is allowed by ACPI
> specification, but as I said, I just solely used the above mail.

ACPI r6.5, sec 6.3.7, about _STA says:

  - Bit [0] - Set if the device is present.
  - Bit [1] - Set if the device is enabled and decoding its resources.
  - Bit [3] - Set if the device is functioning properly (cleared if
    device failed its diagnostics).
  ...

  If a device is present on an enumerable bus, then _STA must not
  return 0. In that case, bit[0] must be set and if the status of the
  device can be determined through a bus-specific enumeration and
  discovery mechanism, it must be reflected by the values of bit[1]
  and bit[3], even though the OSPM is not required to take them into
  account.

Since PCI *is* an enumerable bus, I don't think we can use _STA to
decide whether a PCI device is present.

We can use _STA to decide whether a host bridge is present, of course,
but that doesn't help here because the host bridge in question is
PNP0A08:00, which leads to [bus 00-3d], and it is present.

I don't know exactly what path led to the igb issue, but I don't think
we need to figure that out. I think we just need to avoid the use of
_STA in fwnode_device_is_available().

6fffbc7ae137 ("PCI: Honor firmware's device disabled status") appeared
in v6.3-rc1, so I think we need to revert or fix it before v6.3, which
will probably be tagged Sunday (and I'll be on vacation
Friday-Monday).

Bjorn