[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zzsvphcxy-70Uz7E@smile.fi.intel.com>
Date: Mon, 18 Nov 2024 14:14:30 +0200
From: Andy Shevchenko <andriy.shevchenko@...ux.intel.com>
To: Hans de Goede <hdegoede@...hat.com>
Cc: Shinichiro Kawasaki <shinichiro.kawasaki@....com>,
"Daniel Walker (danielwa)" <danielwa@...co.com>,
Ilpo J�rvinen <ilpo.jarvinen@...ux.intel.com>,
Klara Modin <klarasmodin@...il.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Danil Rybakov <danilrybakov249@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"xe-linux-external(mailer list)" <xe-linux-external@...co.com>
Subject: Re: platform/x86: p2sb: Allow p2sb_bar() calls during PCI device
probe
On Mon, Nov 18, 2024 at 12:42:18PM +0100, Hans de Goede wrote:
> On 18-Nov-24 12:30 PM, Shinichiro Kawasaki wrote:
> > On Nov 15, 2024 / 14:57, Daniel Walker (danielwa) wrote:
> >> On Fri, Nov 15, 2024 at 11:35:46AM +0000, Shinichiro Kawasaki wrote:
> >>> On Nov 13, 2024 / 19:34, Hans de Goede wrote:
> >>>> On 13-Nov-24 6:41 PM, Daniel Walker (danielwa) wrote:
> >>>>> On Wed, Nov 13, 2024 at 06:04:44PM +0100, Hans de Goede wrote:
> >>>>>> On 13-Nov-24 5:33 PM, Hans de Goede wrote:
> >>>>>>> On 13-Nov-24 5:24 PM, Hans de Goede wrote:
[...]
> >>>>>>> It probably has something to do with these 2 messages:
> >>>>>>>
> >>>>>>> pci 0000:00:1f.1: BAR 0 [mem 0xfd000000-0xfdffffff 64bit]: can't claim; no compatible bridge window
> >>>>>>> pci 0000:00:1f.1: BAR 0 [mem 0x280000000-0x280ffffff 64bit]: assigned
> >>>>>>>
> >>>>>>> I'm guessing that this re-assignment is messing up
> >>>>>>> the p2sb BAR caching, after which things go wrong.
> >>>>>>
> >>>>>> Hmm, but that should be fixed by 2c6370e66076 ("platform/x86: p2sb: Don't init until unassigned resources have been assigned")
> >>>>>>
> >>>>>> and you are seeing this with 6.12, which has that.
> >>>>>>
> >>>>>> Can you try adding a pr_info() to the top of p2sb_cache_resources()
> >>>>>> with 6.12 and then collec a new dmesg ?
> >>>>>>
> >>>>>> If that pr_info() is done after the:
> >>>>>>
> >>>>>> pci 0000:00:1f.1: BAR 0 [mem 0x280000000-0x280ffffff 64bit]: assigned
> >>>>>>
> >>>>>> message then that does not explain things.
> >>>>>
> >>>>> I haven't testing adding a pr_info() but the messages seem to happen in the same
> >>>>> order in both working and non-working cases.
> >>>>>
> >>>>> Does that matter?
> >>>>
> >>>> The working case does not do the bar caching, we want to know if the
> >>>> bar caching in the non working case happens before or after the assignment:
> >>>>
> >>>> pci 0000:00:1f.1: BAR 0 [mem 0x280000000-0x280ffffff 64bit]: assigned
> >>>>
> >>>> It should happen after the assignment.
> >>>
> >>> Hello Daniel,
> >>>
> >>> It's my sorrow that the change cause this trouble. I have created a debug patch
> >>> for the kernel and attached to this e-mail. It adds some pr_info() to answer
> >>> the question from Hans. It will also show us a bit more things. Could you try it
> >>> on your system? It should apply to v6.12-rcX kernels without conflicts.
> >>
> >> Ok.. The dmesg with the patch applied is attached.
> >
> > Thank you. Here I quote the relevant part of the debug log.
> >
> > --------------------------------------------------------------------------------
> > ...
> > pci 0000:00:1f.0: [8086:19dc] type 00 class 0x060100 conventional PCI endpoint
> > pci 0000:00:1f.1: [8086:19dd] type 00 class 0x058000 conventional PCI endpoint ... [A]
> > pci 0000:00:1f.1: BAR 0 [mem 0xfd000000-0xfdffffff 64bit]
> > pci 0000:00:1f.2: [8086:19de] type 00 class 0x058000 conventional PCI endpoint
> > pci 0000:00:1f.2: BAR 0 [mem 0x88c00000-0x88c03fff]
> > ...
> > PCI: Using ACPI for IRQ routing
> > pci 0000:00:1f.1: BAR 0 [mem 0xfd000000-0xfdffffff 64bit]: can't claim; no compatible bridge window ... [B]
> > hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0, 0, 0, 0, 0
> > ...
> > NET: Registered PF_XDP protocol family
> > pci 0000:00:1f.1: BAR 0 [mem 0x280000000-0x280ffffff 64bit]: assigned ... [C]
> > pci 0000:00:09.0: PCI bridge to [bus 01-06]
> > ...
> > PCI: CLS 64 bytes, default 64
> > p2sb_cache_resources
> > p2sb_cache_resources: P2SBC_HIDE=0 ... [D]
> > p2sb_scan_and_cache_devfn: devfn=1f.1
> > p2sb_scan_and_cache_devfn: 280000000-280ffffff: 140204 ... [E]
> > PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> > ...
> > --------------------------------------------------------------------------------
> >
> > Also, here I list my observations.
> >
> > [A] The P2SB device was detected with DEVFN 1f.1, and device id 8086:19dd
> > [B] Failed to claim its resource
> > [C] Assigned new resource
> > [D] p2sb_cache_resource() was called after the new resource assignment.
> > P2SBC_HIDE bit is not set.
> > [E] The new resource was cached. IORESOURCE flags: MEM_64,SIZE_ALIGN,MEM.
> >
> > So it was confirmed that the p2sb_cache_resource() was called after the new
> > resource assignment, but Hans and Andy discuss that this order is not the
> > problem cause, probably.
> >
> > One thing I noticed is that p2sb_bar() call is not recorded in the log. My
> > understanding is that all device drivers which use P2SB resource shouled call
> > p2sb_bar(). Daniel, you noted that "a custom gpio device" disappeared. Does its
> > device driver call p2sb_bar()?
FYI: if we are talking about on-die GPIO (that's usually covered by
the drivers/intel/pinctrl-denverton.c), the new kernels have the piece
of the code to enumerate it via p2sb (the code is located in the
drivers/mfd/lpc_ich.c).
> > On the other hand, Daniel noted that,
> >
> > "The vendor and device details for the pci device are 8086:19dd."
> >
> > I think 8086:19dd is the P2SB device itself. When p2sb_cache_resource() is
> > called, pci_stop_and_remove_bus_device() is called for it, so I guess it is
> > expected the device 8086:19dd disappears. Before applying the commit
> > 5913320eb0b3, this pci_stop_and_remove_bus_device() call happened when
> > p2sb_bar() was called. So, my mere guess is that Daniel's system's drivers do
> > not call p2sb_bar() during the boot process, then the 8086:19dd P2SB device was
> > still visible after boot.
>
> Thank you that is a great analysis.
+1 here, indeed a great analysis!
> It sounds to me like calling pci_stop_and_remove_bus_device() on an unhidden
> PCI device is the real problem here. And that we were not hitting that before
> was more or less luck.
>
> I wonder if we can cache the bar in some other way, or even delay retrieving
> it till p2sb_bar() call time in the case when it is not hidden ?
--
With Best Regards,
Andy Shevchenko
Powered by blists - more mailing lists