Message-ID: <1ee94000-14af-3edf-10b6-acd821075d3e@linux.intel.com>
Date: Thu, 1 Feb 2024 16:47:14 +0200 (EET)
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: Jonathan Woithe <jwoithe@...t42.net>
cc: Igor Mammedov <imammedo@...hat.com>,
Andy Shevchenko <andriy.shevchenko@...el.com>, linux-pci@...r.kernel.org,
Bjorn Helgaas <bhelgaas@...gle.com>,
Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
Rob Herring <robh@...nel.org>,
Krzysztof Wilczyński <kw@...ux.com>,
Lukas Wunner <lukas@...ner.de>,
Mika Westerberg <mika.westerberg@...ux.intel.com>,
"Rafael J . Wysocki" <rafael@...nel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 0/7] PCI: Solve two bridge window sizing issues
On Thu, 1 Feb 2024, Jonathan Woithe wrote:
> On Mon, Jan 22, 2024 at 02:45:20PM +0100, Igor Mammedov wrote:
> > On Mon, 22 Jan 2024 14:37:32 +0200 (EET)
> > Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com> wrote:
> >
> > > On Mon, 22 Jan 2024, Jonathan Woithe wrote:
> > >
> > > > On Sun, Jan 21, 2024 at 02:54:22PM +0200, Andy Shevchenko wrote:
> > > > > On Thu, Jan 18, 2024 at 05:18:45PM +1030, Jonathan Woithe wrote:
> > > > > > On Thu, Jan 11, 2024 at 06:30:22PM +1030, Jonathan Woithe wrote:
> > > > > > > On Thu, Jan 04, 2024 at 10:48:53PM +1030, Jonathan Woithe wrote:
> > > > > > > > On Thu, Jan 04, 2024 at 01:12:10PM +0100, Igor Mammedov wrote:
> > > > > > > > > On Thu, 28 Dec 2023 18:57:00 +0200
> > > > > > > > > Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > > Here's a series that contains two fixes to the PCI bridge window sizing
> > > > > > > > > > > algorithm. Together, they should enable a remove & rescan cycle to work
> > > > > > > > > > for a PCI bus that has PCI devices with optional resources and/or
> > > > > > > > > > disparity in BAR sizes.
> > > > > > > > > >
> > > > > > > > > > For the second fix, I chose to expose find_empty_resource_slot() from
> > > > > > > > > > kernel/resource.c because it should increase accuracy of the cannot-fit
> > > > > > > > > > decision (currently that function is called find_resource()). In order
> > > > > > > > > > to do that sensibly, a few improvements seemed in order to make its
> > > > > > > > > > interface and name of the function sane before exposing it. Thus, the
> > > > > > > > > > few extra patches on resource side.
> > > > > > > > > >
> > > > > > > > > > Unfortunately I don't have a reason to suspect these would help with
> > > > > > > > > > the issues related to the currently ongoing resource regression
> > > > > > > > > > thread [1].
> > > > Thanks, and understood. In this case the request from Igor was
> > > >
> > > > can you test this series on affected machine with broken kernel to see if
> > > > it's of any help in your case?
> > > >
> > > > The latest vanilla kernel (6.7) has (AFAIK) had the offending commit
> > > > reverted, so it's not a "broken" kernel in this respect. Therefore, if I've
> > > > understood the request correctly, working with that kernel won't produce the
> > > > desired test.
> > >
> > > Well, you can revert the revert again to get back to the broken state.
> >
> > either this or just a hand patching as Ilpo has suggested earlier
> > would do.
>
> No problem. This was the easiest approach for me and I have now done this.
> Apologies for the delay in getting to this: I ran out of time last Thursday.
>
> > There is a non-zero chance that this series might fix the issues
> > Jonathan is facing, i.e. the failed resource reallocation which the
> > offending patches trigger.
>
> I can confirm that as expected, this patch series has had no effect on the
> system which experiences the failed resource reallocation. From syslog,
> running a 5.15.141+ kernel[1]:
>
> kernel: radeon 0000:4b:00.0: Fatal error during GPU init
> kernel: radeon: probe of 0000:4b:00.0 failed with error -12
>
> This is unchanged from what is seen with the unaltered 5.15.141 kernel.
>
> In case it's important, I can also confirm that the errors related to the
> thunderbolt device are also still present in the patched 5.15.141+
> kernel:
>
> thunderbolt 0000:04:00.0: interrupt for TX ring 0 is already enabled
> :
> thunderbolt 0000:04:00.0: interrupt for RX ring 0 is already enabled
> :
>
> Like the GPU failure, they do not appear in the working kernels on this
> system.
>
> Let me know if you would like me to run further tests.
>
> Regards
> jonathan
>
> [1] This is 5.15.141, patched with the series of interest here and the hand
> patch from Ilpo.
Hi Jonathan,
Thanks a lot for testing it regardless. The end result was not a big
surprise given how things looked in the logs, but it was certainly
worth a test, as Igor mentioned. The resource allocation code isn't among
the easiest to follow.
--
i.