Message-ID: <CAGETcx9zkjgF=AjCkNcPKLriNk30PGugXKTNRhzTFm5cDQHm0A@mail.gmail.com>
Date: Thu, 4 Mar 2021 19:25:08 -0800
From: Saravana Kannan <saravanak@...gle.com>
To: Michael Walle <michael@...le.cc>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Jon Hunter <jonathanh@...dia.com>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Geert Uytterhoeven <geert@...ux-m68k.org>,
Guenter Roeck <linux@...ck-us.net>,
Android Kernel Team <kernel-team@...roid.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v1 0/3] driver core: Set fw_devlink=on take II
On Wed, Mar 3, 2021 at 2:21 AM Michael Walle <michael@...le.cc> wrote:
>
> Am 2021-03-03 10:28, schrieb Saravana Kannan:
> > On Wed, Mar 3, 2021 at 12:59 AM Michael Walle <michael@...le.cc> wrote:
> >>
> >> Am 2021-03-02 23:47, schrieb Saravana Kannan:
> >> > On Tue, Mar 2, 2021 at 2:42 PM Saravana Kannan <saravanak@...gle.com>
> >> > wrote:
> >> >>
> >> >> On Tue, Mar 2, 2021 at 2:24 PM Michael Walle <michael@...le.cc> wrote:
> >> >> >
> >> >> > Am 2021-03-02 22:11, schrieb Saravana Kannan:
> >> >> > > I think Patch 1 should fix [4] without [5]. Can you test the series
> >> >> > > please?
> >> >> >
> >> >> > Mh, I'm on latest linux-next (next-20210302) and I've applied patch 3/3
> >> >> > and
> >> >> > reverted commit 7007b745a508 ("PCI: layerscape: Convert to
> >> >> > builtin_platform_driver()"). I'd assumed that PCIe shouldn't be working,
> >> >> > right? But it is. Did I miss something?
> >> >>
> >> >> You need to revert [5].
> >> >
> >> > My bad. You did revert it. Ah... I wonder if it was due to
> >> > fw_devlink.strict that I added. To break PCI again, also set
> >> > fw_devlink.strict=1 in the kernel command line.
> >>
> >> Indeed, adding fw_devlink.strict=1 will break PCI again. But if
> >> I then apply 1/3 and 2/3 again, PCI is still broken. Just to be clear:
> >> I'm keeping the fw_devlink.strict=1 parameter.
> >
> > Thanks for your testing! I assume you are also setting fw_devlink=on?
>
> I've applied patch 3/3 and added nothing to the commandline, so yes.
>
> > Hmmm... ok. In the working case, does your PCI probe before IOMMU? If
> > yes, then your results make sense.
>
> Yes that was the conclusion last time. That the probe is deferred and
> the __init section is already discarded when there might a second
> try of the probe.
Long response below, but the TL;DR is:
The real fix for your case was the implementation of fw_devlink.strict
and NOT Patch 1 of this series. So, sorry for wasting your test
effort.
During the earlier debugging (for take I), this is what I thought:

With fw_devlink=permissive, your boot sequence was (Case 1):
1. IOMMU probe
2. PCI builtin_platform_driver_probe() attempt
   - Driver core sets up PCI with IOMMU
   - PCI probe succeeds.
   - PCI works with IOMMU. <---- Remember this point.
And with fw_devlink=on, I thought the IOMMU probe order was
unnecessarily changed and caused this (Case 2):
1. IOMMU probe reordered for some reason to be attempted before its
   suppliers. Gets deferred.
2. PCI probe attempt
   - fw_devlink + device links defers the probe because IOMMU isn't ready.
   - builtin_platform_driver_probe() replaces drv->probe with
     platform_probe_fail()
3. IOMMU deferred probe succeeds eventually.
4. PCI deferred probe is attempted
   - platform_probe_fail() which is a stub just returns -ENXIO

And if this was the case, patch 1 in this series would have fixed it
by removing unnecessary reordering of probes.
But what was really happening was (after I went through your logs
again and looked at the code):

With fw_devlink=permissive, your boot sequence was really (Case 3):
1. PCI builtin_platform_driver_probe() attempt
   - Driver core does NOT set up PCI with IOMMU
   - PCI probe succeeds.
   - PCI works without IOMMU. <---- Remember this point.
2. IOMMU probes
And with fw_devlink=on what was happening was (Case 4):
1. PCI builtin_platform_driver_probe() attempt
   - fw_devlink + device links defers the probe because it thinks
     IOMMU is mandatory and isn't ready.
   - builtin_platform_driver_probe() replaces drv->probe with
     platform_probe_fail()
2. IOMMU probes.
3. PCI deferred probe is attempted
   - platform_probe_fail() which is a stub just returns -ENXIO
4. PCI is broken now.
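
To make the difference between Case 3 and Case 4 concrete, here is a
minimal userspace sketch of the two sequences. It is not kernel code:
struct sim_driver, sim_attempt(), sim_register_probe_once() and
sim_pci_boot() are hypothetical names made up for illustration, and
EPROBE_DEFER is defined locally because userspace <errno.h> does not
provide it. The only behavior modeled is what is described above: one
probe attempt at registration, after which the probe callback is
replaced by a stub that returns -ENXIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal errno value; defined here only for the simulation. */
#define EPROBE_DEFER 517

/* Hypothetical stand-in for a platform driver with a probe callback. */
struct sim_driver {
	int (*probe)(void);
};

static int pci_probe_ok(void)    { return 0; }
static int probe_fail_stub(void) { return -ENXIO; }

/*
 * One probe attempt. If a mandatory supplier is not ready, the attempt is
 * deferred before the driver's probe callback is ever called.
 */
static int sim_attempt(struct sim_driver *drv, bool supplier_blocks)
{
	assert(drv != NULL && drv->probe != NULL);
	if (supplier_blocks)
		return -EPROBE_DEFER;
	return drv->probe();
}

/*
 * Models registration through builtin_platform_driver_probe(): a single
 * attempt, then the probe callback is swapped for the failing stub.
 */
static int sim_register_probe_once(struct sim_driver *drv, bool supplier_blocks)
{
	int ret = sim_attempt(drv, supplier_blocks);

	drv->probe = probe_fail_stub;
	return ret;
}

/*
 * PCI is registered before the IOMMU has probed. If the IOMMU link is
 * treated as mandatory (Case 4), the first attempt is deferred and the
 * retry hits the stub. Otherwise (Case 3) the first attempt succeeds.
 */
int sim_pci_boot(bool iommu_link_mandatory)
{
	struct sim_driver pci = { .probe = pci_probe_ok };
	int ret = sim_register_probe_once(&pci, iommu_link_mandatory);

	if (ret != -EPROBE_DEFER)
		return ret;

	/* IOMMU has probed by now, so nothing blocks the deferred retry. */
	return sim_attempt(&pci, false);
}
```

Under these assumptions, sim_pci_boot(false) returns 0 (Case 3, PCI
works without IOMMU) and sim_pci_boot(true) returns -ENXIO (Case 4,
PCI is broken).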
In your case IOMMU is not mandatory, and PCI works without IOMMU even
when fw_devlink=off/permissive. So the real fix for your case is the
addition of fw_devlink.strict and defaulting it to 0. Because I
misunderstood your case, I didn't realize I had already fixed it, and
I thought Patch 1 in this series would be the fix.
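
As a rough illustration of why defaulting fw_devlink.strict to 0 is
enough here, the decision can be sketched as a small predicate. This
is a simplified model of the behavior described in this thread, not
the driver core implementation; enum fw_mode and
supplier_blocks_probe() are hypothetical names, and "optional
supplier" stands for a dependency like the IOMMU in your case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the fw_devlink= command line setting. */
enum fw_mode { FW_OFF, FW_PERMISSIVE, FW_ON };

/*
 * Returns true if a consumer's probe would be deferred because of this
 * supplier. Assumptions of this sketch:
 *  - off/permissive never block a consumer's probe,
 *  - with fw_devlink=on, an optional supplier blocks only when
 *    fw_devlink.strict=1,
 *  - a supplier that has already probed never blocks.
 */
static bool supplier_blocks_probe(enum fw_mode mode, bool strict,
				  bool optional_supplier, bool supplier_ready)
{
	assert(mode == FW_OFF || mode == FW_PERMISSIVE || mode == FW_ON);

	if (supplier_ready)
		return false;
	if (mode != FW_ON)
		return false;
	if (optional_supplier && !strict)
		return false;
	return true;
}
```

With strict set to false and the IOMMU treated as optional, the
predicate is false even with FW_ON, so the single probe attempt made
by builtin_platform_driver_probe() is not deferred.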
Patch 1 in this series is still important for other reasons, just not for you.
> So I guess, Patch 1/3 and Patch 2/3 doesn't fix that and the drivers
> still need to be converted to builtin_platform_driver(), right?
So there is no real issue between fw_devlink=on and
builtin_platform_driver_probe() anymore. At least none that I know of
or that has been reported.
If you really want your PCI to work _with_ IOMMU, then
builtin_platform_driver_probe() is wrong even with fw_devlink=off. And
if you wanted PCI to work with IOMMU support and fw_devlink wasn't
available, you'd have to play initcall chicken with the IOMMU driver
or implement some IOMMU check + deferred probing in your PCI probe
function.
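
For reference, the "IOMMU check + deferred probing" option has roughly
this shape. Again, this is a userspace sketch under stated
assumptions, not driver code: struct sim_board, sim_pci_probe() and
sim_boot() are hypothetical, EPROBE_DEFER is defined locally, and the
retry loop only stands in for the deferred probe mechanism. It also
assumes the driver is registered so that its probe can be retried,
which is exactly what builtin_platform_driver_probe() does not allow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal errno value; defined here only for the simulation. */
#define EPROBE_DEFER 517

/* Hypothetical board state shared by the simulated probes. */
struct sim_board {
	bool iommu_ready;
	bool pci_uses_iommu;
};

/* PCI probe that checks for the IOMMU itself and asks to be retried. */
static int sim_pci_probe(struct sim_board *b)
{
	assert(b != NULL);
	if (!b->iommu_ready)
		return -EPROBE_DEFER;
	b->pci_uses_iommu = true;
	return 0;
}

/*
 * Stand-in for deferred probing: retry the PCI probe for up to
 * max_passes passes. The IOMMU becomes ready once the pass index
 * reaches iommu_ready_after.
 */
int sim_boot(struct sim_board *b, int iommu_ready_after, int max_passes)
{
	int ret = -EPROBE_DEFER;

	for (int pass = 0; pass < max_passes && ret == -EPROBE_DEFER; pass++) {
		if (pass >= iommu_ready_after)
			b->iommu_ready = true;
		ret = sim_pci_probe(b);
	}
	return ret;
}
```

The point of the sketch is that the ordering knowledge lives inside
the PCI probe function, which is the part fw_devlink is meant to take
off your hands.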
However, with fw_devlink available, all you have to do is set
fw_devlink=on and fw_devlink.strict=1 and use
builtin_platform_driver(). You then don't have to care about initcall
order or figure out how to defer when the IOMMU isn't ready yet.
-Saravana