[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CALAqxLW=11O=bRQbVr_Bg6t97BhfxAqLC1OmXRATNjKkbxVUTA@mail.gmail.com>
Date: Fri, 30 Oct 2020 17:12:41 -0700
From: John Stultz <john.stultz@...aro.org>
To: Robin Murphy <robin.murphy@....com>
Cc: Will Deacon <will@...nel.org>, Maulik Shah <mkshah@...eaurora.org>,
Jason Cooper <jason@...edaemon.net>,
Saravana Kannan <saravanak@...gle.com>,
Marc Zyngier <maz@...nel.org>,
lkml <linux-kernel@...r.kernel.org>,
Lina Iyer <ilina@...eaurora.org>,
"open list:GPIO SUBSYSTEM" <linux-gpio@...r.kernel.org>,
iommu@...ts.linux-foundation.org, Andy Gross <agross@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Linus Walleij <linus.walleij@...aro.org>,
linux-arm-msm <linux-arm-msm@...r.kernel.org>,
Todd Kjos <tkjos@...gle.com>
Subject: Re: [PATCH v2 5/5] firmware: QCOM_SCM: Allow qcom_scm driver to be
loadable as a permenent module
On Fri, Oct 30, 2020 at 7:12 AM Robin Murphy <robin.murphy@....com> wrote:
> On 2020-10-30 01:02, John Stultz wrote:
> > On Wed, Oct 28, 2020 at 7:51 AM Robin Murphy <robin.murphy@....com> wrote:
> >> Hmm, perhaps I'm missing something here, but even if the config options
> >> *do* line up, what prevents arm-smmu probing before qcom-scm and
> >> dereferencing NULL in qcom_scm_qsmmu500_wait_safe_toggle() before __scm
> >> is initialised?
> >
> > Oh man, this spun me on a "wait, but how does it all work!" trip. :)
> >
> > So in the non-module case, the qcom_scm driver is a subsys_initcall
> > and the arm-smmu is a module_platform_driver, so the ordering works
> > out.
> >
> > In the module case, the arm-smmu code isn't loaded until the qcom_scm
> > driver finishes probing due to the symbol dependency handling.
>
> My point is that module load != driver probe. AFAICS you could disable
> drivers_autoprobe, load both modules, bind the SMMU to its driver first,
> and boom!
>
> > To double check this, I added a big msleep at the top of the
> > qcom_scm_probe to try to open the race window you described, but the
> > arm_smmu_device_probe() doesn't run until after qcom_scm_probe
> > completes.
>
> I don't think asynchronous probing is enabled by default, so indeed I
> would expect that to still happen to work ;)
>
> > So at least as a built in / built in, or a module/module case its ok.
> > And in the case where arm-smmu is a module and qcom_scm is built in
> > that's ok too.
>
> In the built-in case, I'm sure it happens to work out similarly because
> the order of nodes in the DTB tends to be the order in which devices are
> autoprobed. Again, async probe might be enough to break things; manually
> binding drivers definitely should; moving the firmware node to the end
> of the DTB probably would as well.
So, with modules, I turned on async probing for the two drivers, as
well as moved the firmware node to the end of the dtb, and couldn't
manage to trip it up even with a 6 second delay in the qcom_scm probe
function.
It was only when doing the same with everything built in that I did
manage to trigger the issue. So yes, you're right! And it is an issue
more easily tripped with everything built in statically (and not
really connected to this patch series).
> > Its just the case my patch is trying to prevent is where arm-smmu is
> > built in, but qcom_scm is a module that it can't work (due to build
> > errors in missing symbols, or if we tried to use function pointers to
> > plug in the qcom_scm - the lack of initialization ordering).
> >
> > Hopefully that addresses your concern? Let me know if I'm still
> > missing something.
>
> What I was dancing around is that the SCM API (and/or its users) appears
> to need a general way to tell whether SCM is ready or not, because the
> initialisation ordering problem is there anyway. Neither Kconfig nor the
> module loader can solve that.
Hrm... There is qcom_scm_is_available(). I tried adding a check for
that in qcom_smmu_impl_init() and return EPROBE_DEFER if it fails, but
I then hit a separate crash (tripping in iommu_group_remove_device on
a null dev->iommu_group value). But after adding a check for that, it
seems to be working...
I'll try to spin up a separate patch here for that in a second.
thanks
-john
Powered by blists - more mailing lists