Message-ID: <87czdtxnfn.wl-maz@kernel.org>
Date: Mon, 25 Jul 2022 15:43:40 +0100
From: Marc Zyngier <maz@...nel.org>
To: Johan Hovold <johan@...nel.org>
Cc: Bjorn Helgaas <helgaas@...nel.org>,
Pali Rohár <pali@...nel.org>,
Johan Hovold <johan+linaro@...nel.org>,
Kishon Vijay Abraham I <kishon@...com>,
Xiaowei Song <songxiaowei@...ilicon.com>,
Binghui Wang <wangbinghui@...ilicon.com>,
Thierry Reding <thierry.reding@...il.com>,
Ryder Lee <ryder.lee@...iatek.com>,
Jianjun Wang <jianjun.wang@...iatek.com>,
linux-pci@...r.kernel.org,
Krzysztof Wilczyński <kw@...ux.com>,
Ley Foon Tan <ley.foon.tan@...el.com>,
linux-kernel@...r.kernel.org
Subject: Re: Why set .suppress_bind_attrs even though .remove() implemented?

On Mon, 25 Jul 2022 14:25:49 +0100,
Johan Hovold <johan@...nel.org> wrote:
>
> [ +CC: maz ]
>
> On Fri, Jul 22, 2022 at 09:38:58AM -0500, Bjorn Helgaas wrote:
> > On Fri, Jul 22, 2022 at 03:26:44PM +0200, Johan Hovold wrote:
> > > On Thu, Jul 21, 2022 at 05:21:22PM -0500, Bjorn Helgaas wrote:
> >
> > > > qcom is a DWC driver, so all the IRQ stuff happens in
> > > > dw_pcie_host_init(). qcom_pcie_remove() does call
> > > > dw_pcie_host_deinit(), which calls irq_domain_remove(), but nobody
> > > > calls irq_dispose_mapping().
> > > >
> > > > I'm thoroughly confused by all this. But I suspect that maybe I
> > > > should drop the "make qcom modular" patch because it seems susceptible
> > > > to this problem:
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/ctrl/qcom&id=41b68c2d097e
> > >
> > > That should not be necessary.
> > >
> > > As you note above, interrupt handling is implemented in dwc core so if
> > > there are any issues here at all, which I doubt, then all of the dwc
> > > drivers that currently can be built as modules would all be broken and
> > > this would need to be fixed in core.
> >
> > I don't know yet whether there's an issue. We need a clear argument
> > for why there is or is not. The fact that others might be broken is
> > not an argument for breaking another one ;)
>
> It's not breaking anything that is currently working, and if there's
> some corner case during module unload, that's not the end of the world
> either.

It may not be the end of the world for you, but you have absolutely no
idea of what dangling pointers to kernel memory will do on a user
machine, nor how this can be further exploited. Unloading a module
should never result in an unsafe kernel.

> It's a feature useful for developers and no one expects remove code to
> be perfect (e.g. resilient against someone trying to break it by doing
> things in parallel, etc.).

If that's a feature for you while you are developing, then please keep
this change as part of your own hacking toolbox. IMO the upstream
kernel shouldn't be subjected to this.

>
> > > I've been using the modular pcie-qcom patch for months now, unloading
> > > and reloading the driver repeatedly to test power sequencing, without
> > > noticing any problems whatsoever.
> >
> > Pali's commit log suggests that unloading the module is not, by
> > itself, enough to trigger the problem:
> >
> > https://lore.kernel.org/linux-pci/20220709161858.15031-1-pali@kernel.org/
> >
> > Can you test the scenario he mentions?
>
> Turns out the pcie-qcom driver does not support legacy interrupts so
> there's no risk of there being any lingering mappings if I understand
> things correctly.

It still does MSIs, thanks to dw_pcie_host_init(). If you can remove
the driver while devices are up and running with MSIs allocated,
things may get ugly if things align the wrong way (if a driver still
has a reference to an irq_desc or irq_data, for example).

M.
--
Without deviation from the norm, progress is not possible.