lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 22 Jul 2022 09:39:05 -0500
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     Pali Rohár <pali@...nel.org>,
        Johan Hovold <johan+linaro@...nel.org>
Cc:     Kishon Vijay Abraham I <kishon@...com>,
        Xiaowei Song <songxiaowei@...ilicon.com>,
        Binghui Wang <wangbinghui@...ilicon.com>,
        Thierry Reding <thierry.reding@...il.com>,
        Ryder Lee <ryder.lee@...iatek.com>,
        Jianjun Wang <jianjun.wang@...iatek.com>,
        linux-pci@...r.kernel.org,
        Krzysztof Wilczyński <kw@...ux.com>,
        Ley Foon Tan <ley.foon.tan@...el.com>,
        linux-kernel@...r.kernel.org, Marc Zyngier <maz@...nel.org>
Subject: Re: Why set .suppress_bind_attrs even though .remove() implemented?

[+cc Marc, can you clarify when we need irq_dispose_mapping()?]

On Thu, Jul 21, 2022 at 05:21:22PM -0500, Bjorn Helgaas wrote:
> On Thu, Jul 21, 2022 at 10:46:07PM +0200, Pali Rohár wrote:
> > On Thursday 21 July 2022 14:54:33 Bjorn Helgaas wrote:
> > > The j721e, kirin, tegra, and mediatek drivers all implement .remove().
> > > 
> > > They also set ".suppress_bind_attrs = true".  I think this means
> > > bus_add_driver() will not create the "bind" and "unbind" sysfs
> > > attributes for the driver that would allow users to users to manually
> > > attach and detach devices from it.
> > > 
> > > Is there a reason for this, or should these drivers stop setting
> > > .suppress_bind_attrs?
> > 
> > I have already asked this question during review of kirin driver:
> > https://lore.kernel.org/linux-pci/20211031205527.ochhi72dfu4uidii@pali/
> > 
> > Microchip driver wanted to change its type from bool to tristate
> > https://lore.kernel.org/linux-pci/20220420093449.38054-1-u.kleine-koenig@pengutronix.de/t/#u
> > and after discussion it seems that it is needed to do more work for this
> > driver.
> > 
> > > For example, Pali and Ley Foon *did* stop setting .suppress_bind_attrs
> > > when adding .remove() methods in these commits:
> > > 
> > >   0746ae1be121 ("PCI: mvebu: Add support for compiling driver as module")
> > >   526a76991b7b ("PCI: aardvark: Implement driver 'remove' function and allow to build it as module")
> > >   ec15c4d0d5d2 ("PCI: altera: Allow building as module")
> > 
> > I added it for both pci-mvebu.c and pci-aardvark.c. And just few days
> > ago I realized why suppress_bind_attrs was set to true and remove method
> > was not implemented.
> 
> With suppress_bind_attrs, the user can't manually unbind a device, so
> we can't get to mvebu_pcie_remove() that way, but since mvebu is a
> modular driver, I assume we can unload the module and *that* would
> call mvebu_pcie_remove().  Right?
> 
> > Implementing remove method is not really simple, specially when pci
> > controller driver implements also interrupt controller (e.g. for
> > handling legacy interrupts).
> 
> Hmmm.  Based on your patches below, it looks like we need to call
> irq_dispose_mapping() in some cases, but I'm very confused about
> *which* cases.
> 
> I first thought it was for mappings created with irq_create_mapping(),
> but pci-aardvark.c never calls that, so there must be more to it.
> 
> Currently only altera, iproc, mediatek-gen3, and mediatek call
> irq_dispose_mapping() from their .remove() methods.  (They all call
> irq_domain_remove() *before* irq_dispose_mapping().  Is that legal?
> Your patches do irq_dispose_mapping() *first*.)
> 
> altera, mediatek-gen3, and mediatek call irq_dispose_mapping() on IRQs
> that came from platform_get_irq().
> 
> qcom is a DWC driver, so all the IRQ stuff happens in
> dw_pcie_host_init().  qcom_pcie_remove() does call
> dw_pcie_host_deinit(), which calls irq_domain_remove(), but nobody
> calls irq_dispose_mapping().
> 
> I'm thoroughly confused by all this.  But I suspect that maybe I
> should drop the "make qcom modular" patch because it seems susceptible
> to this problem:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/ctrl/qcom&id=41b68c2d097e
> 
> > Here are waiting fixup patches for pci-mvebu.c and pci-aardvark.c which
> > fixes .remove callback. Without these patches calling 'rmmod driver' let
> > dangling pointer in kernel which may cause random kernel crashes. See:
> > 
> > https://lore.kernel.org/linux-pci/20220709161858.15031-1-pali@kernel.org/
> > https://lore.kernel.org/linux-pci/20220711120626.11492-1-pali@kernel.org/
> > https://lore.kernel.org/linux-pci/20220711120626.11492-2-pali@kernel.org/
> > 
> > So I would suggest to do more detailed review when adding .remove
> > callback for pci controller driver (or when remove suppress_bind_attrs)
> > and do more testings and checking if all IRQ mappings are disposed.
> 
> I'm not smart enough to do "more detailed review" because I don't know
> what things to look for :)  Thanks for all your work in sorting out
> these arcane details!
> 
> Bjorn

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ