lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 27 Jul 2022 14:57:16 -0500
From:   Bjorn Helgaas <helgaas@...nel.org>
To:     Johan Hovold <johan@...nel.org>
Cc:     Marc Zyngier <maz@...nel.org>,
        Pali Rohár <pali@...nel.org>,
        Johan Hovold <johan+linaro@...nel.org>,
        Kishon Vijay Abraham I <kishon@...com>,
        Xiaowei Song <songxiaowei@...ilicon.com>,
        Binghui Wang <wangbinghui@...ilicon.com>,
        Thierry Reding <thierry.reding@...il.com>,
        Ryder Lee <ryder.lee@...iatek.com>,
        Jianjun Wang <jianjun.wang@...iatek.com>,
        linux-pci@...r.kernel.org,
        Krzysztof Wilczyński <kw@...ux.com>,
        Ley Foon Tan <ley.foon.tan@...el.com>,
        linux-kernel@...r.kernel.org
Subject: Re: Why set .suppress_bind_attrs even though .remove() implemented?

On Tue, Jul 26, 2022 at 11:56:59AM +0200, Johan Hovold wrote:
> On Mon, Jul 25, 2022 at 06:35:27PM +0100, Marc Zyngier wrote:
> > On Mon, 25 Jul 2022 16:18:48 +0100,
> > Johan Hovold <johan@...nel.org> wrote:
> 
> > > Since when is unloading modules something that is expected to
> > > work perfectly? I keep hearing "well, don't do that then" when
> > > someone complains about unloading this module while doing this
> > > or that broke something. (And it's only root that can unload
> > > modules in the first place.)
> > 
> > Well, maybe I have higher standards. For the stuff I maintain, I
> > now point-blank refuse to support module unloading if this can
> > result in a crash. Or worse.
> 
> That makes sense for regular interrupt controllers where its hard to
> tell that all consumers are gone, but I don't think that should
> limit the usefulness of having modular PCI controller drivers where
> we know that the consumers are gone after deregistering the bus
> (i.e. the consumers are descendants of the controller in the device
> tree).

Those consumers are endpoint drivers, so I think this depends on those
drivers correctly unmapping the interrupts they use, right?

> > > It's useful for developers, but use it at your own risk.
> > > 
> > > That said, I agree that if something is next to impossible to
> > > get right, as may be the case with interrupt controllers
> > > generally, then fine, let's disable module unloading for that
> > > class of drivers.
> > > 
> > > And this would mean disabling driver unbind for the 20+ driver
> > > PCI drivers that currently implement it to some degree.
> > 
> > That would be Bjorn's and Lorenzo's call.
> 
> Sure, but I think it would be the wrong decision here. Especially,
> since the end result will likely just be that more drivers will
> become always compiled-in.

Can you elaborate on this?  I think Marc is suggesting that these PCI
controller drivers be modular but not removable.  Why would that cause
more of them to be compiled-in?

> > > > > Turns out the pcie-qcom driver does not support legacy
> > > > > interrupts so there's no risk of there being any lingering
> > > > > mappings if I understand things correctly.
> > > > 
> > > > It still does MSIs, thanks to dw_pcie_host_init(). If you can
> > > > remove the driver while devices are up and running with MSIs
> > > > allocated, things may get ugly if things align the wrong way
> > > > (if a driver still has a reference to an irq_desc or irq_data,
> > > > for example).
> > > 
> > > That is precisely the way I've been testing it and everything
> > > appears to be tore down as it should.
> > >
> > > And a PCI driver that has been unbound should have released its
> > > resources, or that's a driver bug. Right?
> > 
> > But that's the thing: you can easily remove part of the
> > infrastructure without the endpoint driver even noticing. It may
> > not happen in your particular case if removing the RC driver will
> > also nuke the endpoints in the process, but I can't see this is an
> > absolute guarantee. The crash pointed to by an earlier email is
> > symptomatic of it.
> 
> But that was arguably due to a driver bug, which we know how to fix.
> For MSIs the endpoint driver will free its interrupts and all is
> good.
> 
> The key observation is that the driver model will make sure that any
> endpoint drivers have been unbound before the bus is deregistered.
> 
> That means there are no longer any consumers of the interrupts,
> which can be disposed. For MSI this is handled by
> pci_free_irq_vectors() when unbinding the endpoint drivers. For
> legacy interrupts, which can be shared, the PCIe RC driver needs to
> manage this itself after the consumers are gone.

The driver model ensures that endpoint drivers have been unbound. But
doesn't the interrupt disposal depend on the correct functioning of
those endpoint drivers?  So if a buggy endpoint driver failed to
dispose of them, we're still vulnerable?

Bjorn

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ