Message-ID: <20241202092902.rp6xb3f64llpabbi@thinkpad>
Date: Mon, 2 Dec 2024 14:59:02 +0530
From: Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
To: Rob Herring <robh@...nel.org>
Cc: Peng Fan <peng.fan@....com>, "Peng Fan (OSS)" <peng.fan@....nxp.com>,
	Will Deacon <will@...nel.org>,
	Lorenzo Pieralisi <lpieralisi@...nel.org>,
	Krzysztof Wilczyński <kw@...ux.com>,
	Bjorn Helgaas <bhelgaas@...gle.com>,
	Pali Rohár <pali@...nel.org>,
	"open list:PCI DRIVER FOR GENERIC OF HOSTS" <linux-pci@...r.kernel.org>,
	"moderated list:PCI DRIVER FOR GENERIC OF HOSTS" <linux-arm-kernel@...ts.infradead.org>,
	open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] PCI: check bridge->bus in pci_host_common_remove

On Wed, Nov 27, 2024 at 01:56:50PM -0600, Rob Herring wrote:
> On Fri, Nov 15, 2024 at 08:17:20PM +0530, Manivannan Sadhasivam wrote:
> > On Fri, Nov 15, 2024 at 10:14:10AM +0000, Peng Fan wrote:
> > > Hi Manivannan,
> > > 
> > > > Subject: Re: [PATCH] PCI: check bridge->bus in
> > > > pci_host_common_remove
> > > > 
> > > > On Mon, Oct 28, 2024 at 04:46:43PM +0800, Peng Fan (OSS) wrote:
> > > > > From: Peng Fan <peng.fan@....com>
> > > > >
> > > > > > When a PCI node is created using an overlay and the overlay is
> > > > > > reverted/destroyed, the "linux,pci-domain" property no longer exists,
> > > > > > so of_get_pci_domain_nr will return failure. Then
> > > > > > of_pci_bus_release_domain_nr will actually use the dynamic IDA, even
> > > > > > if the ID was allocated from the static IDA. So the flow is as below:
> > > > > A: of_changeset_revert
> > > > >     pci_host_common_remove
> > > > >      pci_bus_release_domain_nr
> > > > >        of_pci_bus_release_domain_nr
> > > > >          of_get_pci_domain_nr      # fails because overlay is gone
> > > > >          ida_free(&pci_domain_nr_dynamic_ida)
> > > > >
> > > > > > When the driver calls pci_host_common_remove explicitly, the flow
> > > > > > becomes:
> > > > > B pci_host_common_remove
> > > > >    pci_bus_release_domain_nr
> > > > >     of_pci_bus_release_domain_nr
> > > > >      of_get_pci_domain_nr      # succeeds in this order
> > > > >       ida_free(&pci_domain_nr_static_ida)
> > > > > A of_changeset_revert
> > > > >    pci_host_common_remove
> > > > >
> > > > > > With the updated flow, pci_host_common_remove will be called twice,
> > > > > > so we need to check 'bridge->bus' to avoid accessing an invalid pointer.
> > > > >
> > > > > Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
> > > > > Signed-off-by: Peng Fan <peng.fan@....com>
> > > > 
> > > > I went through the previous discussion [1] and I couldn't see an
> > > > agreement on the point raised by Bjorn on 'removing the host bridge
> > > > before the overlay'.
> > > 
> > > > This patch follows Bjorn's idea.
> > > 
> > > I have added pci_host_common_remove to remove host bridge
> > > before removing overlay as I wrote in commit log.
> > > 
> > > > But of_changeset_revert will still run into pci_host_common_remove
> > > > and remove the host bridge again. As I see it, of_changeset_revert
> > > > removes the device tree node by design, and that triggers device
> > > > removal. So even if pci_host_common_remove was explicitly called
> > > > before of_changeset_revert, the following call to of_changeset_revert
> > > > will still call pci_host_common_remove.
> > > > 
> > > > So I wrote this patch to add a check of 'bus' to avoid removing it again.
> > > 
> > 
> > > Ok. I think there was a misunderstanding. Bjorn's example driver,
> > > 'i2c-demux-pinctrl', applies the changeset, then adds the i2c adapter on its
> > > own. And in remove(), it does the reverse.
> > 
> > But in your case, the issue is with the host bridge driver that gets probed
> > because of the changeset. While with 'i2c-demux-pinctrl' driver, it only
> > applies the changeset. So we cannot compare both drivers. I believe in your
> > case, 'i2c-demux-pinctrl' becomes 'jailhouse', isn't it?
> > 
> > So in your case, changeset is applied by jailhouse and that causes the
> > platform device to be created for the host bridge and then the host bridge
> > driver gets probed. So during destroy(), you call of_changeset_revert() that
> > removes the platform device and during that process it removes the host bridge
> > driver. The issue happens because during host bridge remove, it calls
> > pci_remove_root_bus() and that tries to remove the domain_nr using
> > pci_bus_release_domain_nr().
> >
> > > But pci_bus_release_domain_nr() uses the DT node to check whether to free the
> > > domain_nr from the static IDA or the dynamic IDA. And because no DT node
> > > exists at this time (it was already removed by of_changeset_revert()), it
> > > forces pci_bus_release_domain_nr() to use the dynamic IDA even though the ID
> > > was initially allocated from the static IDA.
> 
> Putting linux,pci-domain in an overlay is the same problem as aliases in 
> overlays[1]. It's not going to work well.
> 
> IMO, you can have overlays, or you can have static domains. You can't 
> have both.
> 

Okay. 
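
For reference, the static/dynamic mismatch described above can be modelled in
user space roughly as follows. This is only a toy sketch, not kernel code: the
toy_* names, the fixed-size flag arrays standing in for the two IDAs, and the
have_prop flag standing in for the presence of "linux,pci-domain" are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DOMAINS 8

/* Toy stand-in for an IDA: one "allocated" flag per ID (hypothetical). */
struct toy_ida {
	bool used[MAX_DOMAINS];
};

static struct toy_ida static_ida, dynamic_ida;

/* Models the property lookup: succeeds only while the property exists. */
static int toy_get_domain_nr(bool have_prop, int prop_val)
{
	return have_prop ? prop_val : -1;
}

/* Allocate from the static pool if the property exists, else dynamically. */
static int toy_alloc_domain(bool have_prop, int prop_val)
{
	int id = toy_get_domain_nr(have_prop, prop_val);

	if (id >= 0) {
		if (id >= MAX_DOMAINS || static_ida.used[id])
			return -1;
		static_ida.used[id] = true;
		return id;
	}

	for (id = 0; id < MAX_DOMAINS; id++) {
		if (!dynamic_ida.used[id]) {
			dynamic_ida.used[id] = true;
			return id;
		}
	}
	return -1;
}

/*
 * Release re-reads the property to pick the pool, just like the flow in the
 * commit log. Returns true only if the ID was really held in the pool that
 * got picked; false means the wrong pool was touched.
 */
static bool toy_release_domain(int id, bool have_prop, int prop_val)
{
	struct toy_ida *ida;
	bool was_used;

	assert(id >= 0 && id < MAX_DOMAINS);

	ida = toy_get_domain_nr(have_prop, prop_val) >= 0 ?
		&static_ida : &dynamic_ida;
	was_used = ida->used[id];
	ida->used[id] = false;
	return was_used;
}
```

Allocating with the property present and releasing after it has disappeared
makes toy_release_domain() pick the dynamic pool, so the static ID stays
marked as used. Releasing while the property is still visible frees it from
the pool it came from.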

> > I think a neat way to solve this issue would be by removing the OF node only
> > after removing all platform devices/drivers associated with that node. But I
> > honestly do not know whether that is possible or not. Otherwise, any other
> > driver that relies on the OF node in its remove() callback, could suffer from
> > the same issue. And whatever fix we may come up with in PCI core, it will be a
> > band-aid only.
> > 
> > I'd like to check with Rob first about his opinion.
> 
> If the struct device has an of_node set, there should be a reference 
> count on that node. But I think that only prevents the node from being 
> freed. It does not prevent the overlay from being detached. This is one 
> of many of the issues with overlays Frank painstakingly documented[2].
> 

Ah, I do remember this page, as Frank ended up creating it after my
continuous nudging to add a CONFIG_FS interface for applying overlays.

So why are we applying overlays in the kernel now?

> Perhaps it is just a matter of iterating thru all the nodes in an 
> overlay, getting their driver/device, and forcing them to unbind. 
> Though that has to be done per bus type.
> 

Sounds like the correct approach.

- Mani

-- 
மணிவண்ணன் சதாசிவம்
