Message-ID: <be464e2a-03d5-0b2e-24ee-96d0d14fd739@arm.com>
Date: Thu, 13 Feb 2020 19:40:42 +0000
From: Robin Murphy <robin.murphy@....com>
To: John Garry <john.garry@...wei.com>, Marc Zyngier <maz@...nel.org>,
Will Deacon <will@...nel.org>,
Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
Sudeep Holla <sudeep.holla@....com>,
"Guohanjun (Hanjun Guo)" <guohanjun@...wei.com>
Cc: iommu <iommu@...ts.linux-foundation.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
Linuxarm <linuxarm@...wei.com>,
Shameer Kolothum <shameerali.kolothum.thodi@...wei.com>,
Alex Williamson <alex.williamson@...hat.com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Saravana Kannan <saravanak@...gle.com>
Subject: Re: arm64 iommu groups issue
On 13/02/2020 3:49 pm, John Garry wrote:
>>>
>>> The underlying issue is that, for historical reasons, OF/IORT-based
>>> IOMMU drivers have ended up with group allocation being tied to endpoint
>>> driver probing via the dma_configure() mechanism (long story short,
>>> driver probe is the only thing which can be delayed in order to wait for
>>> a specific IOMMU instance to be ready). However, in the meantime, the
>>> IOMMU API internals have evolved sufficiently that I think there's a way
>>> to really put things right - I have the spark of an idea which I'll try
>>> to sketch out ASAP...
>>>
>>
>> OK, great.
>
> Hi Robin,
>
> I was wondering if you have had a chance to consider this problem again?
Yeah, sorry, more things came up such that it still hasn't been P yet ;)
Lorenzo and I did get as far as discussing it a bit more and writing up
a ticket, so it's formally on our internal radar now, at least.
> One simple idea could be to introduce a device link between the endpoint
> device and its parent bridge to ensure that they probe in order, as
> expected in pci_device_group():
>
> Subject: [PATCH] PCI: Add device link to ensure endpoint device driver probes after bridge
>
> We need to ensure that the device driver for an endpoint probes after
> the parent port driver, so add a device link to enforce this.
>
> ---
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 512cb4312ddd..4b832ad25b20 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2383,6 +2383,7 @@ static void pci_set_msi_domain(struct pci_dev *dev)
> void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
> {
> int ret;
> + struct device *parent;
>
> pci_configure_device(dev);
>
> @@ -2420,6 +2421,10 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
> /* Set up MSI IRQ domain */
> pci_set_msi_domain(dev);
>
> + parent = dev->dev.parent;
> + if (parent && parent->bus == &pci_bus_type)
> + device_link_add(&dev->dev, parent, DL_FLAG_AUTOPROBE_CONSUMER);
> +
> /* Notifier could use PCI capabilities */
> dev->match_driver = false;
> ret = device_add(&dev->dev);
> --
>
> This would work, but the problem is that if the port driver fails to
> probe - and not just with -EPROBE_DEFER - then the child device will
> never probe. This very thing happens on my dev board. However, we
> could expand the device links API to cover this sort of scenario.
Yes, that's an undesirable issue, but in fact I think it's mostly
indicative that involving drivers in something which is designed to
happen at a level below drivers is still fundamentally wrong and doomed
to be fragile at best.
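For the record, the way a fatal supplier probe failure wedges things
can be seen in how managed links gate consumer probing - a simplified
paraphrase of device_links_check_suppliers() from drivers/base/core.c,
illustrative only, not the exact code:

static int check_suppliers_sketch(struct device *consumer)
{
	struct device_link *link;

	/*
	 * Each managed link defers the consumer until its supplier has
	 * bound. If the supplier's probe failed outright (not just
	 * -EPROBE_DEFER), the link never becomes available, so the
	 * consumer stays unprobed forever.
	 */
	list_for_each_entry(link, &consumer->links.suppliers, c_node) {
		if (link->status != DL_STATE_AVAILABLE)
			return -EPROBE_DEFER;
	}

	return 0;
}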
Another thought that crosses my mind is that when pci_device_group()
walks up to the point of ACS isolation and doesn't find an existing
group, it can still infer that everything it walked past *should* be put
in the same group it's then eventually going to return. Unfortunately I
can't see an obvious way for it to act on that knowledge, though, since
recursive iommu_probe_device() is unlikely to end well.
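For reference, the walk in question goes roughly like this - a
simplified sketch of pci_device_group() from drivers/iommu/iommu.c,
eliding the DMA alias and multifunction handling, so not the exact
code:

static struct iommu_group *pci_group_walk_sketch(struct pci_dev *pdev)
{
	struct iommu_group *group;
	struct pci_bus *bus;

	/*
	 * Climb towards the root until ACS isolates us, reusing any
	 * group already installed on a bridge along the way.
	 */
	for (bus = pdev->bus; !pci_is_root_bus(bus); bus = bus->parent) {
		if (!bus->self)
			continue;

		if (pci_acs_path_enabled(bus->self, NULL, REQ_ACS_FLAGS))
			break;

		pdev = bus->self;

		group = iommu_group_get(&pdev->dev);
		if (group)
			return group;
	}

	/*
	 * Falling out of the loop means none of the bridges we just
	 * passed has a group yet - we can infer they should all share
	 * the one we're about to allocate, but nothing records that.
	 */
	return iommu_group_alloc();
}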
> As for alternatives, it looks pretty difficult to me to disassociate the
> group allocation from the dma_configure path.
Indeed it's non-trivial, but it really does need cleaning up at some point.
Having just had yet another spark, does something like the untested
super-hack below work at all? I doubt it's a viable direction to take in
itself, but it could be food for thought if it at least proves the concept.
Robin.
----->8-----
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index aa3ac2a03807..554cde76c766 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3841,20 +3841,20 @@ static int arm_smmu_set_bus_ops(struct iommu_ops *ops)
int err;
#ifdef CONFIG_PCI
- if (pci_bus_type.iommu_ops != ops) {
+ if (1) {
err = bus_set_iommu(&pci_bus_type, ops);
if (err)
return err;
}
#endif
#ifdef CONFIG_ARM_AMBA
- if (amba_bustype.iommu_ops != ops) {
+ if (1) {
err = bus_set_iommu(&amba_bustype, ops);
if (err)
goto err_reset_pci_ops;
}
#endif
- if (platform_bus_type.iommu_ops != ops) {
+ if (1) {
err = bus_set_iommu(&platform_bus_type, ops);
if (err)
goto err_reset_amba_ops;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 660eea8d1d2f..b81ae2b4d4fb 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1542,6 +1542,14 @@ static int iommu_bus_init(struct bus_type *bus, const struct iommu_ops *ops)
return err;
}
+static int iommu_bus_replay(struct device *dev, void *data)
+{
+ if (dev->iommu_group)
+ return 0;
+
+ return add_iommu_group(dev, data);
+}
+
/**
* bus_set_iommu - set iommu-callbacks for the bus
* @bus: bus.
@@ -1564,6 +1572,9 @@ int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops)
return 0;
}
+ if (bus->iommu_ops == ops)
+ return bus_for_each_dev(bus, NULL, NULL, iommu_bus_replay);
+
if (bus->iommu_ops != NULL)
return -EBUSY;
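The idea being: arm_smmu_set_bus_ops() now calls bus_set_iommu()
unconditionally from every SMMU instance's probe, and the repurposed
early-return path walks the bus and retries add_iommu_group() on any
device which still has no group - so an endpoint which couldn't get a
group earlier (because its path wasn't ready at the time) gets
replayed once a later instance turns up.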