Message-ID: <4768c541-ebf4-61d5-0c5e-77dee83f8f94@arm.com>
Date: Thu, 19 Sep 2019 14:25:38 +0100
From: Robin Murphy <robin.murphy@....com>
To: John Garry <john.garry@...wei.com>, Marc Zyngier <maz@...nel.org>,
Will Deacon <will@...nel.org>,
Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
Sudeep Holla <sudeep.holla@....com>,
"Guohanjun (Hanjun Guo)" <guohanjun@...wei.com>
Cc: iommu <iommu@...ts.linux-foundation.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
Linuxarm <linuxarm@...wei.com>,
Shameer Kolothum <shameerali.kolothum.thodi@...wei.com>,
Alex Williamson <alex.williamson@...hat.com>,
Bjorn Helgaas <bhelgaas@...gle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: arm64 iommu groups issue

Hi John,

On 19/09/2019 09:43, John Garry wrote:
> Hi all,
>
> We have noticed some unusual behaviour with PCI device iommu groups on
> our arm64 D05 board when the SMMU is enabled.
>
> This platform does not support ACS, yet we find that the functions of
> a multi-function PCI device are not all grouped together:
>
> root@...ntu:/sys# dmesg | grep "Adding to iommu group"
> [ 7.307539] hisi_sas_v2_hw HISI0162:01: Adding to iommu group 0
> [ 12.590533] hns_dsaf HISI00B2:00: Adding to iommu group 1
> [ 13.688527] mlx5_core 000a:11:00.0: Adding to iommu group 2
> [ 14.324606] mlx5_core 000a:11:00.1: Adding to iommu group 3
> [ 14.937090] ehci-platform PNP0D20:00: Adding to iommu group 4
> [ 15.276637] pcieport 0002:f8:00.0: Adding to iommu group 5
> [ 15.340845] pcieport 0004:88:00.0: Adding to iommu group 6
> [ 15.392098] pcieport 0005:78:00.0: Adding to iommu group 7
> [ 15.443356] pcieport 000a:10:00.0: Adding to iommu group 8
> [ 15.484975] pcieport 000c:20:00.0: Adding to iommu group 9
> [ 15.543647] pcieport 000d:30:00.0: Adding to iommu group 10
> [ 15.599771] serial 0002:f9:00.0: Adding to iommu group 5
> [ 15.690807] serial 0002:f9:00.1: Adding to iommu group 5
> [ 84.322097] mlx5_core 000a:11:00.2: Adding to iommu group 8
> [ 84.856408] mlx5_core 000a:11:00.3: Adding to iommu group 8
>
> root@...ntu:/sys# lspci -tv
> -+-[000d:30]---00.0-[31]--
> +-[000c:20]---00.0-[21]----00.0 Huawei Technologies Co., Ltd.
> +-[000a:10]---00.0-[11-12]--+-00.0 Mellanox [ConnectX-5]
> | +-00.1 Mellanox [ConnectX-5]
> | +-00.2 Mellanox [ConnectX-5 VF]
> | \-00.3 Mellanox [ConnectX-5 VF]
> +-[0007:90]---00.0-[91]----00.0 Huawei Technologies Co., ...
> +-[0006:c0]---00.0-[c1]--
> +-[0005:78]---00.0-[79]--
> +-[0004:88]---00.0-[89]--
> +-[0002:f8]---00.0-[f9]--+-00.0 MosChip Semiconductor Technology ...
> | +-00.1 MosChip Semiconductor Technology ...
> | \-00.2 MosChip Semiconductor Technology ...
> \-[0000:00]-
>
> For the PCI devices in question - those under port 000a:10:00.0 - you
> will notice that the port and the VFs (000a:11:00.2, 000a:11:00.3) are
> in one group, yet the 2 PFs (000a:11:00.0, 000a:11:00.1) each end up
> in a separate group.
>
> I also notice the same ordering behaviour on our D06 platform - the
> pcieport is added to an iommu group after the PF for that port.
> However, that platform supports ACS, so it is not such a problem
> there.
>
> After some checking, I find that when the pcieport driver probes, the
> associated SMMU device has not yet registered with the IOMMU
> framework, so the probe for this device is deferred - in
> iort.c:iort_iommu_xlate(), we defer when no iommu ops are available
> yet.
>
> Yet by the time the mlx5 PF devices probe, the iommu ops are
> available, so the probe continues and each device gets an iommu group
> - but not the same group as its parent port, since the port has not
> yet been added to a group. When the port eventually probes, it gets a
> new, separate group.
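
For reference, the check in question looks roughly like this - a
simplified paraphrase of iort.c from kernels of this era, so exact
names and details may differ between versions:

	static int iort_iommu_xlate(struct device *dev,
				    struct acpi_iort_node *node,
				    u32 streamid)
	{
		const struct iommu_ops *ops;
		struct fwnode_handle *iort_fwnode;

		if (!node)
			return -ENODEV;

		iort_fwnode = iort_get_fwnode(node);
		if (!iort_fwnode)
			return -ENODEV;

		/*
		 * No iommu_ops registered for the SMMU's fwnode yet:
		 * either the SMMU driver is not enabled at all (fail
		 * with -ENODEV), or it simply has not probed yet, in
		 * which case defer the endpoint's probe.
		 */
		ops = iommu_ops_from_fwnode(iort_fwnode);
		if (!ops)
			return iort_iommu_driver_enabled(node->type) ?
			       -EPROBE_DEFER : -ENODEV;

		return arm_smmu_iort_xlate(dev, streamid, iort_fwnode, ops);
	}
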
>
> This all seems to be because the init ordering of the built-in
> drivers is as follows: pcieport drv, smmu drv, mlx5 drv
>
> I notice that if I build the mlx5 drv as a module and insert it after
> boot, all the functions + the pcieport end up in the same group:
>
> [ 11.530046] hisi_sas_v2_hw HISI0162:01: Adding to iommu group 0
> [ 17.301093] hns_dsaf HISI00B2:00: Adding to iommu group 1
> [ 18.743600] ehci-platform PNP0D20:00: Adding to iommu group 2
> [ 20.212284] pcieport 0002:f8:00.0: Adding to iommu group 3
> [ 20.356303] pcieport 0004:88:00.0: Adding to iommu group 4
> [ 20.493337] pcieport 0005:78:00.0: Adding to iommu group 5
> [ 20.702999] pcieport 000a:10:00.0: Adding to iommu group 6
> [ 20.859183] pcieport 000c:20:00.0: Adding to iommu group 7
> [ 20.996140] pcieport 000d:30:00.0: Adding to iommu group 8
> [ 21.152637] serial 0002:f9:00.0: Adding to iommu group 3
> [ 21.346991] serial 0002:f9:00.1: Adding to iommu group 3
> [ 100.754306] mlx5_core 000a:11:00.0: Adding to iommu group 6
> [ 101.420156] mlx5_core 000a:11:00.1: Adding to iommu group 6
> [ 292.481714] mlx5_core 000a:11:00.2: Adding to iommu group 6
> [ 293.281061] mlx5_core 000a:11:00.3: Adding to iommu group 6
>
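
For completeness, that amounts to nothing more than building the
driver as a loadable module and inserting it by hand once the SMMU
driver has long since initialised, e.g.:

	# .config: build mlx5_core as a module instead of built-in
	CONFIG_MLX5_CORE=m

	# after boot, with the SMMU's iommu_ops already registered:
	modprobe mlx5_core
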
> This does seem like a problem for arm64 platforms which don't support
> ACS yet enable an SMMU. It may be a problem even for those which do
> support ACS.
>
> Opinion?

Yeah, this is less than ideal. One way to bodge it might be to make
pci_device_group() also walk downwards to see if any non-ACS-isolated
children already have a group, rather than assuming that groups get
allocated in hierarchical order, but that's hardly a clean solution
either.
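
Something like the sketch below, perhaps - purely illustrative and
untested; the pci_bridge_child_group() helper and its callback are
made up for this example, and real logic in pci_device_group() would
also need to check ACS isolation of the children, as it already does
for the upstream path:

	#include <linux/iommu.h>
	#include <linux/pci.h>

	/* Stop the bus walk as soon as one child has a group */
	static int get_child_group(struct pci_dev *pdev, void *data)
	{
		struct iommu_group **group = data;

		*group = iommu_group_get(&pdev->dev);
		return *group ? 1 : 0;
	}

	/*
	 * Hypothetical helper: before allocating a fresh group for a
	 * bridge, see whether any device below it has already been put
	 * in one, i.e. whether groups were allocated in
	 * non-hierarchical order.
	 */
	static struct iommu_group *pci_bridge_child_group(struct pci_dev *bridge)
	{
		struct iommu_group *group = NULL;

		if (bridge->subordinate)
			pci_walk_bus(bridge->subordinate, get_child_group,
				     &group);

		return group;	/* caller owns the reference, if any */
	}
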
The underlying issue is that, for historical reasons, OF/IORT-based
IOMMU drivers have ended up with group allocation being tied to endpoint
driver probing via the dma_configure() mechanism (long story short,
driver probe is the only thing which can be delayed in order to wait for
a specific IOMMU instance to be ready). However, in the meantime, the
IOMMU API internals have evolved sufficiently that I think there's a way
to really put things right - I have the spark of an idea which I'll try
to sketch out ASAP...
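
For context, the dma_configure() path referred to above reaches group
allocation roughly like this on an ACPI/IORT system (function names
from kernels around this time; some intermediate steps elided):

	really_probe(dev, drv)
	  -> dev->bus->dma_configure()      /* pci_dma_configure() */
	     -> acpi_dma_configure()
	        -> iort_iommu_configure()   /* may return -EPROBE_DEFER */
	           -> iort_iommu_xlate()    /* the ops lookup shown above */
	           -> iommu_probe_device()
	              -> ops->add_device()  /* e.g. arm_smmu_add_device() */
	                 -> iommu_group_get_for_dev()
	                    -> pci_device_group()  /* group chosen here */
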
Robin.