Open Source and information security mailing list archives
 
Message-ID: <4768c541-ebf4-61d5-0c5e-77dee83f8f94@arm.com>
Date:   Thu, 19 Sep 2019 14:25:38 +0100
From:   Robin Murphy <robin.murphy@....com>
To:     John Garry <john.garry@...wei.com>, Marc Zyngier <maz@...nel.org>,
        Will Deacon <will@...nel.org>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Sudeep Holla <sudeep.holla@....com>,
        "Guohanjun (Hanjun Guo)" <guohanjun@...wei.com>
Cc:     iommu <iommu@...ts.linux-foundation.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        Linuxarm <linuxarm@...wei.com>,
        Shameer Kolothum <shameerali.kolothum.thodi@...wei.com>,
        Alex Williamson <alex.williamson@...hat.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: arm64 iommu groups issue

Hi John,

On 19/09/2019 09:43, John Garry wrote:
> Hi all,
> 
> We have noticed some odd behaviour regarding PCI device iommu groups 
> on our arm64 D05 board when the SMMU is enabled.
> 
> This platform does not support ACS, yet we find that not all functions 
> of a PCI device are grouped together:
> 
> root@...ntu:/sys# dmesg | grep "Adding to iommu group"
> [    7.307539] hisi_sas_v2_hw HISI0162:01: Adding to iommu group 0
> [   12.590533] hns_dsaf HISI00B2:00: Adding to iommu group 1
> [   13.688527] mlx5_core 000a:11:00.0: Adding to iommu group 2
> [   14.324606] mlx5_core 000a:11:00.1: Adding to iommu group 3
> [   14.937090] ehci-platform PNP0D20:00: Adding to iommu group 4
> [   15.276637] pcieport 0002:f8:00.0: Adding to iommu group 5
> [   15.340845] pcieport 0004:88:00.0: Adding to iommu group 6
> [   15.392098] pcieport 0005:78:00.0: Adding to iommu group 7
> [   15.443356] pcieport 000a:10:00.0: Adding to iommu group 8
> [   15.484975] pcieport 000c:20:00.0: Adding to iommu group 9
> [   15.543647] pcieport 000d:30:00.0: Adding to iommu group 10
> [   15.599771] serial 0002:f9:00.0: Adding to iommu group 5
> [   15.690807] serial 0002:f9:00.1: Adding to iommu group 5
> [   84.322097] mlx5_core 000a:11:00.2: Adding to iommu group 8
> [   84.856408] mlx5_core 000a:11:00.3: Adding to iommu group 8
> 
> root@...ntu:/sys# lspci -tv
> -+-[000d:30]---00.0-[31]--
>    +-[000c:20]---00.0-[21]----00.0  Huawei Technologies Co., Ltd.
>    +-[000a:10]---00.0-[11-12]--+-00.0  Mellanox [ConnectX-5]
>    |                           +-00.1  Mellanox [ConnectX-5]
>    |                           +-00.2  Mellanox [ConnectX-5 VF]
>    |                           \-00.3  Mellanox [ConnectX-5 VF]
>    +-[0007:90]---00.0-[91]----00.0  Huawei Technologies Co., ...
>    +-[0006:c0]---00.0-[c1]--
>    +-[0005:78]---00.0-[79]--
>    +-[0004:88]---00.0-[89]--
>    +-[0002:f8]---00.0-[f9]--+-00.0  MosChip Semiconductor Technology ...
>    |                        +-00.1  MosChip Semiconductor Technology ...
>    |                        \-00.2  MosChip Semiconductor Technology ...
>    \-[0000:00]-
> 
> For the PCI devices in question - on port 000a:10:00.0 - you will notice 
> that the port and VFs (000a:11:00.2, 3) are in one group, yet the 2 PFs 
> (000a:11:00.0, 000a:11:00.1) are in separate groups.
> 
> I also see the same ordering on our D06 platform: the pcieport is 
> added to an iommu group after the PF below that port. However, this 
> platform supports ACS, so it is less of a problem.
> 
> After some checking, I found that when the pcieport driver probes, the 
> associated SMMU device has not yet registered with the IOMMU framework, 
> so we defer the probe for this device: in iort.c:iort_iommu_xlate(), 
> we defer when no iommu ops are available.
> 
> Yet by the time the mlx5 PF devices probe, the iommu ops are available, 
> so the probe continues and each PF gets an iommu group - but not the 
> parent port's group, since the port has not yet been added to one. When 
> the port eventually probes, it gets a new, separate group.
> 
> This all seems to be because the built-in module init ordering is: 
> pcieport drv, smmu drv, mlx5 drv.
> 
> I notice that if I build the mlx5 drv as a ko and insert it after boot, 
> all functions and the pcieport are in the same group:
> 
> [   11.530046] hisi_sas_v2_hw HISI0162:01: Adding to iommu group 0
> [   17.301093] hns_dsaf HISI00B2:00: Adding to iommu group 1
> [   18.743600] ehci-platform PNP0D20:00: Adding to iommu group 2
> [   20.212284] pcieport 0002:f8:00.0: Adding to iommu group 3
> [   20.356303] pcieport 0004:88:00.0: Adding to iommu group 4
> [   20.493337] pcieport 0005:78:00.0: Adding to iommu group 5
> [   20.702999] pcieport 000a:10:00.0: Adding to iommu group 6
> [   20.859183] pcieport 000c:20:00.0: Adding to iommu group 7
> [   20.996140] pcieport 000d:30:00.0: Adding to iommu group 8
> [   21.152637] serial 0002:f9:00.0: Adding to iommu group 3
> [   21.346991] serial 0002:f9:00.1: Adding to iommu group 3
> [  100.754306] mlx5_core 000a:11:00.0: Adding to iommu group 6
> [  101.420156] mlx5_core 000a:11:00.1: Adding to iommu group 6
> [  292.481714] mlx5_core 000a:11:00.2: Adding to iommu group 6
> [  293.281061] mlx5_core 000a:11:00.3: Adding to iommu group 6
> 
> This does seem like a problem for arm64 platforms which enable an SMMU 
> but don't support ACS, and maybe even for those which do support ACS.
> 
> Opinion?

Yeah, this is less than ideal. One way to bodge it might be to make 
pci_device_group() also walk downwards to see whether any 
non-ACS-isolated children already have a group, rather than assuming 
that groups get allocated in hierarchical order, but that would hardly 
be a clean fix either.

The underlying issue is that, for historical reasons, OF/IORT-based 
IOMMU drivers have ended up with group allocation being tied to endpoint 
driver probing via the dma_configure() mechanism (long story short, 
driver probe is the only thing which can be delayed in order to wait for 
a specific IOMMU instance to be ready). However, in the meantime, the 
IOMMU API internals have evolved sufficiently that I think there's a way 
to really put things right - I have the spark of an idea which I'll try 
to sketch out ASAP...

Robin.
