Message-ID: <77be6671-e4e8-4b17-bf72-74bde325671a@nvidia.com>
Date: Thu, 24 Apr 2025 17:49:20 -0700
From: Tushar Dave <tdave@...dia.com>
To: Jason Gunthorpe <jgg@...dia.com>, Vasant Hegde <vasant.hegde@....com>
Cc: Baolu Lu <baolu.lu@...ux.intel.com>, joro@...tes.org, will@...nel.org,
robin.murphy@....com, kevin.tian@...el.com, yi.l.liu@...el.com,
iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH rc] iommu: Skip PASID validation for devices without PASID capability

On 4/24/25 05:31, Jason Gunthorpe wrote:
> On Thu, Apr 24, 2025 at 12:08:56PM +0530, Vasant Hegde wrote:
>
>>> What the iommu driver should do when set_dev_pasid is called for a non-
>>> PASID device?
>
> That's a good point, maybe the core code should filter that out based
> on max_pasids? I think we do run into trouble here because the drivers
> are allocating PASID table space based on max_pasids so the non-pasid
> device should fail to add the pasid. Tushar, you should have hit this
> in your testing???

When we have a multi-device group with a PASID device and non-PASID devices,
set_dev_pasid doesn't fail in my testing for the non-PASID devices.
Here is the example topology and a bit more detail:
0008:00:00.0 root_port
 └─0008:01:00.0 upstream_port
    ├─0008:02:00.0 downstream_port
    │  └─0008:03:00.0 endpoint (NIC DMA-PF)
    └─0008:02:03.0 downstream_port
       └─0008:04:00.0 upstream_port
          └─0008:05:00.0 downstream_port
             └─0008:06:00.0 endpoint (GPU)
In the above topology, we set up ACS flags on DSPs 0008:02:03.0 and 0008:02:00.0
to achieve the desired p2p configuration for the GPU and DMA-PF.
Apparently, this creates a multi-device group with the GPU being the only device
with PASID support in that group. In this case, the set_dev_pasid() op is invoked
for each device within the group with pasid=1 and doesn't fail.
e.g.
...
..
.
pcieport 0008:02:03.0: debug: __iommu_set_group_pasid(): pasid=1 dev->iommu->max_pasids=0 iommu_group 30
pcieport 0008:02:03.0: debug: __iommu_set_group_pasid(): ret 0
pcieport 0008:04:00.0: debug: __iommu_set_group_pasid(): pasid=1 dev->iommu->max_pasids=0 iommu_group 30
pcieport 0008:04:00.0: debug: __iommu_set_group_pasid(): ret 0
pcieport 0008:05:00.0: debug: __iommu_set_group_pasid(): pasid=1 dev->iommu->max_pasids=0 iommu_group 30
pcieport 0008:05:00.0: debug: __iommu_set_group_pasid(): ret 0
nvidia 0008:06:00.0: debug: __iommu_set_group_pasid(): pasid=1 dev->iommu->max_pasids=1048576 iommu_group 30
nvidia 0008:06:00.0: debug: __iommu_set_group_pasid(): ret 0
IMO this outcome is expected. Quoting the commit message of
https://github.com/torvalds/linux/commit/16603704559c7a68718059c4f75287886c01b20f
"If multiple devices share a single group, it's fine as long the fabric
always routes every TLP marked with a PASID to the host bridge and only
the host bridge. For example, ACS achieves this universally and has been
checked when pci_enable_pasid() is called. As we can't reliably tell the
source apart in a group, all the devices in a group have to be considered
as the same source, and mapped to the same PASID table."
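
To make the behaviour in the log above concrete, here is a rough user-space
model of it. This is only a sketch and not the kernel code: struct model_dev,
model_set_group_pasid() and the choice of -EINVAL are all made up for
illustration. It assumes the PASID is range-checked only against devices that
report max_pasids > 0, and that the attach then runs for every device in the
group because they share one PASID table.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for per-device IOMMU state; not the kernel struct. */
struct model_dev {
	const char *name;
	uint32_t max_pasids;	/* 0 means the device has no PASID capability */
	int pasid_attached;	/* set once the modelled attach ran for this device */
};

/*
 * Modelled group attach: validate the PASID only against devices that
 * advertise PASID support, then attach on every device in the group.
 * The -EINVAL return value is an arbitrary choice for this model.
 */
static int model_set_group_pasid(struct model_dev *devs, size_t n,
				 uint32_t pasid)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].max_pasids == 0)
			continue;	/* non-PASID device: skip validation */
		if (pasid >= devs[i].max_pasids)
			return -EINVAL;	/* out of range for a PASID device */
	}

	/* All group members are treated as the same source. */
	for (i = 0; i < n; i++)
		devs[i].pasid_attached = 1;

	return 0;
}
```

With the four devices of iommu_group 30 from the log (three with max_pasids=0
and the GPU with max_pasids=1048576), pasid=1 passes validation and is attached
on all of them, while pasid=1048576 is rejected before anything is attached.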
-Tushar
>
> We also have a problem setting up the default domain - it won't
> compute IOMMU_HWPT_ALLOC_PASID properly across the group. If the
> no-pasid device probes first then PASID will be broken on the group.
>
> Tushar isn't hitting this because ARM always uses a PASID compatible
> domain today, but it will not work on AMD.
>
> That's a huge pain to deal with :\
>
>> Per device max_pasids check should cover that right?
>
> The driver shouldn't be doing this though, if the driver is told to
> make a pasid then it should make a pasid.. The driver can fail
> attaching a pasid to a device that is over the device's max_pasid.
>
>> FYI. One example of such device is some of the AMD GPUs which has
>> both VGA and audio in same group. while VGA supports PASID, audio is
>> not. This used to work fine when we had AMD IOMMU PASID specific
>> driver. GPUs stopped using PASIDs in upstream kernel. So I didn't
>> look into this part in details.
>
> Uhhh.. That sounds like a worse problem, the only way you should end
> up with same group is if the ACS flags are missing on the GPU so Linux
> assumes the VGA and audio can loopback to each other internally.
>
> That should completely block PASID support on the GPU side due the
> wrong routing. We can't have a hole in the PASID address space where
> the audio BAR is.
>
> I suppose the HW doesn't actually behave this way but since it doesn't
> have the right ACS flags the SW doesn't know? Guessing..
>
> Jason