[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <14a5130c-3180-87db-5a14-2aaeaf97b7ce@arm.com>
Date: Mon, 30 Jan 2023 13:49:20 +0000
From: Robin Murphy <robin.murphy@....com>
To: Baolu Lu <baolu.lu@...ux.intel.com>,
Jason Gunthorpe <jgg@...dia.com>
Cc: joro@...tes.org, will@...nel.org, hch@....de,
iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/8] iommu: Decouple iommu_present() from bus ops
On 2023-01-28 08:49, Baolu Lu wrote:
> On 2023/1/27 23:43, Jason Gunthorpe wrote:
>> On Fri, Jan 27, 2023 at 03:19:55PM +0000, Robin Murphy wrote:
>>
>>> The current implementation of device_iommu_mapped() just dates back
>>> to when
>>> dev->iommu_group was the only per-device thing we had, so in principle I
>>> don't have any conceptual objection to redefining it in terms of
>>> "device has
>>> ops" rather than "device has a group", but as things stand you'd
>>> still have
>>> to do something about PPC first (I know Jason had been pushing on
>>> that, but
>>> I've not kept track of where it got to).
>> PPC hasn't moved at all, AFAICT. In a few more months I'm going to
>> suggest we delete the special VFIO support due to it being broken,
>> distros already having turned it off and nobody caring enough to fix
>> it..
>>
>> What does device_iommu_mapped() even really mean?
>>
>> Looking at usages..
>>
>> These are fixing SOC HW bugs/issues - the desire seems to be "is the
>> SOC's
>> IOMMU enabled"
>>
>> drivers/char/agp/intel-gtt.c:
>> device_iommu_mapped(&intel_private.pcidev->dev));
>> drivers/dma/sh/rcar-dmac.c: if (device_iommu_mapped(&pdev->dev))
>> drivers/gpu/drm/i915/i915_utils.c: if
>> (device_iommu_mapped(i915->drm.dev))
>> ?
>> drivers/usb/dwc3/dwc3-xilinx.c: if (of_dma_is_coherent(dev->of_node)
>> || device_iommu_mapped(dev)) {
>> drivers/usb/host/xhci.c: if (!(xhci->quirks &
>> XHCI_ZERO_64B_REGS) || !device_iommu_mapped(dev))
>> drivers/crypto/qat/qat_common/adf_sriov.c: if
>> (!device_iommu_mapped(&pdev->dev))
>> ?
>>
>> These seem to be trying to decide if iommu_domain's can be used (and
>> they can't be on power):
>>
>> drivers/gpu/drm/msm/msm_drv.c: if (device_iommu_mapped(mdp_dev))
>> drivers/gpu/drm/msm/msm_drv.c: device_iommu_mapped(dev->dev) ||
>> drivers/gpu/drm/msm/msm_drv.c:
>> device_iommu_mapped(dev->dev->parent);
>> drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c: if
>> (device_iommu_mapped(dev)) {
>> drivers/gpu/drm/rockchip/rockchip_drm_drv.c: if
>> (!device_iommu_mapped(dev))
>> drivers/gpu/drm/tegra/uapi.c: if
>> (device_iommu_mapped(client->base.dev) &&
>> client->ops->can_use_memory_ctx) {
>> drivers/gpu/host1x/context.c: if (!fwspec ||
>> !device_iommu_mapped(&ctx->dev)) {
>> drivers/infiniband/hw/usnic/usnic_ib_main.c: if
>> (!device_iommu_mapped(&pdev->dev)) {
>>
>> Yikes, trying to map DMA addresses programmed into devices back to CPU
>> addresses:
>>
>> drivers/misc/habanalabs/common/debugfs.c: if (!user_address ||
>> device_iommu_mapped(&hdev->pdev->dev)) {
>> drivers/misc/habanalabs/gaudi2/gaudi2.c: if
>> (!device_iommu_mapped(&hdev->pdev->dev))
>>
>> And then sequencing the call to iommu_probe_device() which doesn't
>> apply to power:
>>
>> drivers/acpi/scan.c: if (!err && dev->bus &&
>> !device_iommu_mapped(dev))
>> drivers/iommu/of_iommu.c: if (!err && dev->bus &&
>> !device_iommu_mapped(dev))
>>
>> Leaving these:
>>
>> arch/powerpc/kernel/eeh.c: if (device_iommu_mapped(dev)) {
>>
>> This is only used to support eeh_iommu_group_to_pe which is only
>> caleld by vfio_iommu_spapr_tce.c. Since power vfio doesn't work right
>> now this is uncallable, and when power is fixed this will work
>> properly.
Oh wow, I should have looked at more context... Even better, this one is
already just an elaborate "if (true)" - it has been impossible for
dev_has_iommu_table() to return 0 since at least 2015 :D
>> arch/powerpc/kernel/iommu.c: if (device_iommu_mapped(dev)) {
>> arch/powerpc/kernel/iommu.c: if (!device_iommu_mapped(dev)) {
>>
>> These should both be replaced with some kind of 'device has iommu
>> group', since
>> it is really driving ppc unique group logic.
And in fact those appear to mostly serve for printing debug messages; if
we made iommu_group_add_device() return -EBUSY for duplicate calls (not
necessarily a bad idea anyway vs. relying on the noisy failure of
sysfs_create_link()), they could arguably just go.
All in all, it's only actually the habanalabs ones that I'm slightly
wary of, since they're effectively (mis)using device_iommu_mapped() to
infer the DMA ops implementation, which could potentially go wrong (or
at least *more* wrong) on POWER with this change. I guess the saving
grace is that although they are available on PCIe-interfaced modules,
the userspace driver stack seems to be implicitly x86_64-only - as far
as I could tell from a quick poke around their site and documentation,
which doesn't appear to acknowledge the concept of CPU architectures at
all - so the chances of anyone actually trying to use the kernel drivers
in anger on POWER seem minimal.
Thanks,
Robin.
>> So, I'd say Baolu's approach is the right thing, just replace the
>> above two in ppc with something else.
>
> Thank you both. I will follow up a series later.
>
> Best regards,
> baolu
Powered by blists - more mailing lists