Message-ID: <14a5130c-3180-87db-5a14-2aaeaf97b7ce@arm.com>
Date:   Mon, 30 Jan 2023 13:49:20 +0000
From:   Robin Murphy <robin.murphy@....com>
To:     Baolu Lu <baolu.lu@...ux.intel.com>,
        Jason Gunthorpe <jgg@...dia.com>
Cc:     joro@...tes.org, will@...nel.org, hch@....de,
        iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/8] iommu: Decouple iommu_present() from bus ops

On 2023-01-28 08:49, Baolu Lu wrote:
> On 2023/1/27 23:43, Jason Gunthorpe wrote:
>> On Fri, Jan 27, 2023 at 03:19:55PM +0000, Robin Murphy wrote:
>>
>>> The current implementation of device_iommu_mapped() just dates back
>>> to when dev->iommu_group was the only per-device thing we had, so in
>>> principle I don't have any conceptual objection to redefining it in
>>> terms of "device has ops" rather than "device has a group", but as
>>> things stand you'd still have to do something about PPC first (I know
>>> Jason had been pushing on that, but I've not kept track of where it
>>> got to).
>> PPC hasn't moved at all, AFAICT. In a few more months I'm going to
>> suggest we delete the special VFIO support due to it being broken,
>> distros already having turned it off and nobody caring enough to fix
>> it..
>>
>> What does device_iommu_mapped() even really mean?
>>
>> Looking at usages..
>>
>> These are fixing SOC HW bugs/issues - the desire seems to be "is the
>> SOC's IOMMU enabled"
>>
>> drivers/char/agp/intel-gtt.c:           device_iommu_mapped(&intel_private.pcidev->dev));
>> drivers/dma/sh/rcar-dmac.c:     if (device_iommu_mapped(&pdev->dev))
>> drivers/gpu/drm/i915/i915_utils.c:      if (device_iommu_mapped(i915->drm.dev))
>> ?
>> drivers/usb/dwc3/dwc3-xilinx.c: if (of_dma_is_coherent(dev->of_node) || device_iommu_mapped(dev)) {
>> drivers/usb/host/xhci.c:        if (!(xhci->quirks & XHCI_ZERO_64B_REGS) || !device_iommu_mapped(dev))
>> drivers/crypto/qat/qat_common/adf_sriov.c:      if (!device_iommu_mapped(&pdev->dev))
>> ?
>>
>> These seem to be trying to decide if iommu_domain's can be used (and
>> they can't be on power):
>>
>> drivers/gpu/drm/msm/msm_drv.c:  if (device_iommu_mapped(mdp_dev))
>> drivers/gpu/drm/msm/msm_drv.c:          device_iommu_mapped(dev->dev) ||
>> drivers/gpu/drm/msm/msm_drv.c:          device_iommu_mapped(dev->dev->parent);
>> drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c:     if (device_iommu_mapped(dev)) {
>> drivers/gpu/drm/rockchip/rockchip_drm_drv.c:    if (!device_iommu_mapped(dev))
>> drivers/gpu/drm/tegra/uapi.c:   if (device_iommu_mapped(client->base.dev) && client->ops->can_use_memory_ctx) {
>> drivers/gpu/host1x/context.c:           if (!fwspec || !device_iommu_mapped(&ctx->dev)) {
>> drivers/infiniband/hw/usnic/usnic_ib_main.c:    if (!device_iommu_mapped(&pdev->dev)) {
>>
>> Yikes, trying to map DMA addresses programmed into devices back to CPU 
>> addresses:
>>
>> drivers/misc/habanalabs/common/debugfs.c: if (!user_address || device_iommu_mapped(&hdev->pdev->dev)) {
>> drivers/misc/habanalabs/gaudi2/gaudi2.c:                if (!device_iommu_mapped(&hdev->pdev->dev))
>>
>> And then sequencing the call to iommu_probe_device() which doesn't
>> apply to power:
>>
>> drivers/acpi/scan.c:    if (!err && dev->bus && !device_iommu_mapped(dev))
>> drivers/iommu/of_iommu.c:       if (!err && dev->bus && !device_iommu_mapped(dev))
>>
>> Leaving these:
>>
>> arch/powerpc/kernel/eeh.c:      if (device_iommu_mapped(dev)) {
>>
>> This is only used to support eeh_iommu_group_to_pe which is only
>> called by vfio_iommu_spapr_tce.c. Since power vfio doesn't work right
>> now this is uncallable, and when power is fixed this will work
>> properly.

Oh wow, I should have looked at more context... Even better, this one is 
already just an elaborate "if (true)" - it has been impossible for 
dev_has_iommu_table() to return 0 since at least 2015 :D
>> arch/powerpc/kernel/iommu.c:    if (device_iommu_mapped(dev)) {
>> arch/powerpc/kernel/iommu.c:    if (!device_iommu_mapped(dev)) {
>>
>> These should both be replaced with some kind of 'device has iommu
>> group', since it is really driving ppc unique group logic.

And in fact those appear to mostly serve for printing debug messages; if 
we made iommu_group_add_device() return -EBUSY for duplicate calls (not 
necessarily a bad idea anyway vs. relying on the noisy failure of 
sysfs_create_link()), they could arguably just go.
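
For illustration, a minimal sketch of that duplicate-call check might
look like the following; the signature matches the existing
iommu_group_add_device(), but the body here is only a placeholder for
the real implementation:

#include <linux/device.h>
#include <linux/iommu.h>

int iommu_group_add_device(struct iommu_group *group, struct device *dev)
{
	/*
	 * Refuse duplicate calls up front with -EBUSY rather than
	 * relying on the noisy failure of sysfs_create_link().
	 */
	if (dev->iommu_group)
		return -EBUSY;

	/* ... existing sysfs link creation and group bookkeeping ... */
	return 0;
}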

All in all, it's only actually the habanalabs ones that I'm slightly 
wary of, since they're effectively (mis)using device_iommu_mapped() to 
infer the DMA ops implementation, which could potentially go wrong (or 
at least *more* wrong) on POWER with this change. I guess the saving 
grace is that although they are available on PCIe-interfaced modules, 
the userspace driver stack seems to be implicitly x86_64-only - as far 
as I could tell from a quick poke around their site and documentation, 
which doesn't appear to acknowledge the concept of CPU architectures at 
all - so the chances of anyone actually trying to use the kernel drivers 
in anger on POWER seem minimal.
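
For reference, the distinction being discussed boils down to something
like the sketch below. The first helper is the current "device has a
group" definition from include/linux/iommu.h; the second is one possible
"device has ops" form - the helper name and the exact per-device state
it checks are assumptions for illustration, not necessarily what Baolu's
series will do:

#include <linux/device.h>
#include <linux/iommu.h>

/* Today: mapped == the device has been put in an IOMMU group. */
static inline bool device_iommu_mapped_by_group(struct device *dev)
{
	return dev->iommu_group != NULL;
}

/*
 * Possible redefinition: mapped == an IOMMU driver has actually probed
 * the device and attached its per-device data (hypothetical form).
 */
static inline bool device_iommu_mapped_by_ops(struct device *dev)
{
	return dev->iommu && dev->iommu->iommu_dev;
}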

Thanks,
Robin.

>> So, I'd say Baolu's approach is the right thing, just replace the
>> above two in ppc with something else.
> 
> Thank you both. I will follow up a series later.
> 
> Best regards,
> baolu
