linux-kernel - Re: [RFC PATCH v1 0/6] Resolve unwanted DMA backing with IOMMU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <afecce46-96d2-9688-30b4-0a3f17a651d3@arm.com>
Date:   Thu, 16 Aug 2018 18:23:02 +0100
From:   Robin Murphy <robin.murphy@....com>
To:     Dmitry Osipenko <digetx@...il.com>, Joerg Roedel <joro@...tes.org>
Cc:     Jordan Crouse <jcrouse@...eaurora.org>,
        Will Deacon <will.deacon@....com>,
        Mikko Perttunen <cyndis@...si.fi>,
        Thierry Reding <thierry.reding@...il.com>,
        devicetree@...r.kernel.org, nouveau@...ts.freedesktop.org,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Nicolas Chauvet <kwizart@...il.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Russell King <linux@...linux.org.uk>,
        dri-devel@...ts.freedesktop.org,
        Jonathan Hunter <jonathanh@...dia.com>,
        iommu@...ts.linux-foundation.org, Rob Herring <robh+dt@...nel.org>,
        Ben Skeggs <bskeggs@...hat.com>,
        Catalin Marinas <catalin.marinas@....com>,
        linux-tegra@...r.kernel.org, Frank Rowand <frowand.list@...il.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v1 0/6] Resolve unwanted DMA backing with IOMMU

On 15/08/18 20:56, Dmitry Osipenko wrote:
> On Friday, 3 August 2018 18:43:41 MSK Robin Murphy wrote:
>> On 02/08/18 19:24, Dmitry Osipenko wrote:
>>> On Friday, 27 July 2018 20:16:53 MSK Dmitry Osipenko wrote:
>>>> On Friday, 27 July 2018 20:03:26 MSK Jordan Crouse wrote:
>>>>> On Fri, Jul 27, 2018 at 05:02:37PM +0100, Robin Murphy wrote:
>>>>>> On 27/07/18 15:10, Dmitry Osipenko wrote:
>>>>>>> On Friday, 27 July 2018 12:03:28 MSK Will Deacon wrote:
>>>>>>>> On Fri, Jul 27, 2018 at 10:25:13AM +0200, Joerg Roedel wrote:
>>>>>>>>> On Fri, Jul 27, 2018 at 02:16:18AM +0300, Dmitry Osipenko wrote:
>>>>>>>>>> The proposed solution adds a new option to the base device driver
>>>>>>>>>> structure that allows device drivers to explicitly convey to the
>>>>>>>>>> drivers
>>>>>>>>>> core that the implicit IOMMU backing for devices must not happen.
>>>>>>>>>
>>>>>>>>> Why is IOMMU mapping a problem for the Tegra GPU driver?
>>>>>>>>>
>>>>>>>>> If we add something like this then it should not be the choice of
>>>>>>>>> the
>>>>>>>>> device driver, but of the user and/or the firmware.
>>>>>>>>
>>>>>>>> Agreed, and it would still need somebody to configure an identity
>>>>>>>> domain
>>>>>>>> so
>>>>>>>> that transactions aren't aborted immediately. We currently allow the
>>>>>>>> identity domain to be used by default via a command-line option, so I
>>>>>>>> guess
>>>>>>>> we'd need a way for firmware to request that on a per-device basis.
>>>>>>>
>>>>>>> The IOMMU mapping itself is not a problem, the problem is the
>>>>>>> management
>>>>>>> of
>>>>>>> the IOMMU. For Tegra we don't want anything to intrude into the IOMMU
>>>>>>> activities because:
>>>>>>>
>>>>>>> 1) GPU HW require additional configuration for the IOMMU usage and
>>>>>>> dumb
>>>>>>> mapping of the allocations simply doesn't work.
>>>>>>
>>>>>> Generally, that's already handled by the DRM drivers allocating
>>>>>> their own unmanaged domains. The only problem we really need to
>>>>>> solve in that regard is that currently the device DMA ops don't get
>>>>>> updated when moving away from the managed domain. That's been OK for
>>>>>> the VFIO case where the device is bound to a different driver which
>>>>>> we know won't make any explicit DMA API calls, but for the more
>>>>>> general case of IOMMU-aware drivers we could certainly do with a bit
>>>>>> of cooperation between the IOMMU API, DMA API, and arch code to
>>>>>> update the DMA ops dynamically to cope with intermediate subsystems
>>>>>> making DMA API calls on behalf of devices they don't know the
>>>>>> intimate details of.
>>>>>>
>>>>>>> 2) Older Tegra generations have a limited resource and capabilities in
>>>>>>> regards to IOMMU usage, allocating IOMMU domain per-device is just
>>>>>>> impossible for example.
>>>>>>>
>>>>>>> 3) HW performs context switches and so particular allocations have to
>>>>>>> be
>>>>>>> assigned to a particular contexts IOMMU domain.
>>>>>>
>>>>>> I understand Qualcomm SoCs have a similar thing too, and AFAICS that
>>>>>> case just doesn't fit into the current API model at all. We need the
>>>>>> IOMMU driver to somehow know about the specific details of which
>>>>>> devices have magic associations with specific contexts, and we
>>>>>> almost certainly need a more expressive interface than
>>>>>> iommu_domain_alloc() to have any hope of reliable results.
>>>>>
>>>>> This is correct for Qualcomm GPUs - The GPU hardware context switching
>>>>> requires a specific context and there are some restrictions around
>>>>> secure contexts as well.
>>>>>
>>>>> We don't really care if the DMA attaches to a context just as long as it
>>>>> doesn't attach to the one(s) we care about. Perhaps a "valid context"
>>>>> mask
>>>>> would work in from the DT or the device struct to give the subsystems a
>>>>> clue as to which domains they were allowed to use. I recognize that
>>>>> there
>>>>> isn't a one-size-fits-all solution to this problem so I'm open to
>>>>> different
>>>>> ideas.
>>>>
>>>> Designating whether implicit IOMMU backing is appropriate for a device
>>>> via
>>>> device-tree property sounds a bit awkward because that will be a kinda
>>>> software description (of a custom Linux driver model), while device-tree
>>>> is
>>>> supposed to describe HW.
>>>>
>>>> What about to grant IOMMU drivers with ability to decide whether the
>>>> implicit backing for a device is appropriate? Like this:
>>>>
>>>> bool implicit_iommu_for_dma_is_allowed(struct device *dev)
>>>> {
>>>>
>>>> 	const struct iommu_ops *ops = dev->bus->iommu_ops;
>>>> 	struct iommu_group *group;
>>>> 	
>>>> 	group = iommu_group_get(dev);
>>>> 	if (!group)
>>>> 	
>>>> 		return NULL;
>>>> 	
>>>> 	iommu_group_put(group);
>>>> 	
>>>> 	if (!ops->implicit_iommu_for_dma_is_allowed)
>>>> 	
>>>> 		return true;
>>>> 	
>>>> 	return ops->implicit_iommu_for_dma_is_allowed(dev);
>>>>
>>>> }
>>>>
>>>> Then arch_setup_dma_ops() could have a clue whether implicit IOMMU
>>>> backing
>>>> for a device is appropriate.
>>>
>>> Guys, does it sound good to you or maybe you have something else on your
>>> mind? Even if it's not an ideal solution, it fixes the immediate problem
>>> and should be good enough for the starter.
>>
>> To me that looks like a step ion the wrong direction that won't help at
>> all in actually addressing the underlying issues.
>>
>> If the GPU driver wants to explicitly control IOMMU mappings instead of
>> relying on the IOMMU_DOMAIN_DMA abstraction, then it should use its own
>> unmanaged domain. At that point it shouldn't matter if a DMA ops domain
>> was allocated, since the GPU device will no longer be attached to it.
> 
> It is not obvious to me what solution you are proposing..
> 
> Are you saying that the detaching from the DMA IOMMU domain that is provided
> by dma_ops() implementer (ARM32 arch for example) should be generalized and
> hence there should be something like:
> 
> 	dma_detach_device_from_iommu_dma_domain(dev);
> 
> that drivers will have to invoke.

No, I mean that drivers should not have to care at all. If the device 
has been given a set of DMA ops which rely on it being attached to a 
default DMA domain, that's not the driver's fault and it's not something 
the driver should have deal with. Either the DMA ops themselves should 
be robust and provide a non-IOMMU fallback if they detect that the 
device is currently attached to a different domain, or the attach 
operation (ideally in the IOMMU core, but at worst in the IOMMU driver's 
.attach_dev callback) should automatically tell the arch code to update 
the device's DMA ops appropriately for the target domain. There are 
already examples of both approaches dotted around arch-specific code, so 
the question is which particular solution is most appropriate to 
standardise on in what is intended to be generic code.

> And hence there will be dma_map_ops.iommu_detach_device() that dma_ops()
> provider will have to implement. Thereby provider will detach device from DMA
> domain, destroy the domain and update the DMA ops of the device.
> 
>> Yes, there may be some improvements to make like having unused domains
>> not consume hardware contexts, but that's internal to the relevant IOMMU
>> drivers. If moving in and out of DMA ops domains leaves the actual
>> dma_ops broken, that's already a problem between the IOMMU API and the
>> arch DMA code as I've mentioned before.
>>
>> Furthermore, given what the example above is trying to do,
>> arch_setup_dma_ops() is way too late to do it - the default domain was
>> already set up in iommu_group_get_for_dev() when the IOMMU driver first
>> saw that device. An "opt-out" mechanism that doesn't actually opt out
>> and just bodges around being opted-in after the fact doesn't strike me
>> as something which can grow to be robust and maintainable.
>>
>> For the case where  a device has some special hardware relationship with
>> a particular IOMMU context, the IOMMU driver *has* to be completely
>> aware of that, i.e. it needs to be described in DT/ACPI, either via some
>> explicit binding or at least inferred from some SoC/instance-specific
>> IOMMU compatible. Then the IOMMU driver needs to know when the driver
>> for that device is requesting its special domain so that it provide the
>> correct context (and *not* allocate that context for other uses).
>> Anything which just relies on the order in which things currently happen
>> to be allocated is far too fragile long-term.
> 
> If hardware has some restrictions, then that should be reflected in the
> hardware description. But that's not what we are trying to solve, at least
> there is no such problem right now for NVIDIA Tegra.

OK, maybe I misunderstood "HW performs context switches and so 
particular allocations have to be assigned to a particular contexts 
IOMMU domain." - is it that the domain can be backed by any hardware 
context and the Tegra GPU driver only needs to know *which* one, rather 
then needing a specific hard-wired context to be allocated as in the 
Qcom case?

Robin.