Message-ID: <6c4c3a3e-1d8d-7994-3c03-388ef63dddb3@arm.com>
Date: Mon, 28 Nov 2022 21:01:43 +0000
From: Robin Murphy <robin.murphy@....com>
To: Jason Gunthorpe <jgg@...dia.com>,
Niklas Schnelle <schnelle@...ux.ibm.com>
Cc: Baolu Lu <baolu.lu@...ux.intel.com>,
Matthew Rosato <mjrosato@...ux.ibm.com>,
Gerd Bayer <gbayer@...ux.ibm.com>, iommu@...ts.linux.dev,
Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
Wenjia Zhang <wenjia@...ux.ibm.com>,
Pierre Morel <pmorel@...ux.ibm.com>,
linux-s390@...r.kernel.org, borntraeger@...ux.ibm.com,
hca@...ux.ibm.com, gor@...ux.ibm.com,
gerald.schaefer@...ux.ibm.com, agordeev@...ux.ibm.com,
svens@...ux.ibm.com, linux-kernel@...r.kernel.org,
Julian Ruess <julianr@...ux.ibm.com>
Subject: Re: [PATCH v2 4/7] iommu: Let iommu.strict override ops->def_domain_type
On 2022-11-28 16:35, Jason Gunthorpe wrote:
> On Mon, Nov 28, 2022 at 04:54:03PM +0100, Niklas Schnelle wrote:
>
>> I agree that there is currently a lack of distinction between what
>> domain types can be used (capability) and which should be used as
>> default (iommu.strict=<x>, iommu_set_...(), CONFIG_IOMMU_DEFAULT_DMA,
>> ops->def_domain_type).
>
> What I would like to get to is the drivers only expose UNMANAGED,
> IDENTITY and BLOCKING domains. Everything that the DMA/FQ/etc domains
> were doing should be handled as some kind of cap.
>
> Eg, after Lu's series:
>
> https://lore.kernel.org/linux-iommu/20221128064648.1934720-1-baolu.lu@linux.intel.com/
>
> We should be able to remove IOMMU_DOMAIN_DMA and its related from the
> drivers entirely. Instead drivers will provide UNMANAGED domains and
> dma-iommu.c will operate the UNMANAGED domain to provide the dma
> api. We can detect if the driver supports this by set_platform_dma()
> being NULL.
>
> A statement that a driver performs better using SQ/FQ/none should be
> something that is queried from the UNMANAGED domain as a guidance to
> dma-iommu.c what configuration to pick if not overridden.
Ack, I'm sure it could be cleaner overall if the driver capabilities
didn't come in right at the end of the process with the .domain_alloc
dance. As I've said before, I would still like to keep the domain types
in the core code (since they already work as a set of capability flags),
but drivers not having to deal with them directly would be good. Maybe
we dedicate .domain_alloc to paging domains, and have separate device
ops for .get_{blocking,identity}_domain, given that optimised
implementations of those are likely to be static or at least per-instance.
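
As a rough illustration of that split (every name below is hypothetical and
the types are heavily simplified; this is not the kernel's actual struct
iommu_ops), .domain_alloc would only ever hand out paging domains, while the
special domains come from optional per-device getters whose mere presence
acts as the capability:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real kernel types. */
enum sketch_kind { SKETCH_PAGING, SKETCH_IDENTITY, SKETCH_BLOCKING };

struct sketch_domain {
	enum sketch_kind kind;
};

struct sketch_device {
	int instance;	/* which IOMMU instance serves this device */
};

struct sketch_iommu_ops {
	/* Dedicated to paging (translation) domains only. */
	struct sketch_domain *(*domain_alloc)(struct sketch_device *dev);
	/* Optional getters; may return static or per-instance objects. */
	struct sketch_domain *(*get_identity_domain)(struct sketch_device *dev);
	struct sketch_domain *(*get_blocking_domain)(struct sketch_device *dev);
};

/* Example driver: one static identity domain, no blocking support. */
static struct sketch_domain example_identity = { .kind = SKETCH_IDENTITY };
static struct sketch_domain example_pool[4];
static size_t example_pool_used;

static struct sketch_domain *example_domain_alloc(struct sketch_device *dev)
{
	struct sketch_domain *d;

	(void)dev;
	if (example_pool_used >= 4)
		return NULL;	/* pool exhausted */
	d = &example_pool[example_pool_used++];
	d->kind = SKETCH_PAGING;
	return d;
}

static struct sketch_domain *example_get_identity(struct sketch_device *dev)
{
	(void)dev;
	return &example_identity;	/* same object every time */
}

static const struct sketch_iommu_ops example_ops = {
	.domain_alloc = example_domain_alloc,
	.get_identity_domain = example_get_identity,
	/* .get_blocking_domain left NULL: not supported by this driver */
};

/* Core-side view: capability is derived from which ops are provided. */
static int sketch_supports_identity(const struct sketch_iommu_ops *ops)
{
	return ops->get_identity_domain != NULL;
}

static int sketch_supports_blocking(const struct sketch_iommu_ops *ops)
{
	return ops->get_blocking_domain != NULL;
}
```

In a layout like this the driver never has to interpret a domain type
argument itself; the core keeps the type logic and only asks for what the
ops say is available.
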
> So, I would say what you want is some option flag, perhaps on the
> domain ops: 'domain performs better with SQ or FQ'
Although for something that's likely to be global based on whether
running virtualised or not, I'd be inclined to try pulling that as far
as reasonably possible towards core code.
>> My case though is about the latter which I think has some undocumented
>> and surprising precedences built in at the moment. With this series we
>> can use all of IOMMU_DOMAIN_DMA(_FQ, _SQ) on any PCI device but we want
>> to default to either IOMMU_DOMAIN_DMA_FQ or IOMMU_DOMAIN_DMA_SQ based on
>> whether we're running in a paging hypervisor (z/VM or KVM) to get the
>> best performance. From a semantic point of view I felt that this is a
>> good match for ops->def_domain_type in that we pick a default but it's
>> still possible to change the domain type e.g. via sysfs. Now this had
>> the problem that ops->def_domain_type would cause IOMMU_DOMAIN_DMA_FQ
>> to be used even if iommu_set_dma_strict() was called (via
>> iommu.strict=1) because there is an undocumented override of
>> ops->def_domain_type over iommu_def_domain_type which I believe comes from
>> the mixing of capability and default you also point at.
>
> Yeah, this does sound troubled.
The initial assumption about .def_domain_type is incorrect, though. From
there it's a straightforward path to the conclusion that introducing
inconsistency (by using the wrong mechanism) leads to the presence of
inconsistency.
>> I think ideally we need two separate mechanisms to determine which
>> domain types can be used for a particular device (capability) and for
>> which one to default to with the latter part having a clear precedence
>> between the options. Put together I think iommu.strict=1 should
>> override a device's preference (ops->def_domain_type) and
>> CONFIG_IOMMU_DEFAULT_DMA to use IOMMU_DOMAIN_DMA but of course only if
>> the device is capable of that. Does that sound reasonable?
>
> Using the language above, if someone asks for strict then the
> infrastructure should try to allocate an UNMANAGED domain, not an
> identity domain,
Careful, "iommu.strict" refers specifically to the invalidation policy
for DMA API domains, and I've tried to be careful to document it as
such. It has never been defined to have any impact on anything other
than DMA API domains, so I don't think any should be assumed. Control of
the basic domain type (identity vs. translation) on the command line has
always been via separate parameters, which I think have always had
higher priority anyway. With sysfs you can ask for anything, but you'll
still only get it if it's safe and guaranteed to work.
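
The precedence described here can be sketched roughly as follows. This is a
simplified model, not the kernel's code: the function and enum names are
made up for illustration, and Kconfig defaults and per-device driver
requirements are deliberately left out. The only point is that the strict
setting is consulted solely once a DMA API (translation) domain has already
been chosen:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the default domain types under discussion. */
enum sketch_type {
	SKETCH_IDENTITY,	/* passthrough, no translation */
	SKETCH_DMA,		/* DMA API domain, strict invalidation */
	SKETCH_DMA_FQ		/* DMA API domain, lazy (flush queue) invalidation */
};

/*
 * 'passthrough' models the separate identity-vs-translation parameter,
 * 'strict' models iommu.strict. The latter only selects the invalidation
 * policy of a DMA API domain and never turns identity into translation.
 */
static enum sketch_type sketch_pick_default(bool passthrough, bool strict)
{
	if (passthrough)
		return SKETCH_IDENTITY;	/* strict has no bearing here */

	return strict ? SKETCH_DMA : SKETCH_DMA_FQ;
}
```
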
> and not use the lazy flush algorithms in dma-iommu.c
>
> Similarly if sysfs asks for lazy flush or identity maps then it should
> get it, always.
I'm hardly an advocate for trying to save users from themselves, but I
honestly can't see any justifiable reason for not having sysfs respect
iommu_get_def_domain_type(). If a privileged user wants to screw up the
system they're hardly short of options already. Far worse, though, is
that someone nefarious would only need to popularise a "make external
dGPUs and/or other popular accelerators faster on laptops" udev rule
that forces identity domains via sysfs, and bye bye Thunderclap mitigations.
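
A minimal sketch of what "sysfs respects the required type" means in
practice, under loud assumptions: the names are hypothetical, and 'required'
merely stands in for whatever iommu_get_def_domain_type() reports for the
device, with 0 meaning "no requirement". It is not a copy of the real sysfs
handler:

```c
#include <errno.h>

/* Hypothetical type codes; 0 means the device has no required type. */
enum {
	SKETCH_ANY = 0,
	SKETCH_REQ_IDENTITY,
	SKETCH_REQ_DMA
};

/*
 * Returns 0 if a sysfs request may proceed, -EINVAL if it conflicts with
 * the type required for the device. Privilege alone does not override
 * the requirement.
 */
static int sketch_check_sysfs_request(int required, int requested)
{
	if (required == SKETCH_ANY)
		return 0;	/* nothing to respect, any type may be tried */

	if (required == requested)
		return 0;	/* request agrees with the requirement */

	return -EINVAL;		/* e.g. identity asked for a translation-only device */
}
```

With a check of this shape, the hypothetical udev rule above simply fails
for any device that must stay behind translation, rather than silently
disabling the protection.
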
> The driver should have no say in how dma-iommu.c works beyond if it
> provides the required ops functionalities, and hint(s) as to what
> gives best performance.
That should already be the case today, as outlined in my other mail.
It's just somewhat more evolved than designed, so may not be so clear to
everyone.
Thanks,
Robin.