[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <64c7659b-9bd5-2239-cde9-f7b48f6877ca@arm.com>
Date: Thu, 9 Mar 2023 19:04:19 +0000
From: Robin Murphy <robin.murphy@....com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Nicolin Chen <nicolinc@...dia.com>, will@...nel.org,
eric.auger@...hat.com, kevin.tian@...el.com,
baolu.lu@...ux.intel.com, joro@...tes.org,
shameerali.kolothum.thodi@...wei.com, jean-philippe@...aro.org,
linux-arm-kernel@...ts.infradead.org, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
On 2023-03-09 14:19, Jason Gunthorpe wrote:
> On Thu, Mar 09, 2023 at 12:51:20PM +0000, Robin Murphy wrote:
>
>> Please note I stress "valid" since I'm not buying arbitrarily made-up
>> conceptual purity arguments. A nested domain cannot be the "one true domain"
>> that is an opaque combination of S1+S2; the IOMMU API view has to be more
>> like the device is attached to both the nested domain and the parent stage 2
>> domain somewhat in parallel.
>
> I strongly disagree with this.
>
> The design we have from the core perspective is an opaque domain that
> is a hidden combination of S1/S2 inside the driver. We do not want to
> change the basic design of the iommu core: there is only one domain
> attached to a device/group at a time.
>
> This patch should be seen as a temporary hack to allow the ARM ITS
> stuff to hobble on a little longer. We already know that iommufd use
> cases are incompatible with the design and we need to fix it. The
> fixed solution must have iommufd install the ITS pages at domain
> allocation time and so it will not need these APIs at all. This
> tempoary code should not dictate the overall design of the iommu core.
>
> If we belive exposing the S1/S2 relationships through the iommu core
> is necessary for another legitimate purpose I would like to hear about
> it. In my opinion using APIs to peek into these details is far more
> likely to be buggy and long term I prefer to block the ability to
> learn the S2 externally from iommufd completely.
>
> Thus the overall design of the iommu core APIs is not being changed.
> The core API design follows this logic with and without nesting:
> iommu_attach_device(domin);
> WARN_ON(domain != iommu_get_domain_for_dev());
That is indeed one of many conditions that are true today, but the facts
are that nothing makes that specific assertion, nothing should ever
*need* to make that specific assertion, and any driver sufficiently dumb
to not bother keeping track of its own domain and relying on
iommu_get_domain_for_dev() to retrieve it should most definitely not be
allowed anywhere near nesting.
The overall design of the iommu core APIs *is* being changed, because
the current design is also that iommu_get_domain_for_dev() always
returns the correct domain that iommu_dma_prepare_msi() needs, which
breaks with nesting. You are literally insisting on changing this core
API, to work around intentionally breaking an existing behaviour which
could easily be preserved (far less invasively), for the sake of
preserving some other theoretical behaviour that IS NOT USEFUL.
The overall design of the iommu core APIs *is* being changed, because
the core API design also follows this logic for any domain type:
domain = iommu_get_domain_for_dev();
phys = iommu_iova_to_phys(domain, iova);
//phys meaningfully represents whether iova was valid
Yes, even blocking and identity domains, because drivers ACTUALLY DO
THIS. I'm not sure there even is a single correct thing that nesting
domains could do to satisfy all the possible expectations that callers
of iommu_iova_to_phys() may have. However if the grand design says it's
not OK for iommu_get_domain_for_dev() to return what
iommu_dma_prepare_msi() needs even though nobody else should ever be
passing a nesting domain to it, then it's also not OK for
iommu_iova_to_phys() to crash or lie and return 0 when a valid
translation (by some notion) exists, even though nobody should ever pass
a nesting domain in there either.
Forgive me for getting wound up, but I'm a pragmatist and my tolerance
for ignoring reality is low.
> The hack in this patch gets its own special single-use APIs so we can
> purge them once they are not needed and do not confusingly contaminate
> the whole design. For this reason the ops call back should only be
> implemented by SMMUv3.
>
>> Even when nesting is active, the S2 domain still exists as a domain
>> in its own right, and still needs to be visible and operated on as
>> such, for instance if memory is hotplugged in or out of the VM.
>
> It exists in iommufd and iommufd will operate it. This is not a
> problem.
>
> iommufd is not using a dual attach model.
>
> The S2 is provided to the S1's domain allocation function as creation
> data. The returned S1 domain opaquely embeds the S2. The embedded S2
> cannot be changed once the S1 domain is allocated.
>
> Attaching the S1 always causes the embedded S2 to be used too - they
> are not separable so we don't have APIs talking about about
> "attaching" the S2.
Yes, that is one way of viewing things, but it's far from the only way.
The typical lifecycle will involve starting the VM with S2 alone, then
enabling nesting later - we can view that as allocating a nested domain
based on S2, then "replacing" S2 with nested, but we could equally view
it as just attaching the nested domain on top of the existing S2, like
pushing to a stack (I do agree that trying to model it as two completely
independent and orthogonal attachments would not be sensible). It's
really just semantics of how we prefer to describe things, and whether
the s2_domain pointer is passed up-front to domain_alloc or later to
attach_dev.
The opaque nested domain looks clean in some ways, but on the other hand
it also heavily obfuscates how translations may be shared between one
nested domain and other nested and non-nested domains, such that
changing mappings in one may affect others. This is a major and
potentially surprising paradigm shift away from the current notion that
all domains represent individual isolated address spaces, so abstracting
it more openly within the core API internals would be clearer about
what's really going on. And putting common concepts at common levels of
abstraction makes things simpler and is why we have core API code at all.
Frankly it's not like we don't already have various API-internal stuff
in struct iommu_domain that nobody external should ever be looking at,
but if that angst remains overwhelming then I can't imagine it being all
that much work to move the definition to iommu-priv.h - AFAICS it
shouldn't need much more than some helpers for the handful of
iommu_get_domain_for_dev() users currently inspecting the type,
pgsize_bitmap, or geometry - which would ultimately be a good deal
neater and more productive than adding yet more special-case ops that
every driver is just going to implement identically.
And to even imagine the future possibility of things like S2 pagetable
sharing with KVM, or unpinned S2 with ATS/PRI or SMMU stalls, by
tunnelling more private nesting interfaces directly between IOMMUFD and
individual drivers without some degree of common IOMMU API abstraction
in between... no. Just... no.
Thanks,
Robin.
Powered by blists - more mailing lists