Message-ID: <4938b20b-14d8-86f8-e80b-9d8ed9d8f28d@redhat.com>
Date: Fri, 10 Mar 2023 09:41:01 +0100
From: Eric Auger <eric.auger@...hat.com>
To: Robin Murphy <robin.murphy@....com>,
Nicolin Chen <nicolinc@...dia.com>, jgg@...dia.com,
will@...nel.org
Cc: kevin.tian@...el.com, baolu.lu@...ux.intel.com, joro@...tes.org,
shameerali.kolothum.thodi@...wei.com, jean-philippe@...aro.org,
linux-arm-kernel@...ts.infradead.org, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 01/14] iommu: Add iommu_get_unmanaged_domain helper
Hi,
On 3/9/23 13:51, Robin Murphy wrote:
> On 2023-03-09 10:53, Nicolin Chen wrote:
>> The nature of ITS virtualization on ARM is done via hypercalls, so the
>> kernel handles all IOVA mappings for the MSI doorbell in
>> iommu_dma_prepare_msi() and iommu_dma_compose_msi_msg(). The current
>> virtualization solution with a 2-stage nested translation setup is to do
>> 1:1 IOVA mappings at the stage-1 guest-level IO page table via an RMR
>> region in the guest-level IORT, aligning with an IOVA region that's
>> predefined and mapped in the host kernel:
>>
>> [stage-2 host level]
>> #define MSI_IOVA_BASE 0x8000000
>> #define MSI_IOVA_LENGTH 0x100000
>> ...
>> iommu_get_msi_cookie():
>>     cookie->msi_iova = MSI_IOVA_BASE;
>> ...
>> iommu_dma_prepare_msi(its_pa):
>>     domain = iommu_get_domain_for_dev(dev);
>>     iommu_dma_get_msi_page(its_pa, domain):
>>         cookie = domain->iova_cookie;
>>         iova = iommu_dma_alloc_iova():
>>             return cookie->msi_iova - size;
>>         iommu_map(iova, its_pa, ...);
>>
>> [stage-1 guest level]
>> // Define in IORT an RMR [MSI_IOVA_BASE, MSI_IOVA_LENGTH]
>> ...
>> iommu_create_device_direct_mappings():
>>     iommu_map(iova=MSI_IOVA_BASE, pa=MSI_IOVA_BASE, len=MSI_IOVA_LENGTH);
>>
>> This solution, by calling iommu_get_domain_for_dev(), requires the device
>> to be attached to a host-level iommu_domain that holds the msi_cookie.
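For reference, the host-side setup this depends on looks roughly as follows
(a simplified sketch of the VFIO-type1-style flow; the function name is made
up for illustration and error handling is omitted):

static struct iommu_domain *host_msi_domain_alloc(struct bus_type *bus,
                                                  struct iommu_group *group)
{
        struct iommu_domain *domain = iommu_domain_alloc(bus);

        /* MSI doorbell IOVAs are later allocated upward from this base. */
        iommu_get_msi_cookie(domain, MSI_IOVA_BASE);
        iommu_attach_group(domain, group);
        /*
         * iommu_get_domain_for_dev() now returns this domain, so
         * iommu_dma_prepare_msi() can find the msi_cookie in it.
         */
        return domain;
}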
>>
>> On the other hand, IOMMUFD designs two iommu_domain objects to represent
>> the two stages: a stage-1 domain (IOMMU_DOMAIN_NESTED type) and a stage-2
>> domain (IOMMU_DOMAIN_UNMANAGED type). In this design, the device will be
>> attached to the stage-1 domain representing a guest-level IO page table,
>> or a Context Descriptor Table in SMMU terms.
>>
>> This is obviously a mismatch, as iommu_get_domain_for_dev() does not
>> return the correct domain pointer in iommu_dma_prepare_msi().
>>
>> Add an iommu_get_unmanaged_domain helper to allow drivers to return the
>> correct IOMMU_DOMAIN_UNMANAGED iommu_domain that holds the IOVA mappings
>> for the msi_cookie. Keep it in the iommu-priv header for internal use
>> only.
>>
>> Suggested-by: Jason Gunthorpe <jgg@...dia.com>
>> Signed-off-by: Nicolin Chen <nicolinc@...dia.com>
>> ---
>> drivers/iommu/dma-iommu.c  |  5 +++--
>> drivers/iommu/iommu-priv.h | 15 +++++++++++++++
>> include/linux/iommu.h      |  2 ++
>> 3 files changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 99b2646cb5c7..6b0409d0ff85 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -31,6 +31,7 @@
>> #include <linux/vmalloc.h>
>> #include "dma-iommu.h"
>> +#include "iommu-priv.h"
>> struct iommu_dma_msi_page {
>> struct list_head list;
>> @@ -1652,7 +1653,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
>> int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
>> {
>>         struct device *dev = msi_desc_to_dev(desc);
>> -       struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> +       struct iommu_domain *domain = iommu_get_unmanaged_domain(dev);
>
> This still doesn't make sense - most of the time this will be expected
> to return the default DMA/identity domain if that's what the device is
> currently using. We can't know whether the current domain is managed
> or not until we look at it.
I tend to agree with Robin here. This was first introduced by
[PATCH v7 21/22] iommu/dma: Add support for mapping MSIs
https://lore.kernel.org/all/2273af20d844bd618c6a90b57e639700328ebf7f.1473695704.git.robin.murphy@arm.com/
even before the support of the VFIO use case, which came later on. So using
the "unmanaged" terminology sounds improper to me, at least.
Couldn't we use a parent/child terminology instead, as used in the past in
the [RFC v2] /dev/iommu uAPI proposal?
https://lore.kernel.org/all/BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com/
This would still hold for the former use case; a possible shape is sketched
below.
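For illustration, a minimal sketch of such a renaming, keeping the shape of
the helper proposed in this patch (iommu_get_parent_domain and the
get_parent_domain op are hypothetical names, not existing API):

static inline struct iommu_domain *iommu_get_parent_domain(struct device *dev)
{
        const struct iommu_ops *ops;

        if (!dev->iommu || !dev->iommu->iommu_dev)
                goto attached_domain;

        ops = dev_iommu_ops(dev);
        /* Hypothetical op: the driver returns the parent (stage-2) domain. */
        if (ops->get_parent_domain)
                return ops->get_parent_domain(dev);

attached_domain:
        /* No nesting: the attached domain is its own parent. */
        return iommu_get_domain_for_dev(dev);
}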
Thanks
Eric
>
> Just like every other caller of iommu_get_domain_for_dev(), what we
> want here is the current kernel-owned domain that we can inspect and
> maybe do standard IOMMU API things with. Why can't
> iommu_get_domain_for_dev() simply maintain that established usage
> model and return the kernel-owned s2_domain from a nested domain
> automatically? No IOMMU API user expects or needs it to return
> anything else (and IOMMUFD should certainly not be losing track of a
> nested domain within its own higher-level abstractions and needing to
> fall back on iommu_get_domain_for_dev()), so I really don't see a
> valid reason to overcomplicate things.
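For illustration, a minimal sketch of that idea, assuming the
IOMMU_DOMAIN_NESTED type from this series and a core s2_domain field
(hypothetical here; see the last point below):

struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
{
        struct iommu_group *group = iommu_group_get(dev);
        struct iommu_domain *domain;

        if (!group)
                return NULL;

        domain = group->domain;
        iommu_group_put(group);

        /* Nested case: hand back the kernel-owned stage-2 parent. */
        if (domain && domain->type == IOMMU_DOMAIN_NESTED)
                return domain->s2_domain; /* hypothetical core field */

        return domain;
}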
>
> Please note I stress "valid" since I'm not buying arbitrarily made-up
> conceptual purity arguments. A nested domain cannot be the "one true
> domain" that is an opaque combination of S1+S2; the IOMMU API view has
> to be more like the device is attached to both the nested domain and
> the parent stage 2 domain somewhat in parallel. Even when nesting is
> active, the S2 domain still exists as a domain in its own right, and
> still needs to be visible and operated on as such, for instance if
> memory is hotplugged in or out of the VM.
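Concretely, the VMM-facing code would still need to operate on the S2 domain
directly in such cases, along these lines (sketch only; the exact iommu_map()
signature varies across kernel versions):

        /* Map hotplugged guest RAM into the kernel-owned stage-2 domain. */
        iommu_map(s2_domain, gpa, hpa, size, IOMMU_READ | IOMMU_WRITE,
                  GFP_KERNEL);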
>
> TBH I'd also move the s2_domain pointer into the iommu_domain itself,
> since it's going to be a common feature for all nesting
> implementations, thus there seems little need to indirect lookups
> through the drivers at all.
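A minimal sketch of what that could look like (field placement and naming
illustrative only):

struct iommu_domain {
        unsigned type;
        const struct iommu_domain_ops *ops;
        /* ... existing fields ... */
        struct iommu_domain *s2_domain; /* set when type == IOMMU_DOMAIN_NESTED */
};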
>
> Thanks,
> Robin.
>
>>         struct iommu_dma_msi_page *msi_page;
>>         static DEFINE_MUTEX(msi_prepare_lock); /* see below */
>> @@ -1685,7 +1686,7 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr)
>> void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
>> {
>>         struct device *dev = msi_desc_to_dev(desc);
>> -       const struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> +       const struct iommu_domain *domain = iommu_get_unmanaged_domain(dev);
>>         const struct iommu_dma_msi_page *msi_page;
>>
>>         msi_page = msi_desc_get_iommu_cookie(desc);
>> diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
>> index a6e694f59f64..da8044da9ad8 100644
>> --- a/drivers/iommu/iommu-priv.h
>> +++ b/drivers/iommu/iommu-priv.h
>> @@ -15,6 +15,21 @@ static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
>>         return dev->iommu->iommu_dev->ops;
>> }
>> +static inline struct iommu_domain *iommu_get_unmanaged_domain(struct device *dev)
>> +{
>> +       const struct iommu_ops *ops;
>> +
>> +       if (!dev->iommu || !dev->iommu->iommu_dev)
>> +               goto attached_domain;
>> +
>> +       ops = dev_iommu_ops(dev);
>> +       if (ops->get_unmanaged_domain)
>> +               return ops->get_unmanaged_domain(dev);
>> +
>> +attached_domain:
>> +       return iommu_get_domain_for_dev(dev);
>> +}
>> +
>> int iommu_group_replace_domain(struct iommu_group *group,
>>                                struct iommu_domain *new_domain);
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 080278c8154d..76c65cc4fc15 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -275,6 +275,8 @@ struct iommu_ops {
>>                                            struct iommu_domain *parent,
>>                                            const void *user_data);
>> +       struct iommu_domain *(*get_unmanaged_domain)(struct device *dev);
>> +
>>         struct iommu_device *(*probe_device)(struct device *dev);
>>         void (*release_device)(struct device *dev);
>>         void (*probe_finalize)(struct device *dev);
>