Message-ID: <1e016926-b965-3b71-07e1-1f63bc45f1a0@arm.com>
Date:   Mon, 30 Jan 2023 15:13:22 +0000
From:   Robin Murphy <robin.murphy@....com>
To:     Niklas Schnelle <schnelle@...ux.ibm.com>,
        Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
        Jason Gunthorpe <jgg@...dia.com>,
        Wenjia Zhang <wenjia@...ux.ibm.com>
Cc:     Matthew Rosato <mjrosato@...ux.ibm.com>,
        Gerd Bayer <gbayer@...ux.ibm.com>,
        Pierre Morel <pmorel@...ux.ibm.com>, iommu@...ts.linux.dev,
        linux-s390@...r.kernel.org, borntraeger@...ux.ibm.com,
        hca@...ux.ibm.com, gor@...ux.ibm.com,
        gerald.schaefer@...ux.ibm.com, agordeev@...ux.ibm.com,
        svens@...ux.ibm.com, linux-kernel@...r.kernel.org,
        Julian Ruess <julianr@...ux.ibm.com>
Subject: Re: [PATCH v5 5/7] iommu/dma: Allow a single FQ in addition to
 per-CPU FQs

On 2023-01-24 12:50, Niklas Schnelle wrote:
> In some virtualized environments, including s390 paged memory guests,
> IOTLB flushes are used to update IOMMU shadow tables. Due to this, they
> are much more expensive than in typical bare metal environments or
> non-paged s390 guests. In addition, they may parallelize more poorly in
> virtualized environments. This changes the trade-off for flushing IOVAs
> such that minimizing the number of IOTLB flushes trumps any benefit of
> cheaper queuing operations or increased parallelism.
> 
> In this scenario per-CPU flush queues pose several problems. Firstly,
> per-CPU memory is often quite limited, prohibiting larger queues.
> Secondly, collecting IOVAs per CPU but flushing via a global timeout
> reduces the number of IOVAs flushed for each timeout, especially on s390
> where PCI interrupts may not be bound to a specific CPU.
> 
> Thus let's introduce a single flush queue mode IOMMU_DOMAIN_DMA_SQ that
> reuses the same queue logic but only allocates a single global queue,
> allowing larger batches of IOVAs to be freed at once and with larger
> timeouts. This allows the common IOVA flushing code to more closely
> resemble the global flush behavior used by s390's previous internal DMA
> API implementation.
> 
> As we now support two different variants of flush queues, rename the
> existing __IOMMU_DOMAIN_DMA_FQ to __IOMMU_DOMAIN_DMA_LAZY to indicate
> the general case of having a flush queue, and introduce separate
> __IOMMU_DOMAIN_DMA_PERCPU_Q and __IOMMU_DOMAIN_DMA_SINGLE_Q bits to
> indicate the two queue variants.

Is there any actual need for the flush queue type to vary on a 
per-domain basis? All the descriptions here seem to imply that in fact 
it's always going to be a global decision one way or the other on s390, 
so if that's all we really need, we can save ourselves a bunch of 
trouble here by not having to mess with the core code at all, and just 
having some kind of switch in iommu-dma.
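Something like the below, perhaps - purely illustrative, the flag and
helper names here are invented:

	/* dma-iommu.c: one global policy decision instead of new domain types */
	static bool iommu_dma_single_fq_mode;	/* hypothetical; set once by arch code */

	static struct iova_fq *cookie_get_fq(struct iommu_dma_cookie *cookie)
	{
		if (iommu_dma_single_fq_mode)
			return cookie->single_fq;
		return raw_cpu_ptr(cookie->percpu_fq);
	}

queue_iova() and the flush paths could then call that helper instead of
checking domain->type at runtime.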

Either way, the more I think about this the more I'm starting to agree 
that adding more domain types for iommu-dma policy is a step in the 
wrong direction. If I may, I'd like to fall back on the "or at least 
some definite internal flag" part of my original suggestion :)
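
i.e. keep a single lazy domain type and let the cookie record the
variant privately, so that e.g. fq_flush_timeout() ends up doing
something like this (the flag name is made up):

	if (cookie->uses_single_fq)	/* hypothetical internal flag */
		fq_flush_single(cookie);
	else
		fq_flush_percpu(cookie);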

Thanks,
Robin.

> Link: https://lore.kernel.org/linux-iommu/3e402947-61f9-b7e8-1414-fde006257b6f@arm.com/
> Signed-off-by: Niklas Schnelle <schnelle@...ux.ibm.com>
> ---
> v4 -> v5:
> - Fixed iommu_group_store_type() mistakenly initializing DMA-SQ instead of
>    DMA-FQ. This was caused by iommu_dma_init_fq() being called before
>    domain->type is set; instead, pass the type as a parameter. This also closes
>    a window where domain->type is still DMA while the FQ is already in use. (Gerd)
> v2 -> v3:
> - Rename __IOMMU_DOMAIN_DMA_FQ to __IOMMU_DOMAIN_DMA_LAZY to make it clearer
>    that this bit indicates flush queue use independently of the exact queuing
>    strategy
> 
>   drivers/iommu/dma-iommu.c  | 155 ++++++++++++++++++++++++++++---------
>   drivers/iommu/dma-iommu.h  |   4 +-
>   drivers/iommu/iommu.c      |  16 ++--
>   drivers/iommu/s390-iommu.c |   1 +
>   include/linux/iommu.h      |  14 +++-
>   5 files changed, 142 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index f798c44e0903..d13ca6db0012 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -48,8 +48,11 @@ struct iommu_dma_cookie {
>   		/* Full allocator for IOMMU_DMA_IOVA_COOKIE */
>   		struct {
>   			struct iova_domain	iovad;
> -
> -			struct iova_fq __percpu *fq;	/* Flush queue */
> +			/* Flush queue */
> +			union {
> +				struct iova_fq	*single_fq;
> +				struct iova_fq	__percpu *percpu_fq;
> +			};
>   			/* Number of TLB flushes that have been started */
>   			atomic64_t		fq_flush_start_cnt;
>   			/* Number of TLB flushes that have been finished */
> @@ -151,25 +154,44 @@ static void fq_flush_iotlb(struct iommu_dma_cookie *cookie)
>   	atomic64_inc(&cookie->fq_flush_finish_cnt);
>   }
>   
> -static void fq_flush_timeout(struct timer_list *t)
> +static void fq_flush_percpu(struct iommu_dma_cookie *cookie)
>   {
> -	struct iommu_dma_cookie *cookie = from_timer(cookie, t, fq_timer);
>   	int cpu;
>   
> -	atomic_set(&cookie->fq_timer_on, 0);
> -	fq_flush_iotlb(cookie);
> -
>   	for_each_possible_cpu(cpu) {
>   		unsigned long flags;
>   		struct iova_fq *fq;
>   
> -		fq = per_cpu_ptr(cookie->fq, cpu);
> +		fq = per_cpu_ptr(cookie->percpu_fq, cpu);
>   		spin_lock_irqsave(&fq->lock, flags);
>   		fq_ring_free(cookie, fq);
>   		spin_unlock_irqrestore(&fq->lock, flags);
>   	}
>   }
>   
> +static void fq_flush_single(struct iommu_dma_cookie *cookie)
> +{
> +	struct iova_fq *fq = cookie->single_fq;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&fq->lock, flags);
> +	fq_ring_free(cookie, fq);
> +	spin_unlock_irqrestore(&fq->lock, flags);
> +}
> +
> +static void fq_flush_timeout(struct timer_list *t)
> +{
> +	struct iommu_dma_cookie *cookie = from_timer(cookie, t, fq_timer);
> +
> +	atomic_set(&cookie->fq_timer_on, 0);
> +	fq_flush_iotlb(cookie);
> +
> +	if (cookie->fq_domain->type == IOMMU_DOMAIN_DMA_FQ)
> +		fq_flush_percpu(cookie);
> +	else
> +		fq_flush_single(cookie);
> +}
> +
>   static void queue_iova(struct iommu_dma_cookie *cookie,
>   		unsigned long pfn, unsigned long pages,
>   		struct list_head *freelist)
> @@ -187,7 +209,11 @@ static void queue_iova(struct iommu_dma_cookie *cookie,
>   	 */
>   	smp_mb();
>   
> -	fq = raw_cpu_ptr(cookie->fq);
> +	if (cookie->fq_domain->type == IOMMU_DOMAIN_DMA_FQ)
> +		fq = raw_cpu_ptr(cookie->percpu_fq);
> +	else
> +		fq = cookie->single_fq;
> +
>   	spin_lock_irqsave(&fq->lock, flags);
>   
>   	/*
> @@ -218,31 +244,91 @@ static void queue_iova(struct iommu_dma_cookie *cookie,
>   			  jiffies + msecs_to_jiffies(IOVA_FQ_TIMEOUT));
>   }
>   
> -static void iommu_dma_free_fq(struct iommu_dma_cookie *cookie)
> +static void iommu_dma_free_fq_single(struct iova_fq *fq)
>   {
> -	int cpu, idx;
> +	int idx;
>   
> -	if (!cookie->fq)
> +	if (!fq)
>   		return;
> +	fq_ring_for_each(idx, fq)
> +		put_pages_list(&fq->entries[idx].freelist);
> +	vfree(fq);
> +}
> +
> +static void iommu_dma_free_fq_percpu(struct iova_fq __percpu *percpu_fq)
> +{
> +	int cpu, idx;
>   
> -	del_timer_sync(&cookie->fq_timer);
>   	/* The IOVAs will be torn down separately, so just free our queued pages */
>   	for_each_possible_cpu(cpu) {
> -		struct iova_fq *fq = per_cpu_ptr(cookie->fq, cpu);
> +		struct iova_fq *fq = per_cpu_ptr(percpu_fq, cpu);
>   
>   		fq_ring_for_each(idx, fq)
>   			put_pages_list(&fq->entries[idx].freelist);
>   	}
>   
> -	free_percpu(cookie->fq);
> +	free_percpu(percpu_fq);
> +}
> +
> +static void iommu_dma_free_fq(struct iommu_dma_cookie *cookie)
> +{
> +	if (!cookie->fq_domain)
> +		return;
> +
> +	del_timer_sync(&cookie->fq_timer);
> +	if (cookie->fq_domain->type == IOMMU_DOMAIN_DMA_FQ)
> +		iommu_dma_free_fq_percpu(cookie->percpu_fq);
> +	else
> +		iommu_dma_free_fq_single(cookie->single_fq);
> +}
> +
> +
> +static void iommu_dma_init_one_fq(struct iova_fq *fq)
> +{
> +	int i;
> +
> +	fq->head = 0;
> +	fq->tail = 0;
> +
> +	spin_lock_init(&fq->lock);
> +
> +	for (i = 0; i < IOVA_FQ_SIZE; i++)
> +		INIT_LIST_HEAD(&fq->entries[i].freelist);
> +}
> +
> +static int iommu_dma_init_fq_single(struct iommu_dma_cookie *cookie)
> +{
> +	struct iova_fq *queue;
> +
> +	queue = vzalloc(sizeof(*queue));
> +	if (!queue)
> +		return -ENOMEM;
> +	iommu_dma_init_one_fq(queue);
> +	cookie->single_fq = queue;
> +
> +	return 0;
> +}
> +
> +static int iommu_dma_init_fq_percpu(struct iommu_dma_cookie *cookie)
> +{
> +	struct iova_fq __percpu *queue;
> +	int cpu;
> +
> +	queue = alloc_percpu(struct iova_fq);
> +	if (!queue)
> +		return -ENOMEM;
> +
> +	for_each_possible_cpu(cpu)
> +		iommu_dma_init_one_fq(per_cpu_ptr(queue, cpu));
> +	cookie->percpu_fq = queue;
> +	return 0;
>   }
>   
>   /* sysfs updates are serialised by the mutex of the group owning @domain */
> -int iommu_dma_init_fq(struct iommu_domain *domain)
> +int iommu_dma_init_fq(struct iommu_domain *domain, int type)
>   {
>   	struct iommu_dma_cookie *cookie = domain->iova_cookie;
> -	struct iova_fq __percpu *queue;
> -	int i, cpu;
> +	int rc;
>   
>   	if (cookie->fq_domain)
>   		return 0;
> @@ -250,26 +336,19 @@ int iommu_dma_init_fq(struct iommu_domain *domain)
>   	atomic64_set(&cookie->fq_flush_start_cnt,  0);
>   	atomic64_set(&cookie->fq_flush_finish_cnt, 0);
>   
> -	queue = alloc_percpu(struct iova_fq);
> -	if (!queue) {
> -		pr_warn("iova flush queue initialization failed\n");
> -		return -ENOMEM;
> -	}
> -
> -	for_each_possible_cpu(cpu) {
> -		struct iova_fq *fq = per_cpu_ptr(queue, cpu);
> -
> -		fq->head = 0;
> -		fq->tail = 0;
> -
> -		spin_lock_init(&fq->lock);
> +	if (type == IOMMU_DOMAIN_DMA_FQ)
> +		rc = iommu_dma_init_fq_percpu(cookie);
> +	else
> +		rc = iommu_dma_init_fq_single(cookie);
>   
> -		for (i = 0; i < IOVA_FQ_SIZE; i++)
> -			INIT_LIST_HEAD(&fq->entries[i].freelist);
> +	if (rc) {
> +		pr_warn("iova flush queue initialization failed\n");
> +		/* fall back to strict mode */
> +		domain->type = IOMMU_DOMAIN_DMA;
> +		return rc;
>   	}
>   
> -	cookie->fq = queue;
> -
> +	domain->type = type;
>   	timer_setup(&cookie->fq_timer, fq_flush_timeout, 0);
>   	atomic_set(&cookie->fq_timer_on, 0);
>   	/*
> @@ -582,9 +661,9 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
>   	if (ret)
>   		goto done_unlock;
>   
> -	/* If the FQ fails we can simply fall back to strict mode */
> -	if (domain->type == IOMMU_DOMAIN_DMA_FQ && iommu_dma_init_fq(domain))
> -		domain->type = IOMMU_DOMAIN_DMA;
> +	/* If the FQ fails we fall back to strict mode */
> +	if (domain->type & __IOMMU_DOMAIN_DMA_LAZY)
> +		iommu_dma_init_fq(domain, domain->type);
>   
>   	ret = iova_reserve_iommu_regions(dev, domain);
>   
> diff --git a/drivers/iommu/dma-iommu.h b/drivers/iommu/dma-iommu.h
> index 942790009292..2e037fc6074c 100644
> --- a/drivers/iommu/dma-iommu.h
> +++ b/drivers/iommu/dma-iommu.h
> @@ -12,7 +12,7 @@
>   int iommu_get_dma_cookie(struct iommu_domain *domain);
>   void iommu_put_dma_cookie(struct iommu_domain *domain);
>   
> -int iommu_dma_init_fq(struct iommu_domain *domain);
> +int iommu_dma_init_fq(struct iommu_domain *domain, int type);
>   
>   void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);
>   
> @@ -20,7 +20,7 @@ extern bool iommu_dma_forcedac;
>   
>   #else /* CONFIG_IOMMU_DMA */
>   
> -static inline int iommu_dma_init_fq(struct iommu_domain *domain)
> +static inline int iommu_dma_init_fq(struct iommu_domain *domain, int type)
>   {
>   	return -EINVAL;
>   }
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 5565e510f7d2..b58c4313851b 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -145,6 +145,7 @@ static const char *iommu_domain_type_str(unsigned int t)
>   		return "Unmanaged";
>   	case IOMMU_DOMAIN_DMA:
>   	case IOMMU_DOMAIN_DMA_FQ:
> +	case IOMMU_DOMAIN_DMA_SQ:
>   		return "Translated";
>   	default:
>   		return "Unknown";
> @@ -477,7 +478,7 @@ early_param("iommu.strict", iommu_dma_setup);
>   void iommu_set_dma_strict(void)
>   {
>   	iommu_dma_strict = true;
> -	if (iommu_def_domain_type == IOMMU_DOMAIN_DMA_FQ)
> +	if (!!(iommu_def_domain_type & __IOMMU_DOMAIN_DMA_LAZY))
>   		iommu_def_domain_type = IOMMU_DOMAIN_DMA;
>   }
>   
> @@ -678,6 +679,9 @@ static ssize_t iommu_group_show_type(struct iommu_group *group,
>   		case IOMMU_DOMAIN_DMA_FQ:
>   			type = "DMA-FQ\n";
>   			break;
> +		case IOMMU_DOMAIN_DMA_SQ:
> +			type = "DMA-SQ\n";
> +			break;
>   		}
>   	}
>   	mutex_unlock(&group->mutex);
> @@ -2896,10 +2900,8 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
>   	}
>   
>   	/* We can bring up a flush queue without tearing down the domain */
> -	if (type == IOMMU_DOMAIN_DMA_FQ && prev_dom->type == IOMMU_DOMAIN_DMA) {
> -		ret = iommu_dma_init_fq(prev_dom);
> -		if (!ret)
> -			prev_dom->type = IOMMU_DOMAIN_DMA_FQ;
> +	if (!!(type & __IOMMU_DOMAIN_DMA_LAZY) && prev_dom->type == IOMMU_DOMAIN_DMA) {
> +		ret = iommu_dma_init_fq(prev_dom, type);
>   		goto out;
>   	}
>   
> @@ -2970,6 +2972,8 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
>   		req_type = IOMMU_DOMAIN_DMA;
>   	else if (sysfs_streq(buf, "DMA-FQ"))
>   		req_type = IOMMU_DOMAIN_DMA_FQ;
> +	else if (sysfs_streq(buf, "DMA-SQ"))
> +		req_type = IOMMU_DOMAIN_DMA_SQ;
>   	else if (sysfs_streq(buf, "auto"))
>   		req_type = 0;
>   	else
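
(As an aside, this makes the new mode selectable at runtime through the
existing group type interface, i.e. something like

	echo DMA-SQ > /sys/kernel/iommu_groups/<N>/type

with <N> standing in for the group number, subject to the usual
restrictions on changing a group's default domain type.)
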
> @@ -3021,7 +3025,7 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
>   
>   	/* Check if the device in the group still has a driver bound to it */
>   	device_lock(dev);
> -	if (device_is_bound(dev) && !(req_type == IOMMU_DOMAIN_DMA_FQ &&
> +	if (device_is_bound(dev) && !((req_type & __IOMMU_DOMAIN_DMA_LAZY) &&
>   	    group->default_domain->type == IOMMU_DOMAIN_DMA)) {
>   		pr_err_ratelimited("Device is still bound to driver\n");
>   		ret = -EBUSY;
> diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
> index 73144ea0adfc..ff73b75be886 100644
> --- a/drivers/iommu/s390-iommu.c
> +++ b/drivers/iommu/s390-iommu.c
> @@ -332,6 +332,7 @@ static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
>   	switch (domain_type) {
>   	case IOMMU_DOMAIN_DMA:
>   	case IOMMU_DOMAIN_DMA_FQ:
> +	case IOMMU_DOMAIN_DMA_SQ:
>   	case IOMMU_DOMAIN_UNMANAGED:
>   		break;
>   	default:
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index e7f76599f09e..74cee59516aa 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -62,10 +62,13 @@ struct iommu_domain_geometry {
>   #define __IOMMU_DOMAIN_DMA_API	(1U << 1)  /* Domain for use in DMA-API
>   					      implementation              */
>   #define __IOMMU_DOMAIN_PT	(1U << 2)  /* Domain is identity mapped   */
> -#define __IOMMU_DOMAIN_DMA_FQ	(1U << 3)  /* DMA-API uses flush queue    */
> +#define __IOMMU_DOMAIN_DMA_LAZY	(1U << 3)  /* DMA-API uses flush queue    */
>   
>   #define __IOMMU_DOMAIN_SVA	(1U << 4)  /* Shared process address space */
>   
> +#define __IOMMU_DOMAIN_DMA_PERCPU_Q	(1U << 5)  /* Per-CPU flush queue */
> +#define __IOMMU_DOMAIN_DMA_SINGLE_Q	(1U << 6)  /* Single flush queue */
> +
>   /*
>    * This are the possible domain-types
>    *
> @@ -79,6 +82,8 @@ struct iommu_domain_geometry {
>    *				  certain optimizations for these domains
>    *	IOMMU_DOMAIN_DMA_FQ	- As above, but definitely using batched TLB
>    *				  invalidation.
> + *	IOMMU_DOMAIN_DMA_SQ	- As IOMMU_DOMAIN_DMA_FQ, but batched TLB
> + *				  invalidations use a single global queue
>    *	IOMMU_DOMAIN_SVA	- DMA addresses are shared process addresses
>    *				  represented by mm_struct's.
>    */
> @@ -89,7 +94,12 @@ struct iommu_domain_geometry {
>   				 __IOMMU_DOMAIN_DMA_API)
>   #define IOMMU_DOMAIN_DMA_FQ	(__IOMMU_DOMAIN_PAGING |	\
>   				 __IOMMU_DOMAIN_DMA_API |	\
> -				 __IOMMU_DOMAIN_DMA_FQ)
> +				 __IOMMU_DOMAIN_DMA_LAZY |	\
> +				 __IOMMU_DOMAIN_DMA_PERCPU_Q)
> +#define IOMMU_DOMAIN_DMA_SQ	(__IOMMU_DOMAIN_PAGING |	\
> +				 __IOMMU_DOMAIN_DMA_API |	\
> +				 __IOMMU_DOMAIN_DMA_LAZY |	\
> +				 __IOMMU_DOMAIN_DMA_SINGLE_Q)
>   #define IOMMU_DOMAIN_SVA	(__IOMMU_DOMAIN_SVA)
>   
>   struct iommu_domain {
