linux-kernel - Re: [PATCH v2] iommu/iova: Optimise attempts to allocate iova from 32bit address range

Message-ID: <becdc45b-860d-6a9d-6c4d-df7d129bbe96@arm.com>
Date:   Mon, 13 Aug 2018 14:56:07 +0100
From:   Robin Murphy <robin.murphy@....com>
To:     Ganapatrao Kulkarni <ganapatrao.kulkarni@...ium.com>,
        joro@...tes.org, iommu@...ts.linux-foundation.org,
        linux-kernel@...r.kernel.org
Cc:     tomasz.nowicki@...ium.com, jnair@...iumnetworks.com,
        Robert.Richter@...ium.com, Vadim.Lomovtsev@...ium.com,
        Jan.Glauber@...ium.com, gklkml16@...il.com
Subject: Re: [PATCH v2] iommu/iova: Optimise attempts to allocate iova from
 32bit address range

On 13/08/18 09:00, Ganapatrao Kulkarni wrote:
> As an optimisation for PCI devices, there is always first attempt
> been made to allocate iova from SAC address range. This will lead
> to unnecessary attempts, when there are no free ranges
> available. Adding fix to track recently failed iova address size and
> allow further attempts, only if requested size is lesser than a failed
> size. The size is updated when any replenish happens.
> 
> Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@...ium.com>
> ---
> 
> v2: update with comments [2] from Robin Murphy <robin.murphy@....com>
> 
> [2] https://lkml.org/lkml/2018/8/7/166
> 
> v1: Based on comments from Robin Murphy <robin.murphy@....com>
> for patch [1]
> 
> [1] https://lkml.org/lkml/2018/4/19/780
> 
> 
>   drivers/iommu/iova.c | 22 +++++++++++++++-------
>   include/linux/iova.h |  1 +
>   2 files changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index 83fe262..543ac79 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -56,6 +56,7 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
>   	iovad->granule = granule;
>   	iovad->start_pfn = start_pfn;
>   	iovad->dma_32bit_pfn = 1UL << (32 - iova_shift(iovad));
> +	iovad->max32_alloc_size = iovad->dma_32bit_pfn;
>   	iovad->flush_cb = NULL;
>   	iovad->fq = NULL;
>   	iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
> @@ -139,8 +140,10 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
>   
>   	cached_iova = rb_entry(iovad->cached32_node, struct iova, node);
>   	if (free->pfn_hi < iovad->dma_32bit_pfn &&
> -	    free->pfn_lo >= cached_iova->pfn_lo)
> +	    free->pfn_lo >= cached_iova->pfn_lo) {
>   		iovad->cached32_node = rb_next(&free->node);
> +		iovad->max32_alloc_size += (free->pfn_hi - free->pfn_lo);

pfn_hi is inclusive, so I don't think this is actually working as 
intended - if a full space is being freed one page at a time, this will 
never move the limit at all (because it's adding 0).
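
To make the off-by-one concrete, here is a minimal userspace sketch (not
kernel code). struct iova_model and both helper names are hypothetical
and exist only for illustration; the "+ 1" form is the same arithmetic
as the existing iova_size() helper in include/linux/iova.h.

```c
#include <assert.h>

/* Hypothetical stand-in for struct iova; only the pfn bounds matter here. */
struct iova_model {
	unsigned long pfn_hi;	/* last pfn of the range, inclusive */
	unsigned long pfn_lo;	/* first pfn of the range */
};

/* What the hunk above adds to max32_alloc_size: one less than the real size. */
static unsigned long freed_pages_as_posted(const struct iova_model *iova)
{
	return iova->pfn_hi - iova->pfn_lo;
}

/* Inclusive size: a single-page range (pfn_hi == pfn_lo) counts as 1 page. */
static unsigned long freed_pages_inclusive(const struct iova_model *iova)
{
	assert(iova->pfn_hi >= iova->pfn_lo);
	return iova->pfn_hi - iova->pfn_lo + 1;
}
```

For a single-page range the first helper returns 0 and the second
returns 1, which is why freeing page by page never raises the limit.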

As I mentioned before, though, I'm really not convinced that it's worth 
trying to be even this clever here - we don't know that the IOVA we're 
freeing is contiguous with other free space, so the only benefit of 
doing this calculation instead of simply resetting the limit to max 
(i.e. dma_32bit_pfn) is that a subsequent allocation larger than 
(max32_alloc_size + iova_size(free)) pages will still fail early 
instead of late. My gut feeling is that that case will be rare enough 
that it won't make a noticeable difference to realistic workloads, so we 
may as well stick with the simplest possible "almost boolean" approach 
and not bother with a calculation at all.
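
A rough userspace model of that "almost boolean" scheme, as a sketch
only: struct domain_model and the model_*() helpers are hypothetical
names, and the free path is simplified to "anything below dma_32bit_pfn
resets the limit" (the hunk above additionally compares against the
cached node).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: just the two fields the heuristic needs. */
struct domain_model {
	unsigned long dma_32bit_pfn;	/* first pfn above the 32-bit range */
	unsigned long max32_alloc_size;	/* size of last failed allocation */
};

static void model_init(struct domain_model *d, unsigned long dma_32bit_pfn)
{
	d->dma_32bit_pfn = dma_32bit_pfn;
	d->max32_alloc_size = dma_32bit_pfn;	/* nothing has failed yet */
}

/* True if a 32-bit request can skip the tree walk and fail immediately. */
static bool model_fail_early(const struct domain_model *d,
			     unsigned long limit_pfn, unsigned long size)
{
	return limit_pfn <= d->dma_32bit_pfn && size >= d->max32_alloc_size;
}

/* A walk found no room: remember the size that failed. */
static void model_alloc_failed(struct domain_model *d, unsigned long size)
{
	assert(size > 0);	/* a zero limit would reject every request */
	d->max32_alloc_size = size;
}

/* Any free inside the 32-bit range resets the limit; no size arithmetic. */
static void model_free(struct domain_model *d, unsigned long pfn_hi)
{
	if (pfn_hi < d->dma_32bit_pfn)
		d->max32_alloc_size = d->dma_32bit_pfn;
}
```

The trade-off is the one described above: after a free, the next
oversized request walks the tree once more before the limit is lowered
again.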

> +	}
>   
>   	cached_iova = rb_entry(iovad->cached_node, struct iova, node);
>   	if (free->pfn_lo >= cached_iova->pfn_lo)
> @@ -190,6 +193,10 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>   
>   	/* Walk the tree backwards */
>   	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> +	if (limit_pfn <= iovad->dma_32bit_pfn &&
> +			size >= iovad->max32_alloc_size)
> +		goto iova32_full;
> +
>   	curr = __get_cached_rbnode(iovad, limit_pfn);
>   	curr_iova = rb_entry(curr, struct iova, node);
>   	do {
> @@ -200,10 +207,8 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>   		curr_iova = rb_entry(curr, struct iova, node);
>   	} while (curr && new_pfn <= curr_iova->pfn_hi);
>   
> -	if (limit_pfn < size || new_pfn < iovad->start_pfn) {
> -		spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> -		return -ENOMEM;
> -	}
> +	if (limit_pfn < size || new_pfn < iovad->start_pfn)
> +		goto iova32_full;
>   
>   	/* pfn_lo will point to size aligned address if size_aligned is set */
>   	new->pfn_lo = new_pfn;
> @@ -214,9 +219,12 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
>   	__cached_rbnode_insert_update(iovad, new);
>   
>   	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> -
> -
>   	return 0;
> +
> +iova32_full:
> +	iovad->max32_alloc_size = size;
> +	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> +	return -ENOMEM;
>   }
>   
>   static struct kmem_cache *iova_cache;
> diff --git a/include/linux/iova.h b/include/linux/iova.h
> index 928442d..66dff73 100644
> --- a/include/linux/iova.h
> +++ b/include/linux/iova.h
> @@ -75,6 +75,7 @@ struct iova_domain {
>   	unsigned long	granule;	/* pfn granularity for this domain */
>   	unsigned long	start_pfn;	/* Lower limit for this domain */
>   	unsigned long	dma_32bit_pfn;
> +	unsigned long	max32_alloc_size;

This probably still warrants a brief comment to help document the exact 
meaning, maybe something like "/* Size of last failed allocation */"?

For a while I've had the feeling that it might be possible to do 
something clever with an augmented rbtree to fundamentally optimise the 
search for a free area, but for now I reckon that - modulo those last 
couple of comments - this is a good enough solution for the current problem.

Thanks,
Robin.

>   	struct iova	anchor;		/* rbtree lookup anchor */
>   	struct iova_rcache rcaches[IOVA_RANGE_CACHE_MAX_SIZE];	/* IOVA range caches */
>   
>