Message-ID: <3f630d3d-c59a-f454-14db-2bf9b8e76877@arm.com>
Date: Tue, 22 Aug 2023 10:30:35 +0100
From: Robin Murphy <robin.murphy@....com>
To: Nicolin Chen <nicolinc@...dia.com>, will@...nel.org, jgg@...dia.com
Cc: joro@...tes.org, jean-philippe@...aro.org, apopple@...dia.com,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
iommu@...ts.linux.dev
Subject: Re: [PATCH 3/3] iommu/arm-smmu-v3: Add a max_tlbi_ops for
__arm_smmu_tlb_inv_range()
On 2023-08-22 09:45, Nicolin Chen wrote:
> An __arm_smmu_tlb_inv_range() call with a large size can incur a long
> latency: part of it comes from the software overhead of building a large
> number of commands, and the rest from the CMDQ hardware consuming them.
> This latency can be especially large on an SMMU that does not support
> range invalidation commands, i.e. one without ARM_SMMU_FEAT_RANGE_INV.
>
> One way to optimize this is to replace a large number of VA invalidation
> commands with a single per-asid or per-vmid invalidation command when the
> invalidation size reaches a preset threshold derived from the number of
> entries per io-pgtable, similar to MAX_TLBI_OPS in arm64's tlbflush.h.
>
> Add a max_tlbi_ops field to arm_smmu_domain, and convert a large number
> of per-granule TLBI commands into a single per-asid or per-vmid TLBI
> command on SMMUs without ARM_SMMU_FEAT_RANGE_INV.
>
> Signed-off-by: Nicolin Chen <nicolinc@...dia.com>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 10 +++++++++-
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
> 2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index d6c647e1eb01..3f0db30932bd 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1897,7 +1897,14 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>  	if (!size)
>  		return;
>  
> -	if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
> +	if (!(smmu->features & ARM_SMMU_FEAT_RANGE_INV)) {
> +		/*
> +		 * When the size reaches a threshold, replace per-granule TLBI
> +		 * commands with one single per-asid or per-vmid TLBI command.
> +		 */
> +		if (size >= granule * smmu_domain->max_tlbi_ops)
> +			return arm_smmu_tlb_inv_domain(smmu_domain);
This looks like it's at the wrong level - we should have figured this
out before we got as far as low-level command-building. I'd have thought
it would be a case of short-circuiting directly from
arm_smmu_tlb_inv_range_domain() to arm_smmu_tlb_inv_context().
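i.e. something like this at the top of arm_smmu_tlb_inv_range_domain()
(untested, just to sketch the shape; I'm hand-waving where the threshold
value comes from, see my comment on the next hunk):

	/*
	 * If we would end up issuing more per-granule TLBIs than the
	 * threshold, invalidate the whole ASID/VMID instead, before we
	 * get anywhere near building individual commands.
	 */
	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_RANGE_INV) &&
	    size >= granule * max_tlbi_ops) {
		arm_smmu_tlb_inv_context(smmu_domain);
		return;
	}

and then __arm_smmu_tlb_inv_range() can stay a dumb command-builder.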
> +	} else {
>  		/* Get the leaf page size */
>  		tg = __ffs(smmu_domain->domain.pgsize_bitmap);
>  
> @@ -2258,6 +2265,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain,
>  	}
>  
>  	smmu_domain->pgtbl_ops = pgtbl_ops;
> +	smmu_domain->max_tlbi_ops = pgtbl_cfg.nents_per_pgtable;
And now we're carrying *three* copies of the same information around
everywhere? Honestly, just pull cfg->bits_per_level out of the
io_pgtable_ops at the point where you need it, like the pagetable code
itself manages to do perfectly happily. Wrap it in an io-pgtable helper
if you think that's cleaner.
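Purely to illustrate (untested, name made up; since bits_per_level lives
in the arm-lpae private data this would belong in io-pgtable-arm.c, or
behind a new op if we wanted it format-agnostic):

	size_t arm_lpae_nents_per_pgtable(struct io_pgtable_ops *ops)
	{
		struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);

		return 1UL << data->bits_per_level;
	}

Then the SMMU driver can derive its threshold from smmu_domain->pgtbl_ops
at the point of use, rather than stashing yet another copy of the same
number in the domain.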
Thanks,
Robin.
>  	return 0;
>  }
>  
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index dcab85698a4e..f68c95a2e24f 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -721,6 +721,7 @@ struct arm_smmu_domain {
>  	struct io_pgtable_ops		*pgtbl_ops;
>  	bool				stall_enabled;
>  	atomic_t			nr_ats_masters;
> +	unsigned long			max_tlbi_ops;
>  
>  	enum arm_smmu_domain_stage	stage;
>  	union {