linux-kernel - Re: [v3 PATCH] iommu/arm-smmu-v3: Fix L1 stream table index calculation for 32-bit sid size

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <20241008133458.GA10474@willie-the-truck>
Date: Tue, 8 Oct 2024 14:34:58 +0100
From: Will Deacon <will@...nel.org>
To: Yang Shi <yang@...amperecomputing.com>
Cc: jgg@...pe.ca, nicolinc@...dia.com, james.morse@....com,
	robin.murphy@....com, linux-arm-kernel@...ts.infradead.org,
	iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [v3 PATCH] iommu/arm-smmu-v3: Fix L1 stream table index
 calculation for 32-bit sid size

Hi folks,

Sorry I'm late to the party, I went fishing.

On Fri, Oct 04, 2024 at 11:04:05AM -0700, Yang Shi wrote:
> The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
> calculated the last index of the L1 stream table from 1 << smmu->sid_bits,
> where the literal 1 is a 32-bit int.
> However, some platforms, for example AmpereOne and platforms with
> ARM MMU-700, have a 32-bit stream ID size. This results in an out-of-bounds shift.
> The disassembly of the shift is:
> 
>     ldr     w2, [x19, 828]  //, smmu_7(D)->sid_bits
>     mov     w20, 1
>     lsl     w20, w20, w2
> 
> According to the Arm architecture specification, when the registers are
> 32-bit, the instruction actually does:
>     dest = src << (shift % 32)
> 
> So it actually shifted by zero bits, and 1 << 32 evaluated to 1.
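To make the quoted behaviour concrete, here is a small user-space C sketch that models the two `lsl` forms as the commit message describes them (shift amount taken modulo the register width). The helper names `lsl_w` and `lsl_x` are hypothetical; they only mimic the instructions in portable C and never perform an out-of-range C shift themselves.

```c
#include <stdint.h>

/* Hypothetical model of "lsl wd, wn, wm": on 32-bit registers the
 * shift amount is taken modulo 32, as described in the quoted text. */
static uint32_t lsl_w(uint32_t src, uint32_t shift)
{
	return src << (shift % 32);
}

/* Hypothetical model of "lsl xd, xn, xm": on 64-bit registers the
 * shift amount is taken modulo 64. */
static uint64_t lsl_x(uint64_t src, uint64_t shift)
{
	return src << (shift % 64);
}
```

With `sid_bits == 32`, `lsl_w(1, 32)` yields 1, so `(1 << sid_bits) - 1` becomes 0 instead of 0xffffffff, while `lsl_x(1, 32) - 1` yields the intended 0xffffffff.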
> 
> The out-of-bounds shift is also undefined behavior according to the C
> language standard.
> 
> This caused v6.12-rc1 to fail to boot on such platforms.
> 
> UBSAN also reported:
> 
> UBSAN: shift-out-of-bounds in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3628:29
> shift exponent 32 is too large for 32-bit type 'int'
> 
> Using a 64-bit immediate for the shift solves the problem. The
> disassembly after the fix looks like:
>     ldr     w20, [x19, 828] //, smmu_7(D)->sid_bits
>     mov     x0, 1
>     lsl     x0, x0, x20
> 
> There are a couple of problematic places, so the shift has been extracted into a helper.
> 
> Fixes: ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
> Tested-by: James Morse <james.morse@....com>
> Reviewed-by: Jason Gunthorpe <jgg@...dia.com>
> Signed-off-by: Yang Shi <yang@...amperecomputing.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 +++++++++++-----
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  5 +++++
>  2 files changed, 16 insertions(+), 5 deletions(-)
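The diffstat shows a 5-line addition to arm-smmu-v3.h, but the helper's body is elided from this reply. A plausible shape, assuming it simply widens the literal to 64 bits before shifting, is sketched below; the name `strtab_num_sids` and its bare `sid_bits` parameter are illustrative and are not the patch's actual signature.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the helper described in the commit message.
 * Widening the literal to 64 bits (1ULL) keeps the shift in range for
 * every SID size up to 32 bits, so no out-of-bounds shift occurs.
 */
static inline uint64_t strtab_num_sids(unsigned int sid_bits)
{
	return 1ULL << sid_bits;
}
```

For a 32-bit SID size this returns 2^32, and `strtab_num_sids(32) - 1` is the expected last SID, 0xffffffff.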

[...]

> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 737c5b882355..9d4fc91d9258 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3624,8 +3624,9 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
>  {
>  	u32 l1size;
>  	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> +	u64 num_sids = arm_smmu_strtab_num_sids(smmu);
>  	unsigned int last_sid_idx =
> -		arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
> +		arm_smmu_strtab_l1_idx(num_sids - 1);
>  
>  	/* Calculate the L1 size, capped to the SIDSIZE. */
>  	cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
> @@ -3655,20 +3656,25 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
>  
>  static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
>  {
> -	u32 size;
> +	u64 size;
>  	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> +	u64 num_sids = arm_smmu_strtab_num_sids(smmu);
> +
> +	size = num_sids * sizeof(struct arm_smmu_ste);
> +	/* The max size for dmam_alloc_coherent() is 32-bit */
> +	if (size > SIZE_MAX)
> +		return -EINVAL;
>  
> -	size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste);
>  	cfg->linear.table = dmam_alloc_coherent(smmu->dev, size,
>  						&cfg->linear.ste_dma,
>  						GFP_KERNEL);
>  	if (!cfg->linear.table) {
>  		dev_err(smmu->dev,
> -			"failed to allocate linear stream table (%u bytes)\n",
> +			"failed to allocate linear stream table (%llu bytes)\n",
>  			size);
>  		return -ENOMEM;
>  	}
> -	cfg->linear.num_ents = 1 << smmu->sid_bits;
> +	cfg->linear.num_ents = num_sids;
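The effect of the first hunk on the 2-level table can be modelled in user space. This sketch assumes `STRTAB_SPLIT` is 8, `STRTAB_MAX_L1_ENTRIES` is `1 << 17`, and that `arm_smmu_strtab_l1_idx()` maps a SID to `sid >> STRTAB_SPLIT`; these values are stated from memory and should be checked against arm-smmu-v3.h. The "32-bit model" reproduces the wrapped shift described in the commit message.

```c
#include <stdint.h>

/* Assumed values; check arm-smmu-v3.h for the real definitions. */
#define STRTAB_SPLIT		8
#define STRTAB_MAX_L1_ENTRIES	(1U << 17)

/* Assumed shape of arm_smmu_strtab_l1_idx(): one L1 entry per 2^STRTAB_SPLIT SIDs. */
static unsigned int strtab_l1_idx(uint32_t sid)
{
	return sid >> STRTAB_SPLIT;
}

/* Number of L1 entries, capped as in the quoted hunk. */
static unsigned int cap_l1_ents(unsigned int last_sid_idx)
{
	unsigned int want = last_sid_idx + 1;

	return want < STRTAB_MAX_L1_ENTRIES ? want : STRTAB_MAX_L1_ENTRIES;
}

/* Fixed computation: the SID count is formed in 64 bits. */
static unsigned int num_l1_ents(unsigned int sid_bits)
{
	uint64_t num_sids = 1ULL << sid_bits;

	return cap_l1_ents(strtab_l1_idx((uint32_t)(num_sids - 1)));
}

/* Model of the old code on hardware: a 32-bit lsl wraps the shift mod 32. */
static unsigned int num_l1_ents_32bit_model(unsigned int sid_bits)
{
	uint32_t num_sids = UINT32_C(1) << (sid_bits % 32);

	return cap_l1_ents(strtab_l1_idx(num_sids - 1));
}
```

Under these assumptions a 32-bit SID size yields the capped 131072 L1 entries with the fix, but only a single L1 entry with the wrapped shift.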

This all looks a bit messy to me. The architecture guarantees that
2-level stream tables are supported once we hit 7-bit SIDs and, although
the driver relaxes this to > 8-bit SIDs, we'll never run into overflow
problems in the linear table code above.
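The bound behind the linear-table argument is small enough to check by hand. This sketch assumes a 64-byte STE (eight 64-bit words) and, following the paragraph above, that the linear table is only used for SID sizes of at most 8 bits; `linear_table_bytes` is a hypothetical helper, not driver code.

```c
#include <stdint.h>

/* Assumption: an SMMUv3 STE is 64 bytes (eight 64-bit words). */
#define STE_BYTES		64U
/* Assumption: the driver only uses a linear table when sid_bits <= 8. */
#define LINEAR_MAX_SID_BITS	8U

/* Size of a linear stream table; returns 0 for configurations that
 * would use the 2-level table instead. */
static uint32_t linear_table_bytes(unsigned int sid_bits)
{
	if (sid_bits > LINEAR_MAX_SID_BITS)
		return 0;
	/* 32-bit arithmetic suffices: at most 256 * 64 = 16 KiB. */
	return (1U << sid_bits) * STE_BYTES;
}
```

Even at the largest linear size the result is 16384 bytes, far from any 32-bit overflow.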

So I'm inclined to take Daniel's one-liner [1] which just chucks the
'ULL' suffix into the 2-level case. Otherwise, we're in a weird
situation where the size is 64-bit for a short while until it gets
truncated anyway when we assign it to a 32-bit field.
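The truncation mentioned above is plain C conversion: storing a 64-bit count in an unsigned 32-bit field keeps only the low 32 bits. The struct below is a hypothetical stand-in; it assumes `num_ents` is a 32-bit unsigned field, which the quoted hunk does not show.

```c
#include <stdint.h>

/* Hypothetical stand-in for a config with a 32-bit entry count. */
struct linear_cfg {
	uint32_t num_ents;
};

/* Conversion to an unsigned type is well defined in C, but it keeps
 * only the low 32 bits of the 64-bit value. */
static void set_num_ents(struct linear_cfg *cfg, uint64_t num_sids)
{
	cfg->num_ents = (uint32_t)num_sids;
}
```

A 2^32 SID count would be stored as 0, which is why widening only the intermediate value buys nothing in the linear path.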

Any objections?

Will

[1] https://lore.kernel.org/r/20241002015357.1766934-1-danielmentz@google.com