Message-ID: <20241008133458.GA10474@willie-the-truck>
Date: Tue, 8 Oct 2024 14:34:58 +0100
From: Will Deacon <will@...nel.org>
To: Yang Shi <yang@...amperecomputing.com>
Cc: jgg@...pe.ca, nicolinc@...dia.com, james.morse@....com,
robin.murphy@....com, linux-arm-kernel@...ts.infradead.org,
iommu@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [v3 PATCH] iommu/arm-smmu-v3: Fix L1 stream table index
calculation for 32-bit sid size
Hi folks,

Sorry I'm late to the party, I went fishing.

On Fri, Oct 04, 2024 at 11:04:05AM -0700, Yang Shi wrote:
> The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
> calculated the last index of L1 stream table by 1 << smmu->sid_bits. 1
> is 32 bit value.
> However some platforms, for example, AmpereOne and the platforms with
> ARM MMU-700, have 32-bit stream id size. This resulted in out-of-bound shift.
> The disassembly of shift is:
>
> ldr w2, [x19, 828] //, smmu_7(D)->sid_bits
> mov w20, 1
> lsl w20, w20, w2
>
> According to ARM spec, if the registers are 32 bit, the instruction actually
> does:
> dest = src << (shift % 32)
>
> So it actually shifted by zero bit.
>
> The out-of-bound shift is also undefined behavior according to C
> language standard.
>
> This caused v6.12-rc1 to fail to boot on such platforms.
>
> UBSAN also reported:
>
> UBSAN: shift-out-of-bounds in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3628:29
> shift exponent 32 is too large for 32-bit type 'int'
>
> Using 64 bit immediate when doing shift can solve the problem. The
> disassembly after the fix looks like:
> ldr w20, [x19, 828] //, smmu_7(D)->sid_bits
> mov x0, 1
> lsl x0, x0, x20
>
> There are a couple of problematic places, extracted the shift into a helper.
>
> Fixes: ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
> Tested-by: James Morse <james.morse@....com>
> Reviewed-by: Jason Gunthorpe <jgg@...dia.com>
> Signed-off-by: Yang Shi <yang@...amperecomputing.com>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 +++++++++++-----
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 +++++
> 2 files changed, 16 insertions(+), 5 deletions(-)
[...]
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 737c5b882355..9d4fc91d9258 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3624,8 +3624,9 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
> {
> u32 l1size;
> struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> + u64 num_sids = arm_smmu_strtab_num_sids(smmu);
> unsigned int last_sid_idx =
> - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
> + arm_smmu_strtab_l1_idx(num_sids - 1);
>
> /* Calculate the L1 size, capped to the SIDSIZE. */
> cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
> @@ -3655,20 +3656,25 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
>
> static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
> {
> - u32 size;
> + u64 size;
> struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> + u64 num_sids = arm_smmu_strtab_num_sids(smmu);
> +
> + size = num_sids * sizeof(struct arm_smmu_ste);
> + /* The max size for dmam_alloc_coherent() is 32-bit */
> + if (size > SIZE_MAX)
> + return -EINVAL;
>
> - size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste);
> cfg->linear.table = dmam_alloc_coherent(smmu->dev, size,
> &cfg->linear.ste_dma,
> GFP_KERNEL);
> if (!cfg->linear.table) {
> dev_err(smmu->dev,
> - "failed to allocate linear stream table (%u bytes)\n",
> + "failed to allocate linear stream table (%llu bytes)\n",
> size);
> return -ENOMEM;
> }
> - cfg->linear.num_ents = 1 << smmu->sid_bits;
> + cfg->linear.num_ents = num_sids;
This all looks a bit messy to me. The architecture guarantees that
2-level stream tables are supported once we hit 7-bit SIDs and, although
the driver relaxes this to > 8-bit SIDs, we'll never run into overflow
problems in the linear table code above.
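
To put rough numbers on the linear case, here is a minimal sketch rather than driver code. It assumes a 64-byte STE (8 dwords of 8 bytes) and that the driver only falls back to a linear table for SIDs of up to 8 bits; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: one stream table entry is 8 dwords of 8 bytes each. */
#define STE_BYTES 64u
/* Assumption: the linear table is only used for SIDs of up to 8 bits. */
#define LINEAR_MAX_SID_BITS 8u

/* Bytes needed for a linear stream table covering 2^sid_bits stream IDs. */
static uint64_t linear_strtab_bytes(unsigned int sid_bits)
{
	/* 64-bit shift, so this is well defined for sid_bits up to 32. */
	return (UINT64_C(1) << sid_bits) * STE_BYTES;
}

/* Non-zero if the table size can be held in a 32-bit variable. */
static int linear_size_fits_u32(unsigned int sid_bits)
{
	return linear_strtab_bytes(sid_bits) <= UINT32_MAX;
}
```

Under those assumptions the largest linear table is 256 entries, or 16 KiB, which is nowhere near the 32-bit limit.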
So I'm inclined to take Daniel's one-liner [1] which just chucks the
'ULL' suffix into the 2-level case. Otherwise, we're in a weird
situation where the size is 64-bit for a short while until it gets
truncated anyway when we assign it to a 32-bit field.
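
For illustration only, the 2-level calculation with a 64-bit shift could look roughly like the sketch below. This is not the text of either patch: the names are hypothetical, and the 8-bit split and the 1 << 17 cap on L1 entries are assumptions about the driver's constants.

```c
#include <stdint.h>

/* Assumption: each L2 table covers 2^8 stream IDs. */
#define STRTAB_SPLIT_BITS 8u
/* Assumption: the driver caps the number of L1 entries at 1 << 17. */
#define MAX_L1_ENTRIES (UINT32_C(1) << 17)

/* L1 index for a given stream ID (hypothetical helper). */
static uint32_t l1_idx(uint32_t sid)
{
	return sid >> STRTAB_SPLIT_BITS;
}

/* Number of L1 entries for a given SID width, capped as above. */
static uint32_t num_l1_ents(unsigned int sid_bits)
{
	/*
	 * The ULL suffix makes this a 64-bit shift, so sid_bits == 32 is
	 * well defined. The result minus one always fits in 32 bits.
	 */
	uint32_t last_sid = (uint32_t)((1ULL << sid_bits) - 1);
	uint32_t n = l1_idx(last_sid) + 1;

	return n < MAX_L1_ENTRIES ? n : MAX_L1_ENTRIES;
}
```

With sid_bits == 32 the last SID is 0xffffffff, giving 1 << 24 L1 entries before the cap is applied, and everything after the shift stays within 32 bits.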

Any objections?

Will

[1] https://lore.kernel.org/r/20241002015357.1766934-1-danielmentz@google.com