Message-ID: <1058a058-ea71-45a3-aee8-5c9197e5f3f0@os.amperecomputing.com>
Date: Wed, 2 Oct 2024 11:36:48 -0700
From: Yang Shi <yang@...amperecomputing.com>
To: Robin Murphy <robin.murphy@....com>, jgg@...pe.ca, nicolinc@...dia.com,
james.morse@....com, will@...nel.org
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [v2 PATCH] iommu/arm-smmu-v3: Fix L1 stream table index
calculation for 32-bit sid size
On 10/2/24 11:21 AM, Robin Murphy wrote:
> On 2024-10-02 6:55 pm, Yang Shi wrote:
>> The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add
>> arm_smmu_strtab_l1/2_idx()")
>> calculated the last index of the L1 stream table using 1 << smmu->sid_bits,
>> where the constant 1 is a 32-bit value.
>> However, some platforms, for example AmpereOne, have a 32-bit stream ID
>> size.
>> This resulted in an out-of-bounds shift. The disassembly of the shift is:
>>
>> ldr w2, [x19, 828] //, smmu_7(D)->sid_bits
>> mov w20, 1
>> lsl w20, w20, w2
>>
>> According to the ARM spec, if the registers are 32-bit, the instruction
>> actually does:
>> dest = src << (shift % 32)
>>
>> So the value was actually shifted by zero bits.
>>
>> This caused v6.12-rc1 to fail to boot on AmpereOne and other platforms
>> [1].
>
> FWIW it's going to be seen on any platform with Arm MMU-700 since that
> always advertises 32-bit StreamID support (as other SMMU
> implementations may do too).
I see. Will add this info to the commit log.
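
For reference, the wrap-around described in the commit message can be
modelled in portable C. This is only a minimal sketch, not driver code:
lsl_w() and lsl_x() are hypothetical helpers that emulate the
register-form LSL on 32-bit and 64-bit registers by masking the shift
amount explicitly, because shifting a 32-bit value by 32 is undefined
behaviour in C itself.

```c
#include <stdint.h>

/* Hypothetical model of "lsl wd, wn, wm": shift amount is taken modulo 32. */
static uint32_t lsl_w(uint32_t src, uint32_t shift)
{
	return src << (shift & 31);
}

/* Hypothetical model of "lsl xd, xn, xm": shift amount is taken modulo 64. */
static uint64_t lsl_x(uint64_t src, uint32_t shift)
{
	return src << (shift & 63);
}
```

With sid_bits == 32, lsl_w(1, 32) yields 1, so the "last SID" computed
from it is 0 rather than 0xffffffff, while lsl_x(1, 32) yields
0x100000000 as intended.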
>
>> UBSAN also reported:
>>
>> UBSAN: shift-out-of-bounds in
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3628:29
>> shift exponent 32 is too large for 32-bit type 'int'
>
> At best, those two lines of actual UBSAN warning are the only part
> relevant to the point, the rest of the backtrace below is definitely
> not, please trim it.
OK.
>
>> CPU: 70 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc1 #4
>> Hardware name: ZOLLNER SUNMOONLAKE/SunMoon Lake, BIOS 00.00.
>> 2024-08-28 18:42:45 08/28/2024
>> Call trace:
>> dump_backtrace+0xdc/0x140
>> show_stack+0x20/0x40
>> dump_stack_lvl+0x60/0x80
>> dump_stack+0x18/0x28
>> ubsan_epilogue+0x10/0x48
>> __ubsan_handle_shift_out_of_bounds+0xd8/0x1a0
>> arm_smmu_init_structures+0x374/0x3c8
>> arm_smmu_device_probe+0x208/0x600
>> platform_probe+0x70/0xe8
>> really_probe+0xc8/0x3a0
>> __driver_probe_device+0x84/0x160
>> driver_probe_device+0x44/0x130
>> __driver_attach+0xcc/0x208
>> bus_for_each_dev+0x84/0x100
>> driver_attach+0x2c/0x40
>> bus_add_driver+0x158/0x290
>> driver_register+0x70/0x138
>> __platform_driver_register+0x2c/0x40
>> arm_smmu_driver_init+0x28/0x40
>> do_one_initcall+0x60/0x318
>> do_initcalls+0x198/0x1e0
>> kernel_init_freeable+0x18c/0x1e8
>> kernel_init+0x28/0x160
>> ret_from_fork+0x10/0x20
>>
>> Using a 64-bit immediate for the shift solves the problem. The
>> disassembly after the fix looks like:
>> ldr w20, [x19, 828] //, smmu_7(D)->sid_bits
>> mov x0, 1
>> lsl x0, x0, x20
>>
>> There are a couple of problematic places, so extract the shift into a
>> helper.
>>
>> [1]
>> https://lore.kernel.org/lkml/d4b53bbb-333a-45b9-9eb0-23ddd0820a14@arm.com/
>> Fixes: ce410410f1a7 ("iommu/arm-smmu-v3: Add
>> arm_smmu_strtab_l1/2_idx()")
>> Tested-by: James Morse <james.morse@....com>
>> Signed-off-by: Yang Shi <yang@...amperecomputing.com>
>> ---
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 +++++---
>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 +++++
>> 2 files changed, 10 insertions(+), 3 deletions(-)
>>
>> v2: * Extracted the shift into a helper per Jason Gunthorpe.
>> * Covered more places per Nicolin Chen and Jason Gunthorpe.
>> * Used 1ULL instead of 1UL to guarantee 64 bit per Robin Murphy.
>> * Made the subject more general since this is not an AmpereOne-specific
>> problem, per the report from James Morse.
>> * Collected t-b tag from James Morse.
>> * Added Fixes tag in commit log.
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 737c5b882355..4eafd9f04808 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -3624,8 +3624,9 @@ static int arm_smmu_init_strtab_2lvl(struct
>> arm_smmu_device *smmu)
>> {
>> u32 l1size;
>> struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
>> + unsigned int max_sid = arm_smmu_strtab_max_sid(smmu);
>> unsigned int last_sid_idx =
>> - arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
>> + arm_smmu_strtab_l1_idx(max_sid - 1);
>> /* Calculate the L1 size, capped to the SIDSIZE. */
>> cfg->l2.num_l1_ents = min(last_sid_idx + 1,
>> STRTAB_MAX_L1_ENTRIES);
>> @@ -3657,8 +3658,9 @@ static int arm_smmu_init_strtab_linear(struct
>> arm_smmu_device *smmu)
>> {
>> u32 size;
>> struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
>> + unsigned int max_sid = arm_smmu_strtab_max_sid(smmu);
>> - size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste);
>> + size = max_sid * sizeof(struct arm_smmu_ste);
>> cfg->linear.table = dmam_alloc_coherent(smmu->dev, size,
>> &cfg->linear.ste_dma,
>> GFP_KERNEL);
>> @@ -3668,7 +3670,7 @@ static int arm_smmu_init_strtab_linear(struct
>> arm_smmu_device *smmu)
>> size);
>> return -ENOMEM;
>> }
>> - cfg->linear.num_ents = 1 << smmu->sid_bits;
>> + cfg->linear.num_ents = max_sid;
>> arm_smmu_init_initial_stes(cfg->linear.table,
>> cfg->linear.num_ents);
>> return 0;
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index 1e9952ca989f..f7e8465c629a 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -853,6 +853,11 @@ struct arm_smmu_master_domain {
>> ioasid_t ssid;
>> };
>> +static inline unsigned int arm_smmu_strtab_max_sid(struct
>> arm_smmu_device *smmu)
>
> Nit: "max_sid" implies returning the largest supported StreamID value,
> so it would be logical to either include the "- 1" in here and adjust
> the other callers, or instead call this something like "num_sids".
Will use "num_sids".
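
A rough sketch of how the renamed helper might look, written as a
standalone example rather than the actual next revision. struct
smmu_stub is a hypothetical stand-in for struct arm_smmu_device, and
strtab_num_sids() / strtab_last_sid() are illustrative names. The
64-bit return type is an assumption of this sketch and differs from the
posted v2: it is chosen so that 2^32 is representable, since returning
1ULL << 32 through an unsigned int would truncate it to 0.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct arm_smmu_device; only the field used here. */
struct smmu_stub {
	unsigned int sid_bits;
};

/*
 * Number of StreamIDs for a given SIDSIZE. Assumption: a 64-bit count is
 * returned so that the sid_bits == 32 case does not wrap.
 */
static inline uint64_t strtab_num_sids(const struct smmu_stub *smmu)
{
	return 1ULL << smmu->sid_bits;
}

/* Largest valid StreamID, i.e. num_sids - 1 (fits in 32 bits for sid_bits <= 32). */
static inline uint32_t strtab_last_sid(const struct smmu_stub *smmu)
{
	return (uint32_t)(strtab_num_sids(smmu) - 1);
}
```

Callers that need the count would use the first helper, and the L1
index calculation would take the second, which keeps the "- 1" out of
the call sites.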
>
> Thanks,
> Robin.
>
>> +{
>> + return (1ULL << smmu->sid_bits);
>> +}
>> +
>> static inline struct arm_smmu_domain *to_smmu_domain(struct
>> iommu_domain *dom)
>> {
>> return container_of(dom, struct arm_smmu_domain, domain);
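
To illustrate why a wrong last SID matters for the 2-level table, here
is a small standalone sketch of the L1 sizing arithmetic. The SKETCH_*
constants (an 8-bit split and a cap of 1 << 17 L1 entries) and the
sketch_* helpers are assumptions made for this example and should be
checked against arm-smmu-v3.h; they are not copied from the patch.

```c
#include <stdint.h>

/* Assumed values for illustration only; verify against the driver header. */
#define SKETCH_STRTAB_SPLIT	8
#define SKETCH_NUM_L2_STES	(1u << SKETCH_STRTAB_SPLIT)
#define SKETCH_MAX_L1_ENTRIES	(1u << 17)

/* L1 index of a StreamID: which L2 table the SID falls into. */
static uint32_t sketch_l1_idx(uint32_t sid)
{
	return sid / SKETCH_NUM_L2_STES;
}

/* Number of L1 entries needed to cover SIDs 0..last_sid, capped. */
static uint32_t sketch_num_l1_ents(uint32_t last_sid)
{
	uint32_t n = sketch_l1_idx(last_sid) + 1;

	return n < SKETCH_MAX_L1_ENTRIES ? n : SKETCH_MAX_L1_ENTRIES;
}
```

Under these assumptions, a last SID of 0 (the result of the wrapped
shift) gives a single L1 entry, whereas the correct last SID of
0xffffffff gives the capped SKETCH_MAX_L1_ENTRIES.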