[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7720d275-bc52-49c3-949a-6a6a32157418@arm.com>
Date: Fri, 29 Nov 2024 18:06:50 +0000
From: Robin Murphy <robin.murphy@....com>
To: Catalin Marinas <catalin.marinas@....com>,
Yang Shi <yang@...amperecomputing.com>
Cc: Baruch Siach <baruch@...s.co.il>, will@...nel.org, ptesarik@...e.com,
hch@....de, jiangyutang@...amperecomputing.com,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] arm64: mm: fix zone_dma_limit calculation
On 2024-11-27 5:49 pm, Catalin Marinas wrote:
> + Robin
>
> On Tue, Nov 26, 2024 at 09:38:22AM -0800, Yang Shi wrote:
>> On 11/25/24 10:27 PM, Baruch Siach wrote:
>>> On Mon, Nov 25 2024, Yang Shi wrote:
>>>> The commit ba0fb44aed47 ("dma-mapping: replace zone_dma_bits by
>>>> zone_dma_limit") changed how zone_dma_limit was calculated. Now it
>>>> returns the memsize limit in IORT or device tree instead of U32_MAX if
>>>> the memsize limit is greater than U32_MAX.
>>>
>>> Can you give a concrete example of memory layout and dma-ranges that
>>> demonstrates this issue?
>>
>> Our 2 sockets system has physical memory starts at 0x0 on node 0 and
>> 0x200000000000 on node 1. The memory size limit defined in IORT is 0x30 (48
>> bits).
>>
>> The DMA zone is:
>>
>> pages free 887722
>> boost 0
>> min 229
>> low 1108
>> high 1987
>> promo 2866
>> spanned 983040
>> present 982034
>> managed 903238
>> cma 16384
>> protection: (0, 0, 124824, 0, 0)
>> start_pfn: 65536
>>
>> When allocating DMA buffer, dma_direct_optimal_gfp_mask() is called to
>> determine the proper zone constraints. If the phys_limit is less than
>> zone_dma_limit, it will use GFP_DMA. But zone_dma_limit is 0xffffffffffff on
>> v6.12 instead of 4G prior v6.12, it means all DMA buffer allocation will go
>> to DMA zone even though the devices don't require it.
>>
>> DMA zone is on node 0, so we saw excessive remote access on 2 sockets
>> system.
> [...]
>> The physical addr range for DMA zone is correct, the problem is wrong
>> zone_dma_limit. Before commit ba0fb44aed47 zone_dma_limit was 4G, after it
>> it is the whole memory even though DMA zone just covers low 4G.
>
> Thanks for the details. I agree that zone_dma_limit shouldn't be higher
> than the ZONE_DMA upper boundary, otherwise it gets confusing for
> functions like dma_direct_optimal_gfp_mask() and we may force
> allocations to a specific range unnecessarily.
Oof, indeed the original min3() logic did also result in the
32-bit-clamped value being written back to zone_dma_bits itself, so
that's another subtlety overlooked in ba0fb44aed47 *and* the subsequent
fixes reinstating that clamping (conditionally) within max_zone_phys().
> If IORT or DT indicate a large mask covering the whole RAM (i.e. no
> restrictions), in an ideal world, we should normally extend ZONE_DMA to
> the same.
That's not right, ZONE_DMA should still be relatively limited in size
(unless we really do only have a tiny amount of RAM) - just because a DT
dma-ranges property says the system interconnect can carry >32 address
bits in general doesn't mean that individual device DMA masks can't
still be 32-bit or smaller. IIRC we're still implicitly assuming that if
DT does describe an offset range into "high" RAM, it must represent a
suitable lowest common denominator for all relevant devices already, and
therefore we can get away with sizing ZONE_DMA off it blindly. I'm yet
to hear of any platform having a general offset with no significant
upper limit, but if we did want to support such a thing then we would
need to revisit the previously-discussed notion of an accurate
of_dma_get_min_cpu_address() rather than the assumption based on
memblock_start_of_DRAM().
After staring at it for long enough, I think $SUBJECT patch is actually
correct as it is. In fact I'm now wondering why the fix was put inside
max_zone_phys() in the first place, since it's clearly pointless to
clamp DMA_BIT_MASK(32) to U32_MAX in the dma32_phys_limit case...
However the commit message is perhaps not as clear as it could be -
technically we are correctly *calculating* the appropriate effective
zone_dma_limt value within the scope of zone_sizes_init(), we're just
failing to properly update the actual zone_dma_limit variable for the
benefit of other users.
Thanks,
Robin.
> One problem is ZONE_DMA32 (and GFP_DMA32) and the fact that
> ZONE_DMA sits below it. Until we hear otherwise, we assume a DMA offset
> of 0 for such 32-bit devices and therefore define ZONE_DMA32 in the
> lower 4GB if RAM starts below this limit (and an empty ZONE_DMA32 if RAM
> starts above).
>
> Another aspect to consider is that we don't always have DT or IORT
> information or some devices need a smaller limit than what's advertised
> in the firmware tables (typically 32-bit masks). This code went through
> a couple of fixes already:
>
> 833bd284a454 ("arm64: mm: fix DMA zone when dma-ranges is missing")
> 122c234ef4e1 ("arm64: mm: keep low RAM dma zone")
>
> Since our current assumption is to assume ZONE_DMA within 32-bit if RAM
> below 4GB, I'm happy to make this conditional on CONFIG_ZONE_DMA32 also
> being enabled. So, from your patch below:
>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index d21f67d67cf5..ccdef53872a0 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -117,15 +117,6 @@ static void __init arch_reserve_crashkernel(void)
>> static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
>> {
>> - /**
>> - * Information we get from firmware (e.g. DT dma-ranges) describe DMA
>> - * bus constraints. Devices using DMA might have their own limitations.
>> - * Some of them rely on DMA zone in low 32-bit memory. Keep low RAM
>> - * DMA zone on platforms that have RAM there.
>> - */
>> - if (memblock_start_of_DRAM() < U32_MAX)
>> - zone_limit = min(zone_limit, U32_MAX);
>> -
>> return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
>> }
>
> This part is fine.
>
>> @@ -141,6 +132,14 @@ static void __init zone_sizes_init(void)
>> acpi_zone_dma_limit = acpi_iort_dma_get_max_cpu_address();
>> dt_zone_dma_limit = of_dma_get_max_cpu_address(NULL);
>> zone_dma_limit = min(dt_zone_dma_limit, acpi_zone_dma_limit);
>> + /*
>> + * Information we get from firmware (e.g. DT dma-ranges) describe DMA
>> + * bus constraints. Devices using DMA might have their own limitations.
>> + * Some of them rely on DMA zone in low 32-bit memory. Keep low RAM
>> + * DMA zone on platforms that have RAM there.
>> + */
>> + if (memblock_start_of_DRAM() < U32_MAX)
>> + zone_dma_limit = min(zone_dma_limit, U32_MAX);
>> arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
>> max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>> #endif
>
> But I'd move the zone_dma_limit update further down in the
> CONFIG_ZONE_DMA32 block. I think we only need to limit it to
> dma32_phys_limit and ignore the U32_MAX check. The former is already
> capped to 32-bit. For the second hunk, something like below (untested):
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index d21f67d67cf5..ffaf5bd8d0a1 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -146,8 +146,10 @@ static void __init zone_sizes_init(void)
> #endif
> #ifdef CONFIG_ZONE_DMA32
> max_zone_pfns[ZONE_DMA32] = PFN_DOWN(dma32_phys_limit);
> - if (!arm64_dma_phys_limit)
> + if (!arm64_dma_phys_limit || arm64_dma_phys_limit > dma32_phys_limit) {
> arm64_dma_phys_limit = dma32_phys_limit;
> + zone_dma_limit = arm64_dma_phys_limit - 1;
> + }
> #endif
> if (!arm64_dma_phys_limit)
> arm64_dma_phys_limit = PHYS_MASK + 1;
>
> With some comment on why we do this but most likely not the current
> comment in max_zone_phys() (more like keep ZONE_DMA below ZONE_DMA32).
>
Powered by blists - more mailing lists