Message-ID: <25ecbf39-e5dc-496c-be3c-8b25eeae2414@os.amperecomputing.com>
Date: Thu, 26 Jun 2025 15:39:14 -0700
From: Yang Shi <yang@...amperecomputing.com>
To: Ryan Roberts <ryan.roberts@....com>, will@...nel.org,
catalin.marinas@....com, Miko.Lenczewski@....com, dev.jain@....com,
scott@...amperecomputing.com, cl@...two.org
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/4] arm64: mm: support large block mapping when
rodata=full
On 6/23/25 6:26 AM, Ryan Roberts wrote:
> [...]
>
>>> +
>>> +int split_leaf_mapping(unsigned long addr)
>> Thanks for coming up with the code. It does help me understand your idea. Now I
>> see why you suggested the "split_mapping(start); split_mapping(end);" model. It does
>> make the implementation easier because we don't need a loop anymore. But this
>> may have a couple of problems:
>> 1. We need to walk the page table twice instead of once. It sounds expensive.
> Yes, we need to walk twice. That may be more expensive or less expensive,
> depending on the size of the range that you are splitting. If the range is large
> then your approach loops through every leaf mapping between the start and end
> which will be more expensive than just doing 2 walks. If the range is small then
> your approach can avoid the second walk, but at the expense of all the extra
> loop overhead.
Thinking about this further: although there is some extra loop overhead,
there should be no extra loads. We can check whether the start and
end of the current entry's range are properly aligned, and if they are, just
continue the loop without loading the page table entry.
We can also optimize the loop by advancing multiple PUD/PMD/CONT-sized entries
at a time instead of one at a time. The pseudo code (for example, at the PMD
level) looks like:
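do {
	nr = 1;
	next = pmd_addr_end(start, end);
	/* Only the entries containing start and end can need a split: jump to the one containing end. */
	if (next < end)
		nr = ((end - next) / PMD_SIZE) + 1;
	/* Both bounds of this entry are PMD aligned: no load, no split. */
	if (((start | next) & ~PMD_MASK) == 0)
		continue;
	split_pmd(start, next);
} while (pmdp += nr, start = next + (nr - 1) * PMD_SIZE, start != end);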
For the repainting case, we just need to do:
do {
	nr = 1;
	next = pmd_addr_end(start, end);
	/* When repainting, visit and split every entry instead of skipping ahead. */
	if (next < end && !repainting)
		nr = ((end - next) / PMD_SIZE) + 1;
	if (((start | next) & ~PMD_MASK) == 0 && !repainting)
		continue;
	split_pmd(start, next);
} while (pmdp += nr, start = next + (nr - 1) * PMD_SIZE, start != end);
This should reduce the loop overhead and avoid duplicating code for repainting.
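To sanity-check the skip arithmetic (nr entries forward lands on the entry containing end), here is a userspace model of the PMD-level loop. It is a sketch under stated assumptions, not kernel code: there is no real page table, split_pmd() is replaced by recording the index of the entry that would be split, a 4K granule (2M PMD) is assumed, and walk_pmd_range() and model_pmd_addr_end() are hypothetical names (the latter copies the logic of the kernel's generic pmd_addr_end()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: 4K granule, so one PMD entry maps 2M. */
#define PMD_SIZE (UINT64_C(1) << 21)
#define PMD_MASK (~(PMD_SIZE - 1))

/* Same logic as the kernel's generic pmd_addr_end(), copied for userspace. */
static uint64_t model_pmd_addr_end(uint64_t addr, uint64_t end)
{
	uint64_t boundary = (addr + PMD_SIZE) & PMD_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

/*
 * Hypothetical model of the proposed loop. Instead of calling split_pmd() it
 * stores the index (addr / PMD_SIZE) of each entry that would be split in
 * split_idx[] (at most max of them) and returns how many there were.
 */
static size_t walk_pmd_range(uint64_t start, uint64_t end, bool repainting,
			     size_t *split_idx, size_t max)
{
	size_t count = 0;
	uint64_t next, nr;

	assert(start < end);
	do {
		nr = 1;
		next = model_pmd_addr_end(start, end);
		/* Splitting only: skip straight to the entry containing end. */
		if (next < end && !repainting)
			nr = ((end - next) / PMD_SIZE) + 1;
		/* Entry with PMD-aligned bounds: nothing to split, no load. */
		if (((start | next) & ~PMD_MASK) == 0 && !repainting)
			continue;
		if (count < max)
			split_idx[count] = (size_t)(start / PMD_SIZE);
		count++;
	} while (start = next + (nr - 1) * PMD_SIZE, start != end);

	return count;
}
```

For a range running from 4K past entry 0 to 4K past entry 3, the split-only walk records entries 0 and 3 and never visits entries 1 and 2, while the repainting walk records all four.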
Thanks,
Yang
>
> My suggestion requires 5 loads (assuming the maximum of 5 levels of lookup).
> Personally I think this is probably acceptable? Perhaps we need some other
> voices here.
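The cost trade-off discussed here can be put in rough numbers. A minimal sketch, assuming a 4K granule (2M PMD entries) and counting one load per table entry read; the helper names are hypothetical. Two walks of at most 5 levels cost at most 10 loads, while a loop that reads every PMD entry of a 1G range reads 512 entries and a one-page range reads only 1. This models the naive loop, before the alignment skipping Yang proposes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: 4K granule, so one PMD entry maps 2M. */
#define PMD_SIZE (UINT64_C(1) << 21)

/* Two independent walks, one load per level visited. */
static uint64_t two_walk_loads(unsigned int levels)
{
	return 2 * (uint64_t)levels;
}

/* Naive loop: one load per PMD entry overlapping [start, end). */
static uint64_t per_entry_loads(uint64_t start, uint64_t end)
{
	assert(start < end);
	uint64_t first = start & ~(PMD_SIZE - 1);               /* round down */
	uint64_t last = (end + PMD_SIZE - 1) & ~(PMD_SIZE - 1); /* round up */
	return (last - first) / PMD_SIZE;
}
```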
>
>
>> 2. How should we handle repainting? We need to split all the page tables all the
>> way down to PTE for repainting between start and end rather than keeping block
>> mappings. This model doesn't work, right? For example, repaint a 2G block. The
>> first 1G is mapped by a PUD, the second 1G is mapped by 511 PMDs and 512 PTEs.
>> split_mapping(start) will split the first 1G, but split_mapping(end) will do
>> nothing, so the 511 PMDs are kept intact. In addition, I think we would also prefer to
>> reuse the split primitive for repainting instead of inventing another one.
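The 2G example can be checked with a little arithmetic. A sketch, with a hypothetical struct and helper names, that describes the region as runs of equal-sized leaves: the runs add up to 2G, 512 of the leaves are larger than a page, and a full repaint to PTE granularity needs 2G / 4K = 524288 PTEs. When start and end already sit on leaf boundaries, a start/end-only split touches none of those 512 block leaves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_4K (UINT64_C(1) << 12)
#define SZ_2M (UINT64_C(1) << 21)
#define SZ_1G (UINT64_C(1) << 30)

/* Hypothetical: a run of equally sized, contiguous leaf mappings. */
struct leaf_run {
	uint64_t size;  /* bytes mapped by each leaf */
	uint64_t count; /* number of leaves in the run */
};

/* The example layout: one PUD leaf, then 511 PMD leaves and 512 PTEs. */
static const struct leaf_run example_2g[] = {
	{ SZ_1G, 1 },
	{ SZ_2M, 511 },
	{ SZ_4K, 512 },
};
#define EXAMPLE_RUNS (sizeof(example_2g) / sizeof(example_2g[0]))

/* Total bytes covered by the runs. */
static uint64_t total_bytes(const struct leaf_run *runs, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += runs[i].size * runs[i].count;
	return sum;
}

/* Leaves larger than a page: a repaint must break every one of them up. */
static uint64_t block_leaves(const struct leaf_run *runs, size_t n)
{
	uint64_t leaves = 0;

	for (size_t i = 0; i < n; i++)
		if (runs[i].size > SZ_4K)
			leaves += runs[i].count;
	return leaves;
}
```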
> I agree my approach doesn't work for the repainting case. But I think what I'm
> trying to say is that the 2 things are different operations:
> split_leaf_mapping() is just trying to ensure that the start and end of a region
> are on leaf boundaries. Repainting is trying to ensure that all leaf mappings
> within a range are PTE-size. I've implemented the former and you've implemented
> the latter. Your implementation looks like it meets the former's requirements
> because you are only testing it for the case where the range is 1 page. But
> actually it is splitting everything in the range to PTEs.
>
> Thanks,
> Ryan
>
>> Thanks,
>> Yang
>>
>>> +{
>>> + pgd_t *pgdp, pgd;
>>> + p4d_t *p4dp, p4d;
>>> + pud_t *pudp, pud;
>>> + pmd_t *pmdp, pmd;
>>> + pte_t *ptep, pte;
>>> + int ret = 0;
>>> +
>>> + /*
>>> + * !BBML2_NOABORT systems should not be trying to change permissions on
>>> + * anything that is not pte-mapped in the first place. Just return early
>>> + * and let the permission change code raise a warning if not already
>>> + * pte-mapped.
>>> + */
>>> + if (!system_supports_bbml2_noabort())
>>> + return 0;
>>> +
>>> + /*
>>> + * Ensure addr is at least page-aligned since this is the finest
>>> + * granularity we can split to.
>>> + */
>>> + if (addr != PAGE_ALIGN(addr))
>>> + return -EINVAL;
>>> +
>>> + arch_enter_lazy_mmu_mode();
>>> +
>>> + /*
>>> + * PGD: If addr is PGD aligned then addr already describes a leaf
>>> + * boundary. If not present then there is nothing to split.
>>> + */
>>> + if (ALIGN_DOWN(addr, PGDIR_SIZE) == addr)
>>> + goto out;
>>> + pgdp = pgd_offset_k(addr);
>>> + pgd = pgdp_get(pgdp);
>>> + if (!pgd_present(pgd))
>>> + goto out;
>>> +
>>> + /*
>>> + * P4D: If addr is P4D aligned then addr already describes a leaf
>>> + * boundary. If not present then there is nothing to split.
>>> + */
>>> + if (ALIGN_DOWN(addr, P4D_SIZE) == addr)
>>> + goto out;
>>> + p4dp = p4d_offset(pgdp, addr);
>>> + p4d = p4dp_get(p4dp);
>>> + if (!p4d_present(p4d))
>>> + goto out;
>>> +
>>> + /*
>>> + * PUD: If addr is PUD aligned then addr already describes a leaf
>>> + * boundary. If not present then there is nothing to split. Otherwise,
>>> + * if we have a pud leaf, split to contpmd.
>>> + */
>>> + if (ALIGN_DOWN(addr, PUD_SIZE) == addr)
>>> + goto out;
>>> + pudp = pud_offset(p4dp, addr);
>>> + pud = pudp_get(pudp);
>>> + if (!pud_present(pud))
>>> + goto out;
>>> + if (pud_leaf(pud)) {
>>> + ret = split_pud(pudp, pud);
>>> + if (ret)
>>> + goto out;
>>> + }
>>> +
>>> + /*
>>> + * CONTPMD: If addr is CONTPMD aligned then addr already describes a
>>> + * leaf boundary. If not present then there is nothing to split.
>>> + * Otherwise, if we have a contpmd leaf, split to pmd.
>>> + */
>>> + if (ALIGN_DOWN(addr, CONT_PMD_SIZE) == addr)
>>> + goto out;
>>> + pmdp = pmd_offset(pudp, addr);
>>> + pmd = pmdp_get(pmdp);
>>> + if (!pmd_present(pmd))
>>> + goto out;
>>> + if (pmd_leaf(pmd)) {
>>> + if (pmd_cont(pmd))
>>> + split_contpmd(pmdp);
>>> + /*
>>> + * PMD: If addr is PMD aligned then addr already describes a
>>> + * leaf boundary. Otherwise, split to contpte.
>>> + */
>>> + if (ALIGN_DOWN(addr, PMD_SIZE) == addr)
>>> + goto out;
>>> + ret = split_pmd(pmdp, pmd);
>>> + if (ret)
>>> + goto out;
>>> + }
>>> +
>>> + /*
>>> + * CONTPTE: If addr is CONTPTE aligned then addr already describes a
>>> + * leaf boundary. If not present then there is nothing to split.
>>> + * Otherwise, if we have a contpte leaf, split to pte.
>>> + */
>>> + if (ALIGN_DOWN(addr, CONT_PTE_SIZE) == addr)
>>> + goto out;
>>> + ptep = pte_offset_kernel(pmdp, addr);
>>> + pte = __ptep_get(ptep);
>>> + if (!pte_present(pte))
>>> + goto out;
>>> + if (pte_cont(pte))
>>> + split_contpte(ptep);
>>> +
>>> +out:
>>> + arch_leave_lazy_mmu_mode();
>>> + return ret;
>>> +}
>>> ---8<---
>>>
>>> Thanks,
>>> Ryan
>>>
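The early exits in split_leaf_mapping() reduce to one question per level: is addr aligned to that level's mapping size? A userspace model of just that cascade, assuming a 4K granule with 48-bit VAs (so P4D is folded into the PGD and its check is omitted). It ignores the present/leaf checks and only shows where the walk can stop on alignment alone; it uses CONT_PTE_SIZE for the CONTPTE step, as that step's comment describes. The enum and function names are hypothetical. A PMD-aligned addr under a table PMD skips the PMD check in the real code but still stops at the CONTPTE check, so the outcome matches.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed arm64 configuration: 4K granule, 48-bit VA (4 levels, P4D folded). */
#define PAGE_SIZE     (UINT64_C(1) << 12) /* 4K   */
#define CONT_PTE_SIZE (16 * PAGE_SIZE)    /* 64K  */
#define PMD_SIZE      (UINT64_C(1) << 21) /* 2M   */
#define CONT_PMD_SIZE (16 * PMD_SIZE)     /* 32M  */
#define PUD_SIZE      (UINT64_C(1) << 30) /* 1G   */
#define PGDIR_SIZE    (UINT64_C(1) << 39) /* 512G */

/* Hypothetical: where the walk stops on alignment alone. */
enum stop {
	STOP_EINVAL = -1, /* not page aligned: the patch returns -EINVAL */
	STOP_PGD,
	STOP_PUD,
	STOP_CONTPMD,
	STOP_PMD,
	STOP_CONTPTE,
	STOP_PTE,         /* only page aligned: a contpte leaf at addr is split */
};

/* Checked top-down, in the same order as the function's comments. */
static const uint64_t level_size[] = {
	PGDIR_SIZE, PUD_SIZE, CONT_PMD_SIZE, PMD_SIZE, CONT_PTE_SIZE,
};

static enum stop walk_stop_level(uint64_t addr)
{
	if (addr & (PAGE_SIZE - 1))
		return STOP_EINVAL;
	/* For power-of-2 sizes this equals ALIGN_DOWN(addr, size) == addr. */
	for (int i = 0; i < STOP_PTE; i++)
		if ((addr & (level_size[i] - 1)) == 0)
			return (enum stop)i;
	return STOP_PTE;
}
```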