Message-ID: <8e19c1f4-95f9-48eb-a854-0b1b6bee69f1@os.amperecomputing.com>
Date: Mon, 23 Jun 2025 12:12:49 -0700
From: Yang Shi <yang@...amperecomputing.com>
To: Ryan Roberts <ryan.roberts@....com>, will@...nel.org,
catalin.marinas@....com, Miko.Lenczewski@....com, dev.jain@....com,
scott@...amperecomputing.com, cl@...two.org
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/4] arm64: mm: support large block mapping when
rodata=full
On 6/23/25 6:26 AM, Ryan Roberts wrote:
> [...]
>
>>> +
>>> +int split_leaf_mapping(unsigned long addr)
>> Thanks for coming up with the code. It does help to understand your idea. Now
>> I see why you suggested the "split_mapping(start); split_mapping(end);" model.
>> It does make the implementation easier because we don't need a loop anymore.
>> But this may have a couple of problems:
>> 1. We need to walk the page table twice instead of once. That sounds expensive.
> Yes we need to walk twice. That may be more expensive or less expensive,
> depending on the size of the range that you are splitting. If the range is large
> then your approach loops through every leaf mapping between the start and end
> which will be more expensive than just doing 2 walks. If the range is small then
> your approach can avoid the second walk, but at the expense of all the extra
> loop overhead.
Yes, it depends on the page table layout (the more fragmented, the more
loads) and on the range passed in by the callers. But AFAICT, most
existing callers just try to change permissions on a per-page basis. I
know you are looking at adding more block/cont mapping support for
vmalloc, but will the large-range case dominate?
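Just to make sure we are comparing the same thing, here is a caller-side
sketch of the two models as I understand them (split_mapping_range() is
only a placeholder name for my loop-based version, not what my patch
actually calls it):

	/* Your model: two boundary walks, cost independent of the range size. */
	ret = split_leaf_mapping(start);
	if (!ret)
		ret = split_leaf_mapping(start + size);

	/* My model: one walk, but it visits every leaf in [start, start + size). */
	ret = split_mapping_range(start, start + size);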
>
> My suggestion requires 5 loads (assuming the maximum of 5 levels of lookup).
> Personally I think this is probably acceptable? Perhaps we need some other
> voices here.
Doesn't it require 10 loads for start and end together? The 5 loads for
the end may be fast since the entries are likely cached if they fall into
the same PGD/P4D/PUD/PMD as the start.
>
>
>> 2. How should we handle repainting? For repainting we need to split all the
>> page tables between start and end all the way down to PTEs rather than keeping
>> block mappings. This model doesn't work for that, right? For example, repaint a
>> 2G block where the first 1G is mapped by a PUD and the second 1G is mapped by
>> 511 PMDs and 512 PTEs. split_mapping(start) will split the first 1G, but
>> split_mapping(end) will do nothing, so the 511 PMDs are kept intact. In
>> addition, I think we would also prefer to reuse the split primitive for
>> repainting instead of inventing another one.
> I agree my approach doesn't work for the repainting case. But I think what I'm
> trying to say is that the 2 things are different operations:
> split_leaf_mapping() is just trying to ensure that the start and end of a region
> are on leaf boundaries. Repainting is trying to ensure that all leaf mappings
> within a range are PTE-size. I've implemented the former and you've implemented
> the latter. Your implementation looks like it meets the former's requirements
> because you are only testing it for the case where the range is 1 page. But
> actually it is splitting everything in the range to PTEs.
I can understand why you say they are two different operations. And
repainting is basically a one-off thing. However, they share a lot of
common logic (for example, allocating page tables, populating the new
page table entries, etc.) from a code point of view. Repainting is just a
special case of splitting (no block or cont mappings) from this
perspective. If we implement them separately, I can see there will be a
lot of duplicate code. I'm not sure whether that is preferred or not.
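To illustrate what I mean (the names and the "keep_block" flag below are
made up for this example, they are not the actual interface in my patch):
if the shared walker takes a flag saying whether block/cont mappings may
be preserved, repainting becomes a thin wrapper around the same code:

	/*
	 * Illustration only: one walker that allocates and populates the new
	 * page tables; the flag only controls how far down we split.
	 */
	int __split_mapping_range(unsigned long start, unsigned long end,
				  bool keep_block);

	/* Permission change: keep block/cont mappings inside the range. */
	static inline int split_mapping(unsigned long start, unsigned long end)
	{
		return __split_mapping_range(start, end, true);
	}

	/* Repainting: force everything in the range down to pte mappings. */
	static inline int repaint_mapping(unsigned long start, unsigned long end)
	{
		return __split_mapping_range(start, end, false);
	}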
Thanks,
Yang
>
> Thanks,
> Ryan
>
>> Thanks,
>> Yang
>>
>>> +{
>>> +	pgd_t *pgdp, pgd;
>>> +	p4d_t *p4dp, p4d;
>>> +	pud_t *pudp, pud;
>>> +	pmd_t *pmdp, pmd;
>>> +	pte_t *ptep, pte;
>>> +	int ret = 0;
>>> +
>>> +	/*
>>> +	 * !BBML2_NOABORT systems should not be trying to change permissions on
>>> +	 * anything that is not pte-mapped in the first place. Just return early
>>> +	 * and let the permission change code raise a warning if not already
>>> +	 * pte-mapped.
>>> +	 */
>>> +	if (!system_supports_bbml2_noabort())
>>> +		return 0;
>>> +
>>> +	/*
>>> +	 * Ensure addr is at least page-aligned since this is the finest
>>> +	 * granularity we can split to.
>>> +	 */
>>> +	if (addr != PAGE_ALIGN(addr))
>>> +		return -EINVAL;
>>> +
>>> +	arch_enter_lazy_mmu_mode();
>>> +
>>> +	/*
>>> +	 * PGD: If addr is PGD aligned then addr already describes a leaf
>>> +	 * boundary. If not present then there is nothing to split.
>>> +	 */
>>> +	if (ALIGN_DOWN(addr, PGDIR_SIZE) == addr)
>>> +		goto out;
>>> +	pgdp = pgd_offset_k(addr);
>>> +	pgd = pgdp_get(pgdp);
>>> +	if (!pgd_present(pgd))
>>> +		goto out;
>>> +
>>> +	/*
>>> +	 * P4D: If addr is P4D aligned then addr already describes a leaf
>>> +	 * boundary. If not present then there is nothing to split.
>>> +	 */
>>> +	if (ALIGN_DOWN(addr, P4D_SIZE) == addr)
>>> +		goto out;
>>> +	p4dp = p4d_offset(pgdp, addr);
>>> +	p4d = p4dp_get(p4dp);
>>> +	if (!p4d_present(p4d))
>>> +		goto out;
>>> +
>>> +	/*
>>> +	 * PUD: If addr is PUD aligned then addr already describes a leaf
>>> +	 * boundary. If not present then there is nothing to split. Otherwise,
>>> +	 * if we have a pud leaf, split to contpmd.
>>> +	 */
>>> +	if (ALIGN_DOWN(addr, PUD_SIZE) == addr)
>>> +		goto out;
>>> +	pudp = pud_offset(p4dp, addr);
>>> +	pud = pudp_get(pudp);
>>> +	if (!pud_present(pud))
>>> +		goto out;
>>> +	if (pud_leaf(pud)) {
>>> +		ret = split_pud(pudp, pud);
>>> +		if (ret)
>>> +			goto out;
>>> +	}
>>> +
>>> +	/*
>>> +	 * CONTPMD: If addr is CONTPMD aligned then addr already describes a
>>> +	 * leaf boundary. If not present then there is nothing to split.
>>> +	 * Otherwise, if we have a contpmd leaf, split to pmd.
>>> +	 */
>>> +	if (ALIGN_DOWN(addr, CONT_PMD_SIZE) == addr)
>>> +		goto out;
>>> +	pmdp = pmd_offset(pudp, addr);
>>> +	pmd = pmdp_get(pmdp);
>>> +	if (!pmd_present(pmd))
>>> +		goto out;
>>> +	if (pmd_leaf(pmd)) {
>>> +		if (pmd_cont(pmd))
>>> +			split_contpmd(pmdp);
>>> +		/*
>>> +		 * PMD: If addr is PMD aligned then addr already describes a
>>> +		 * leaf boundary. Otherwise, split to contpte.
>>> +		 */
>>> +		if (ALIGN_DOWN(addr, PMD_SIZE) == addr)
>>> +			goto out;
>>> +		ret = split_pmd(pmdp, pmd);
>>> +		if (ret)
>>> +			goto out;
>>> +	}
>>> +
>>> +	/*
>>> +	 * CONTPTE: If addr is CONTPTE aligned then addr already describes a
>>> +	 * leaf boundary. If not present then there is nothing to split.
>>> +	 * Otherwise, if we have a contpte leaf, split to pte.
>>> +	 */
>>> +	if (ALIGN_DOWN(addr, CONT_PTE_SIZE) == addr)
>>> +		goto out;
>>> +	ptep = pte_offset_kernel(pmdp, addr);
>>> +	pte = __ptep_get(ptep);
>>> +	if (!pte_present(pte))
>>> +		goto out;
>>> +	if (pte_cont(pte))
>>> +		split_contpte(ptep);
>>> +
>>> +out:
>>> +	arch_leave_lazy_mmu_mode();
>>> +	return ret;
>>> +}
>>> ---8<---
>>>
>>> Thanks,
>>> Ryan
>>>