Message-ID: <25ecbf39-e5dc-496c-be3c-8b25eeae2414@os.amperecomputing.com>
Date: Thu, 26 Jun 2025 15:39:14 -0700
From: Yang Shi <yang@...amperecomputing.com>
To: Ryan Roberts <ryan.roberts@....com>, will@...nel.org,
 catalin.marinas@....com, Miko.Lenczewski@....com, dev.jain@....com,
 scott@...amperecomputing.com, cl@...two.org
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/4] arm64: mm: support large block mapping when
 rodata=full



On 6/23/25 6:26 AM, Ryan Roberts wrote:
> [...]
>
>>> +
>>> +int split_leaf_mapping(unsigned long addr)
>> Thanks for coming up with the code. It does help to understand your idea. Now I
>> see why you suggested "split_mapping(start); split_mapping(end);" model. It does
>> make the implementation easier because we don't need a loop anymore. But this
>> may have a couple of problems:
>>    1. We need to walk the page table twice instead of once. It sounds expensive.
> Yes we need to walk twice. That may be more expensive or less expensive,
> depending on the size of the range that you are splitting. If the range is large
> then your approach loops through every leaf mapping between the start and end
> which will be more expensive than just doing 2 walks. If the range is small then
> your approach can avoid the second walk, but at the expense of all the extra
> loop overhead.

Thinking about this further: although there is some extra loop overhead, 
there should be no extra loads. We can check whether the start and 
end are properly aligned; if they are, we just continue 
the loop without loading the page table entry.

And we can optimize the loop by advancing by multiple PUD/PMD/CONT sizes 
at a time instead of one at a time. The pseudo code (for example, at the 
pmd level) looks like:

do {
      nr = 1;
      next = pmd_addr_end(start, end);

      if (next < end)
          nr = ((end - next) / PMD_SIZE) + 1;

      if (((start | next) & ~PMD_MASK) == 0)
          continue;

      split_pmd(start, next);
} while (pmdp += nr, start = next + (nr - 1) * PMD_SIZE, start != end)
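To sanity-check the batched advance, here is a small user-space model (hypothetical: PMD_SIZE is fixed at 2M, pmd_addr_end() is open-coded, and the advance is written as next + (nr - 1) * PMD_SIZE so that the walk lands directly on the last partial entry). It counts loop iterations so the effect of the skip is visible:

```c
/* Hypothetical user-space model of the batched PMD walk sketched
 * above; real kernel code would use the arch's PMD_SIZE/PMD_MASK
 * and pmd_addr_end(), and would actually split entries. */
#define PMD_SIZE  (1UL << 21)
#define PMD_MASK  (~(PMD_SIZE - 1))

static unsigned long pmd_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + PMD_SIZE) & PMD_MASK;

	return boundary < end ? boundary : end;
}

/* Returns the number of loop iterations taken for [start, end). */
static int walk(unsigned long start, unsigned long end)
{
	unsigned long next, nr;
	int iters = 0;

	do {
		nr = 1;
		iters++;
		next = pmd_addr_end(start, end);

		/*
		 * Everything between next and the last partial entry is a
		 * whole, aligned PMD: batch the advance over all of them.
		 */
		if (next < end)
			nr = ((end - next) / PMD_SIZE) + 1;

		/* Fully aligned entry: nothing to split. */
		if (((start | next) & ~PMD_MASK) == 0)
			continue;

		/* split_pmd(start, next) would go here. */
	} while (start = next + (nr - 1) * PMD_SIZE, start != end);

	return iters;
}
```

With this arithmetic, a range spanning many PMDs with an unaligned start and an unaligned end is handled in two iterations (first partial entry, then a jump straight to the last partial entry), and a fully aligned range takes a single iteration.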


For the repainting case, we just need to do:

do {
      nr = 1;
      next = pmd_addr_end(start, end);

      if (next < end && !repainting)
          nr = ((end - next) / PMD_SIZE) + 1;

      if (((start | next) & ~PMD_MASK) == 0 && !repainting)
          continue;

      split_pmd(start, next);
} while (pmdp += nr, start = next + (nr - 1) * PMD_SIZE, start != end)

This should reduce loop overhead and avoid duplicating code for repainting.

Thanks,
Yang

>
> My suggestion requires 5 loads (assuming the maximum of 5 levels of lookup).
> Personally I think this is probably acceptable? Perhaps we need some other
> voices here.
>
>
>>    2. How should we handle repainting? We need to split all the page tables all the
>> way down to PTEs for repainting between start and end rather than keeping block
>> mappings. This model doesn't work, right? For example, repaint a 2G block. The
>> first 1G is mapped by a PUD, the second 1G is mapped by 511 PMDs and 512 PTEs.
>> split_mapping(start) will split the first 1G, but split_mapping(end) will do
>> nothing, the 511 PMDs are kept intact. In addition, I think we also prefer reusing
>> the split primitive for repainting instead of inventing another one.
> I agree my approach doesn't work for the repainting case. But I think what I'm
> trying to say is that the 2 things are different operations:
> split_leaf_mapping() is just trying to ensure that the start and end of a region
> are on leaf boundaries. Repainting is trying to ensure that all leaf mappings
> within a range are PTE-size. I've implemented the former and you've implemented
> the latter. Your implementation looks like it meets the former's requirements
> because you are only testing it for the case where the range is 1 page. But
> actually it is splitting everything in the range to PTEs.
>
> Thanks,
> Ryan
>
>> Thanks,
>> Yang
>>
>>> +{
>>> +    pgd_t *pgdp, pgd;
>>> +    p4d_t *p4dp, p4d;
>>> +    pud_t *pudp, pud;
>>> +    pmd_t *pmdp, pmd;
>>> +    pte_t *ptep, pte;
>>> +    int ret = 0;
>>> +
>>> +    /*
>>> +     * !BBML2_NOABORT systems should not be trying to change permissions on
>>> +     * anything that is not pte-mapped in the first place. Just return early
>>> +     * and let the permission change code raise a warning if not already
>>> +     * pte-mapped.
>>> +     */
>>> +    if (!system_supports_bbml2_noabort())
>>> +        return 0;
>>> +
>>> +    /*
>>> +     * Ensure addr is at least page-aligned since this is the finest
>>> +     * granularity we can split to.
>>> +     */
>>> +    if (addr != PAGE_ALIGN(addr))
>>> +        return -EINVAL;
>>> +
>>> +    arch_enter_lazy_mmu_mode();
>>> +
>>> +    /*
>>> +     * PGD: If addr is PGD aligned then addr already describes a leaf
>>> +     * boundary. If not present then there is nothing to split.
>>> +     */
>>> +    if (ALIGN_DOWN(addr, PGDIR_SIZE) == addr)
>>> +        goto out;
>>> +    pgdp = pgd_offset_k(addr);
>>> +    pgd = pgdp_get(pgdp);
>>> +    if (!pgd_present(pgd))
>>> +        goto out;
>>> +
>>> +    /*
>>> +     * P4D: If addr is P4D aligned then addr already describes a leaf
>>> +     * boundary. If not present then there is nothing to split.
>>> +     */
>>> +    if (ALIGN_DOWN(addr, P4D_SIZE) == addr)
>>> +        goto out;
>>> +    p4dp = p4d_offset(pgdp, addr);
>>> +    p4d = p4dp_get(p4dp);
>>> +    if (!p4d_present(p4d))
>>> +        goto out;
>>> +
>>> +    /*
>>> +     * PUD: If addr is PUD aligned then addr already describes a leaf
>>> +     * boundary. If not present then there is nothing to split. Otherwise,
>>> +     * if we have a pud leaf, split to contpmd.
>>> +     */
>>> +    if (ALIGN_DOWN(addr, PUD_SIZE) == addr)
>>> +        goto out;
>>> +    pudp = pud_offset(p4dp, addr);
>>> +    pud = pudp_get(pudp);
>>> +    if (!pud_present(pud))
>>> +        goto out;
>>> +    if (pud_leaf(pud)) {
>>> +        ret = split_pud(pudp, pud);
>>> +        if (ret)
>>> +            goto out;
>>> +    }
>>> +
>>> +    /*
>>> +     * CONTPMD: If addr is CONTPMD aligned then addr already describes a
>>> +     * leaf boundary. If not present then there is nothing to split.
>>> +     * Otherwise, if we have a contpmd leaf, split to pmd.
>>> +     */
>>> +    if (ALIGN_DOWN(addr, CONT_PMD_SIZE) == addr)
>>> +        goto out;
>>> +    pmdp = pmd_offset(pudp, addr);
>>> +    pmd = pmdp_get(pmdp);
>>> +    if (!pmd_present(pmd))
>>> +        goto out;
>>> +    if (pmd_leaf(pmd)) {
>>> +        if (pmd_cont(pmd))
>>> +            split_contpmd(pmdp);
>>> +        /*
>>> +         * PMD: If addr is PMD aligned then addr already describes a
>>> +         * leaf boundary. Otherwise, split to contpte.
>>> +         */
>>> +        if (ALIGN_DOWN(addr, PMD_SIZE) == addr)
>>> +            goto out;
>>> +        ret = split_pmd(pmdp, pmd);
>>> +        if (ret)
>>> +            goto out;
>>> +    }
>>> +
>>> +    /*
>>> +     * CONTPTE: If addr is CONTPTE aligned then addr already describes a
>>> +     * leaf boundary. If not present then there is nothing to split.
>>> +     * Otherwise, if we have a contpte leaf, split to pte.
>>> +     */
>>> +    if (ALIGN_DOWN(addr, CONT_PTE_SIZE) == addr)
>>> +        goto out;
>>> +    ptep = pte_offset_kernel(pmdp, addr);
>>> +    pte = __ptep_get(ptep);
>>> +    if (!pte_present(pte))
>>> +        goto out;
>>> +    if (pte_cont(pte))
>>> +        split_contpte(ptep);
>>> +
>>> +out:
>>> +    arch_leave_lazy_mmu_mode();
>>> +    return ret;
>>> +}
>>> ---8<---
>>>
>>> Thanks,
>>> Ryan
>>>

