Message-ID: <c68d0449-b52a-42b7-aa5c-194af5e531e1@os.amperecomputing.com>
Date: Tue, 20 Jan 2026 16:17:40 -0800
From: Yang Shi <yang@...amperecomputing.com>
To: Will Deacon <will@...nel.org>
Cc: catalin.marinas@....com, ryan.roberts@....com, cl@...two.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [v5 PATCH] arm64: mm: show direct mapping use in /proc/meminfo
Hi Will,
Gentle ping: do the proposed changes solve your concerns? If they do, I
can send v6; hopefully it can be merged in the coming merge window.
Thanks,
Yang
On 1/13/26 4:36 PM, Yang Shi wrote:
>
>
> On 1/13/26 6:36 AM, Will Deacon wrote:
>> On Tue, Jan 06, 2026 at 04:29:44PM -0800, Yang Shi wrote:
>>> Since commit a166563e7ec3 ("arm64: mm: support large block mapping when
>>> rodata=full"), the direct mapping may be split on some machines instead
>>> of staying static since boot. This makes it more worthwhile than before
>>> to show the direct mapping use in /proc/meminfo.
>>> This patch will make /proc/meminfo show the direct mapping use like the
>>> below (4K base page size):
>>> DirectMap4k: 94792 kB
>>> DirectMap64k: 134208 kB
>>> DirectMap2M: 1173504 kB
>>> DirectMap32M: 5636096 kB
>>> DirectMap1G: 529530880 kB
>>>
>>> Although only machines which support BBML2_NOABORT can split the
>>> direct mapping, show it on all machines regardless of BBML2_NOABORT so
>>> that users have a consistent view and confusion is avoided.
>>>
>>> Although ptdump can also tell the direct map use, it needs to dump the
>>> whole kernel page table, which is costly and overkill. It also lives in
>>> debugfs, which may not be enabled by all distros. So showing direct
>>> map use in /proc/meminfo is more convenient and has less overhead.
>>>
>>> Signed-off-by: Yang Shi <yang@...amperecomputing.com>
>>> ---
>>> v5: * Rebased to v6.19-rc4
>>> * Fixed the build error for !CONFIG_PROC_FS
>>> v4: * Used PAGE_END instead of _PAGE_END(VA_BITS_MIN) per Ryan
>>> * Used shorter name for the helpers and variables per Ryan
>>> * Fixed accounting for memory hotunplug
>>> v3: * Fixed the over-accounting problems per Ryan
>>> * Introduced helpers for add/sub direct map use and #ifdef them with
>>> CONFIG_PROC_FS per Ryan
>>> * v3 is a fix patch on top of v2
>>> v2: * Counted in size instead of the number of entries per Ryan
>>> * Removed shift array per Ryan
>>> * Used lower case "k" per Ryan
>>> * Fixed a couple of build warnings reported by kernel test robot
>>> * Fixed a couple of potential miscounts
>>>
>>> arch/arm64/mm/mmu.c | 202 +++++++++++++++++++++++++++++++++++++++-----
>>> 1 file changed, 181 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 8e1d80a7033e..422441c9a992 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -29,6 +29,7 @@
>>> #include <linux/mm_inline.h>
>>> #include <linux/pagewalk.h>
>>> #include <linux/stop_machine.h>
>>> +#include <linux/proc_fs.h>
>>> #include <asm/barrier.h>
>>> #include <asm/cputype.h>
>>> @@ -171,6 +172,85 @@ static void init_clear_pgtable(void *table)
>>> dsb(ishst);
>>> }
>>> +enum dm_type {
>>> + PTE,
>>> + CONT_PTE,
>>> + PMD,
>>> + CONT_PMD,
>>> + PUD,
>>> + NR_DM_TYPE,
>>> +};
>>> +
>>> +#ifdef CONFIG_PROC_FS
>>> +static unsigned long dm_meminfo[NR_DM_TYPE];
>>> +
>>> +void arch_report_meminfo(struct seq_file *m)
>>> +{
>>> + char *size[NR_DM_TYPE];
>> const?
>
> Yeah, it can be const.
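> I.e. the declaration would become:
>
> 	const char *size[NR_DM_TYPE];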
>
>>
>>> +
>>> +#if defined(CONFIG_ARM64_4K_PAGES)
>>> + size[PTE] = "4k";
>>> + size[CONT_PTE] = "64k";
>>> + size[PMD] = "2M";
>>> + size[CONT_PMD] = "32M";
>>> + size[PUD] = "1G";
>>> +#elif defined(CONFIG_ARM64_16K_PAGES)
>>> + size[PTE] = "16k";
>>> + size[CONT_PTE] = "2M";
>>> + size[PMD] = "32M";
>>> + size[CONT_PMD] = "1G";
>>> +#elif defined(CONFIG_ARM64_64K_PAGES)
>>> + size[PTE] = "64k";
>>> + size[CONT_PTE] = "2M";
>>> + size[PMD] = "512M";
>>> + size[CONT_PMD] = "16G";
>>> +#endif
>>> +
>>> + seq_printf(m, "DirectMap%s: %8lu kB\n",
>>> + size[PTE], dm_meminfo[PTE] >> 10);
>>> + seq_printf(m, "DirectMap%s: %8lu kB\n",
>>> + size[CONT_PTE],
>>> + dm_meminfo[CONT_PTE] >> 10);
>>> + seq_printf(m, "DirectMap%s: %8lu kB\n",
>>> + size[PMD], dm_meminfo[PMD] >> 10);
>>> + seq_printf(m, "DirectMap%s: %8lu kB\n",
>>> + size[CONT_PMD],
>>> + dm_meminfo[CONT_PMD] >> 10);
>>> + if (pud_sect_supported())
>>> + seq_printf(m, "DirectMap%s: %8lu kB\n",
>>> + size[PUD], dm_meminfo[PUD] >> 10);
>> This seems a bit brittle to me. If somebody adds support for L1 block
>> mappings for !4k pages in future, they will forget to update this and
>> we'll end up returning kernel stack data in /proc/meminfo afaict.
>
> I can initialize size[PUD] to "NON_SUPPORT" by default. If that case
> happens, /proc/meminfo will just show "DirectMapNON_SUPPORT", so we will
> notice something is missing, but no kernel stack data will be leaked.
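>
> Something like this (an untested sketch; defaulting every level rather
> than just PUD is my generalization of the idea):
>
> 	const char *size[NR_DM_TYPE];
> 	int i;
>
> 	/* catch any level a future config forgets to fill in */
> 	for (i = 0; i < NR_DM_TYPE; i++)
> 		size[i] = "NON_SUPPORT";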
>
>>
>>> +static inline bool is_dm_addr(unsigned long addr)
>>> +{
>>> + return (addr >= PAGE_OFFSET) && (addr < PAGE_END);
>>> +}
>>> +
>>> +static inline void dm_meminfo_add(unsigned long addr, unsigned long size,
>>> + enum dm_type type)
>>> +{
>>> + if (is_dm_addr(addr))
>>> + dm_meminfo[type] += size;
>>> +}
>>> +
>>> +static inline void dm_meminfo_sub(unsigned long addr, unsigned long size,
>>> + enum dm_type type)
>>> +{
>>> + if (is_dm_addr(addr))
>>> + dm_meminfo[type] -= size;
>>> +}
>>> +#else
>>> +static inline void dm_meminfo_add(unsigned long addr, unsigned long size,
>>> + enum dm_type type)
>>> +{
>>> +}
>>> +
>>> +static inline void dm_meminfo_sub(unsigned long addr, unsigned long size,
>>> + enum dm_type type)
>>> +{
>>> +}
>>> +#endif
>>> +
>>> static void init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
>>> phys_addr_t phys, pgprot_t prot)
>>> {
>>> @@ -236,6 +316,11 @@ static int alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
>>> init_pte(ptep, addr, next, phys, __prot);
>>> + if (pgprot_val(__prot) & PTE_CONT)
>>> + dm_meminfo_add(addr, (next - addr), CONT_PTE);
>>> + else
>>> + dm_meminfo_add(addr, (next - addr), PTE);
>>> +
>>> ptep += pte_index(next) - pte_index(addr);
>>> phys += next - addr;
>>> } while (addr = next, addr != end);
>>> @@ -266,6 +351,17 @@ static int init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
>>> (flags & NO_BLOCK_MAPPINGS) == 0) {
>>> pmd_set_huge(pmdp, phys, prot);
>>> + /*
>>> + * It is possible for a mapping to allow cont mappings but
>>> + * disallow block mappings, for example map_entry_trampoline().
>>> + * So we have to account the CONT_PMD and PMD sizes here to
>>> + * avoid double counting.
>>> + */
>>> + if (pgprot_val(prot) & PTE_CONT)
>>> + dm_meminfo_add(addr, (next - addr), CONT_PMD);
>>> + else
>>> + dm_meminfo_add(addr, (next - addr), PMD);
>> I don't understand the comment you're adding here. If somebody passes
>> NO_BLOCK_MAPPINGS then that also prevents contiguous entries except at
>> level 3.
>
> The comment may be misleading. I meant that if we had the accounting
> code for CONT_PMD in alloc_init_cont_pmd() instead, for example,
>
> @@ -433,6 +433,11 @@ static int alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
> if (ret)
> goto out;
>
> + if (pgprot_val(prot) & PTE_CONT)
> + dm_meminfo_add(addr, (next - addr), CONT_PMD);
>
> pmdp += pmd_index(next) - pmd_index(addr);
> phys += next - addr;
> } while (addr = next, addr != end);
>
> If the described case happens, we would actually miscount CONT_PMD. So
> I need to check whether it is CONT in init_pmd() instead. If the
> comment is confusing, I can just remove it.
>
>> It also doesn't look like you handle the error case properly when the
>> mapping fails.
>
> I don't quite get which failure you mean. pmd_set_huge() doesn't fail.
> Or did you mean hotplug failing? If so, hot unplug, which is called in
> the error handling path, will decrease the counters.
>
>>
>>> -static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
>>> +static void unmap_hotplug_pte_range(pte_t *ptep, unsigned long addr,
>>> unsigned long end, bool free_mapped,
>>> struct vmem_altmap *altmap)
>>> {
>>> - pte_t *ptep, pte;
>>> + pte_t pte;
>>> do {
>>> - ptep = pte_offset_kernel(pmdp, addr);
>>> pte = __ptep_get(ptep);
>>> if (pte_none(pte))
>>> continue;
>>> WARN_ON(!pte_present(pte));
>>> __pte_clear(&init_mm, addr, ptep);
>>> + dm_meminfo_sub(addr, PAGE_SIZE, PTE);
>>> flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>>> if (free_mapped)
>>> free_hotplug_page_range(pte_page(pte),
>>> PAGE_SIZE, altmap);
>> Is the existing code correct for contiguous entries here? I'd have
>> thought that we'd need to make the range non-contiguous before knocking
>> out the TLB.
>
> Thanks for pointing this out. I didn't pay too much attention to such
> details of the existing code. Actually, I did notice the hot unplug
> code doesn't handle contiguous mappings, so I added
> unmap_hotplug_cont_{pmd|pte}_range() in this patch in order to
> maintain the counters correctly. I'm not sure whether that is intended
> (or maybe just unnecessary) or an oversight.
>
> Are you concerned this may result in a misprogramming issue? TBH, I'm
> a little bit confused by the "misprogramming of the Contiguous bit"
> described in the Arm ARM. In this case, the TLB flush should remove
> the large contiguous TLB entry, so there should be no overlapping
> entries in the TLB. Or is it still a problem because some entries have
> the contiguous bit set while some don't? But when we change the
> contiguous bit for a range of entries, there is always a moment when
> some have the contiguous bit set and some don't. My understanding is
> that it is fine as long as there are no overlapping entries in the
> TLB. Anyway, I may be misunderstanding it.
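>
> To make my understanding concrete, here is roughly the ordering I have
> in mind when tearing down one contiguous PTE block (a simplified
> sketch, not the exact patch code; CONT_PTES and CONT_PTE_SIZE are the
> existing arm64 constants, and ptep/addr are as in
> unmap_hotplug_pte_range()):
>
> 	int i;
>
> 	/* clear every entry in the contiguous block first */
> 	for (i = 0; i < CONT_PTES; i++)
> 		__pte_clear(&init_mm, addr + i * PAGE_SIZE, ptep + i);
>
> 	/*
> 	 * A single flush over the whole span removes the large
> 	 * contiguous TLB entry along with any per-page entries,
> 	 * so no overlapping entries should remain.
> 	 */
> 	flush_tlb_kernel_range(addr, addr + CONT_PTE_SIZE);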
>
> Thanks,
> Yang
>
>>
>> Will
>