Message-ID: <6ffacf84-424e-4241-b993-29afcc05da25@os.amperecomputing.com>
Date: Thu, 20 Nov 2025 09:33:15 -0800
From: Yang Shi <yang@...amperecomputing.com>
To: Ryan Roberts <ryan.roberts@....com>, cl@...two.org,
 catalin.marinas@....com, will@...nel.org
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [v2 PATCH] arm64: mm: show direct mapping use in /proc/meminfo



On 11/20/25 12:38 AM, Ryan Roberts wrote:
>>>>> I have a long-term aspiration to enable "per-process page size", where each
>>>>> user
>>>>> space process can use a different page size. The first step is to be able to
>>>>> emulate a page size to the process which is larger than the kernel's. For that
>>>>> reason, I really dislike introducing new ABI that exposes the geometry of the
>>>>> kernel page tables to user space. I'd really like to be clear on what use case
>>>>> benefits from this sort of information before we add it.
>>>> Thanks for the information. I'm not sure exactly what "per-process page size"
>>>> is. But isn't it just a user space thing? I have a hard time understanding how
>>>> exposing kernel page table geometry would have an impact on it.
>>> It's a feature I'm working on/thinking about that, if I'm honest, has a fairly
>>> low probability of making it upstream. arm64 supports multiple base page sizes:
>>> 4K, 16K, 64K. The idea is to allow different processes to use a different base
>>> page size and then actually use the native page table for that size in TTBR0.
>>> The kernel would use 4K internally, and most processes would use 4K to save
>>> memory, but performance-critical processes could use 64K.
>> Aha, I see. I thought you were talking about mTHP. IIUC, userspace may have a 4K,
>> 16K or 64K base page size, but the kernel still uses a 4K base page size? Can
>> arm64 support different base page sizes for userspace and the kernel? It would
>> be surprising to me if it does.
> Yes, arm64 supports exactly this: user page tables are mapped via TTBR0 and
> kernel page tables are mapped via TTBR1. They are independent structures, and
> the base page size can be set independently for each.
>
>> If it doesn't, it sounds like you need at least 3 kernel
>> page tables, for 4K, 16K and 64K respectively, right?
> No; for my design, the kernel always uses a 4K page table. Only user space page
> tables have different sizes.

I see. IIUC, if so, you need to let the page fault handler know it needs
16K or 64K so that the kernel can allocate 16K or 64K instead of 4K. And
you also need to have the kernel install a single PTE instead of
multiple PTEs.
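
For reference, the hardware knob behind the split you describe is that
TCR_EL1 carries separate granule-size fields for the two table bases, so
TTBR1 (kernel) can stay on 4K while TTBR0 (user) uses 16K or 64K. A
minimal sketch of those encodings as I read them in the Arm ARM (the
helper at the end is hypothetical, just for illustration):

/*
 * Sketch only, not from any posted code: TCR_EL1 selects the translation
 * granule independently for TTBR0 (user) and TTBR1 (kernel) walks.
 */
#include <stdint.h>

/* TCR_EL1.TG0, bits [15:14]: granule for TTBR0 (user space) walks. */
#define TCR_TG0_SHIFT	14
#define TCR_TG0_4K	(UINT64_C(0) << TCR_TG0_SHIFT)
#define TCR_TG0_64K	(UINT64_C(1) << TCR_TG0_SHIFT)
#define TCR_TG0_16K	(UINT64_C(2) << TCR_TG0_SHIFT)
#define TCR_TG0_MASK	(UINT64_C(3) << TCR_TG0_SHIFT)

/* TCR_EL1.TG1, bits [31:30]: granule for TTBR1 (kernel) walks.
 * Note the encoding differs from TG0. */
#define TCR_TG1_SHIFT	30
#define TCR_TG1_16K	(UINT64_C(1) << TCR_TG1_SHIFT)
#define TCR_TG1_4K	(UINT64_C(2) << TCR_TG1_SHIFT)
#define TCR_TG1_64K	(UINT64_C(3) << TCR_TG1_SHIFT)
#define TCR_TG1_MASK	(UINT64_C(3) << TCR_TG1_SHIFT)

/* Hypothetical helper: keep the kernel on 4K, give this process 64K. */
static inline uint64_t tcr_user_64k_kernel_4k(uint64_t tcr)
{
	tcr &= ~(TCR_TG0_MASK | TCR_TG1_MASK);
	return tcr | TCR_TG0_64K | TCR_TG1_4K;
}

Presumably selecting a 64K user granule for a process then becomes a
per-process TCR_EL1.TG0 update at context switch, on top of the
fault-path changes above.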

>
>> I'm wondering what kind of use case really needs this. Isn't mTHP good enough
>> for most use cases? We can have automatic mTHP size support on a per-VMA basis.
>> If I remember correctly, this has been raised a couple of times when we
>> discussed mTHP. Anyway, this may be a little off topic.
> There is still a performance gap between 4K+CONT and 64K. There are basically 4
> aspects that affect HW performance as the base page size gets bigger:
>
>   - TLB reach (how much memory a single TLB entry can describe)
>   - Walk cache reach (how much memory a single walk cache entry can describe)
>   - number of levels of lookup (how many loads are required for a full table walk)
>   - data cache efficiency (how efficiently the mappings are described in memory)
>
> 4K+CONT (i.e. 64K-sized mTHP) only solves the first item.

Oh yeah, this is what our benchmarks showed. The cost of the page table
walk is not improved, because the number of levels of lookup is
unchanged, so some benchmarks don't see much benefit.
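
To put rough numbers on the walk-depth point (my own arithmetic, not
data from our benchmarks): with a 48-bit VA and 8-byte descriptors, 4K
and 16K granules need a 4-level walk while 64K needs only 3, and 4K+CONT
keeps the 4-level walk even though it widens TLB reach to 64K. A small
sketch that prints these figures:

/*
 * Worked numbers behind the discussion above: levels of lookup and
 * single-entry TLB reach for a 48-bit VA with 8-byte descriptors.
 * 4K+CONT (16 contiguous 4K PTEs, i.e. the 64K-sized mTHP case from the
 * thread) improves TLB reach but still walks the 4-level 4K table.
 */
#include <stdio.h>

static unsigned int walk_levels(unsigned int va_bits, unsigned int page_shift)
{
	/* Each level resolves log2(page_size / 8) bits of VA. */
	unsigned int bits_per_level = page_shift - 3;
	unsigned int translated_bits = va_bits - page_shift;

	return (translated_bits + bits_per_level - 1) / bits_per_level;
}

int main(void)
{
	const unsigned int va_bits = 48;
	const struct { const char *granule; unsigned int shift; } g[] = {
		{ "4K", 12 }, { "16K", 14 }, { "64K", 16 },
	};

	for (unsigned int i = 0; i < sizeof(g) / sizeof(g[0]); i++)
		printf("%-4s granule: %u-level walk, %4u KiB per TLB entry\n",
		       g[i].granule, walk_levels(va_bits, g[i].shift),
		       (1u << g[i].shift) / 1024);

	printf("4K+CONT:     4-level walk, %4u KiB per TLB entry (16 x 4K)\n",
	       16u * 4u);
	return 0;
}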

Thanks,
Yang
>
> But as I said, I think there is a high risk of this not actually going anywhere...
>
> Thanks,
> Ryan
>

