linux-kernel - Re: [PATCH 3/4] arm64: mm: support large block mapping when rodata=full

Message-ID: <b2d3e684-e3dc-41b5-9708-ca5926c55ebf@arm.com>
Date: Wed, 6 Aug 2025 08:20:09 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: Yang Shi <yang@...amperecomputing.com>, will@...nel.org,
 catalin.marinas@....com, akpm@...ux-foundation.org, Miko.Lenczewski@....com,
 dev.jain@....com, scott@...amperecomputing.com, cl@...two.org
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/4] arm64: mm: support large block mapping when
 rodata=full

On 05/08/2025 19:53, Yang Shi wrote:

[...]

>>> +    arch_enter_lazy_mmu_mode();
>>> +    ret = split_pgd(pgd_offset_k(start), start, end);
>> My instinct still remains that it would be better not to iterate over the range
>> here, but instead call a "split(start); split(end);" since we just want to split
>> the start and end. So the code would be simpler and probably more performant if
>> we get rid of all the iteration.
> 
> It should be more performant for splitting a large range, especially when the
> range includes leaf mappings at different levels. But I had some optimization to
> skip leaf mappings in this version, so it should be close to your implementation
> from a performance perspective. And it just walks the page table once instead of
> twice. It should be more efficient for a small split, for example, 4K.

I guess this is the crux of our disagreement. I think "walks the table once
for 4K" is a micro-optimization whose effect I doubt we would see in any
benchmark results. In the absence of data, I'd prefer the simpler, smaller,
easier-to-understand version.
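
For illustration only, the "split(start); split(end);" idea can be modelled
with a toy userspace sketch. This is not code from either series and uses no
kernel API: every name here (toy_init, toy_split_boundary, toy_split_range,
the leaves[] array) is hypothetical, the three block sizes are assumed to be
1G/2M/4K, and contiguous mappings, page table allocation, TLB maintenance and
lazy MMU mode are all ignored. It only shows that making each boundary
address fall on a leaf boundary needs no iteration over the interior of the
range.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy block sizes, largest first: 1G, 2M, 4K (assumed, not kernel macros). */
#define NR_LEVELS 3
static const uint64_t level_size[NR_LEVELS] = {
	1ULL << 30, 1ULL << 21, 1ULL << 12
};

#define MAX_LEAVES 4096

/* One leaf mapping: covers [start, start + level_size[level]). */
struct leaf {
	uint64_t start;
	int level;
};

static struct leaf leaves[MAX_LEAVES];	/* kept sorted by start */
static int nr_leaves;

/* Reset the toy address space to a single block at address 0. */
void toy_init(int level)
{
	leaves[0].start = 0;
	leaves[0].level = level;
	nr_leaves = 1;
}

/* Return the index of the leaf containing addr, or -1 if none does. */
static int toy_find_leaf(uint64_t addr)
{
	for (int i = 0; i < nr_leaves; i++) {
		uint64_t s = leaves[i].start;

		if (addr >= s && addr < s + level_size[leaves[i].level])
			return i;
	}
	return -1;
}

/* Replace leaf i with equally sized leaves of the next smaller level. */
static void toy_split_leaf(int i)
{
	uint64_t start = leaves[i].start;
	int child = leaves[i].level + 1;
	int n = (int)(level_size[child - 1] / level_size[child]);

	assert(nr_leaves + n - 1 <= MAX_LEAVES);
	memmove(&leaves[i + n], &leaves[i + 1],
		(size_t)(nr_leaves - i - 1) * sizeof(leaves[0]));
	for (int k = 0; k < n; k++) {
		leaves[i + k].start = start + (uint64_t)k * level_size[child];
		leaves[i + k].level = child;
	}
	nr_leaves += n - 1;
}

/* Split only the leaf containing addr until addr is a leaf boundary. */
static int toy_split_boundary(uint64_t addr)
{
	for (;;) {
		int i = toy_find_leaf(addr);

		if (i < 0 || leaves[i].start == addr)
			return 0;	/* already on a boundary */
		if (leaves[i].level == NR_LEVELS - 1)
			return -1;	/* addr is not 4K aligned */
		toy_split_leaf(i);
	}
}

/* Boundary-only split: touch start and end, never the interior. */
int toy_split_range(uint64_t start, uint64_t end)
{
	int ret = toy_split_boundary(start);

	return ret ? ret : toy_split_boundary(end);
}
```

In this model, starting from a single 1G block, toy_split_range(0x200000,
0x400000) leaves 512 leaves of 2M each, because both boundaries are already
2M aligned after the first split and nothing between them is visited.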

Both implementations are on list now; perhaps the maintainers can steer us.

Thanks,
Ryan