linux-kernel - Re: [RESEND PATCH] mm: align larger anonymous mappings on THP boundaries

Message-ID: <2b403705-a03c-4cfe-8d95-b38dd83fca52@arm.com>
Date: Tue, 7 May 2024 16:53:10 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: Kefeng Wang <wangkefeng.wang@...wei.com>,
 David Hildenbrand <david@...hat.com>, Yang Shi <shy828301@...il.com>
Cc: Matthew Wilcox <willy@...radead.org>,
 Yang Shi <yang@...amperecomputing.com>, riel@...riel.com, cl@...ux.com,
 akpm@...ux-foundation.org, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 Ze Zuo <zuoze1@...wei.com>
Subject: Re: [RESEND PATCH] mm: align larger anonymous mappings on THP
 boundaries

On 07/05/2024 14:53, Kefeng Wang wrote:
> 
> 
> On 2024/5/7 19:13, David Hildenbrand wrote:
>>
>>> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
>>>
>>>> suggest. If you want to try something semi-randomly, it might be useful to
>>>> rule out the arm64 contpte feature. I don't see how that would be
>>>> interacting here if mTHP is disabled (is it?). But it's new for 6.9 and
>>>> arm64 only. Disable with ARM64_CONTPTE (needs EXPERT) at compile time.
>>> I haven't enabled mTHP, so it should not be related to ARM64_CONTPTE,
>>> but I will give it a try.
> 
> With ARM64_CONTPTE disabled, memory read latency is similar to that with
> ARM64_CONTPTE enabled (the default in 6.9-rc7), and still higher than with
> the anon alignment patch reverted.

OK thanks for trying.

Looking at the source for lmbench, it's malloc'ing (512M + 8K) up front and
using that for all sizes. That will presumably be considered "large" by malloc
and will be allocated using mmap. So with the patch, it will be 2M aligned.
Without it, it probably won't. I'm still struggling to understand why not
aligning it in virtual space would make it more performant though...
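
One rough way to confirm the alignment on each kernel is to make the same
allocation and look at where it lands. The following is a minimal sketch, not
lmbench code: the function names are hypothetical, and it assumes 4K base
pages, 2M PMD-sized THP, and a glibc-style malloc that places a small header
at the start of an mmap-backed chunk (hence the round-down to a page boundary).

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumptions: 4K base pages and 2M PMD-sized THP. */
#define PAGE_SIZE_ASSUMED 4096ULL
#define PMD_SIZE_ASSUMED  (2ULL * 1024 * 1024)

/*
 * Assumed glibc behaviour: the user pointer of an mmap-backed chunk sits
 * just past a small header, so rounding down to a page boundary should
 * give the start of the underlying mapping.
 */
uint64_t mapping_start(uint64_t ptr)
{
    return ptr & ~(PAGE_SIZE_ASSUMED - 1);
}

/* Non-zero if addr sits on a 2M boundary. */
int is_pmd_aligned(uint64_t addr)
{
    return (addr & (PMD_SIZE_ASSUMED - 1)) == 0;
}

/*
 * Allocate len bytes with malloc, print where the mapping starts, and
 * return 1 if it is 2M aligned, 0 if not, -1 if the allocation failed.
 */
int report_buffer_alignment(size_t len)
{
    char *buf = malloc(len);
    uint64_t start;
    int aligned;

    if (!buf)
        return -1;

    start = mapping_start((uint64_t)(uintptr_t)buf);
    aligned = is_pmd_aligned(start);
    printf("buf=%p mapping start=0x%llx 2M-aligned=%s\n",
           (void *)buf, (unsigned long long)start, aligned ? "yes" : "no");
    free(buf);
    return aligned;
}
```

Calling report_buffer_alignment(((size_t)512 << 20) + 8192) from main() on
both kernels should show whether the buffer start moves onto a 2M boundary.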

Is it possible to provide the smaps output for at least that 512M+8K block for
both cases? It might give a bit of a clue.
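
If it helps, the relevant entry can be pulled out from inside the process
itself. This is a minimal sketch with hypothetical helper names; it assumes
the usual /proc/<pid>/smaps layout where each entry begins with a
"start-end perms ..." header line followed by "Field: value" lines.

```c
#include <stdio.h>

/*
 * Returns 1 and fills start/end if the line looks like an smaps entry
 * header ("<hex>-<hex> perms ..."), otherwise returns 0. Field lines such
 * as "AnonHugePages:" may match the first hex conversion but not the
 * dash and second address, so both conversions are required.
 */
int parse_smaps_header(const char *line,
                       unsigned long long *start, unsigned long long *end)
{
    return sscanf(line, "%llx-%llx", start, end) == 2;
}

/*
 * Print every line of the smaps entry that contains addr.
 * Returns 1 if such an entry was found, 0 otherwise.
 */
int print_smaps_entry(FILE *smaps, unsigned long long addr)
{
    char line[512];
    unsigned long long start, end;
    int inside = 0, found = 0;

    while (fgets(line, sizeof(line), smaps)) {
        if (parse_smaps_header(line, &start, &end)) {
            inside = (addr >= start && addr < end);
            found |= inside;
        }
        if (inside)
            fputs(line, stdout);
    }
    return found;
}
```

Opening /proc/self/smaps with fopen() after the buffer has been touched and
passing the buffer address should print its entry, including the
AnonHugePages line.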

Do you have traditional (PMD-sized) THP enabled? If it's enabled and the buffer
is unaligned, then the front of the buffer wouldn't be mapped with THP, but if
it's aligned, it would be. That could affect it.
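
To put numbers on that, the part of a buffer that can be PMD-mapped is the
span between the first and last 2M boundary inside it. A minimal sketch of
the arithmetic, assuming 2M PMD-sized THP (the function name is hypothetical):

```c
#include <stdint.h>

/* Assumption: 2M PMD-sized THP. */
#define PMD_SIZE_ASSUMED (2ULL * 1024 * 1024)

/*
 * Number of bytes in [start, start + len) that lie in whole, 2M-aligned
 * 2M blocks, i.e. the most that could be mapped with PMD-sized THP.
 */
uint64_t pmd_mappable_bytes(uint64_t start, uint64_t len)
{
    /* First 2M boundary at or after start. */
    uint64_t lo = (start + PMD_SIZE_ASSUMED - 1) & ~(PMD_SIZE_ASSUMED - 1);
    /* Last 2M boundary at or before the end of the buffer. */
    uint64_t hi = (start + len) & ~(PMD_SIZE_ASSUMED - 1);

    return hi > lo ? hi - lo : 0;
}
```

For the 512M + 8K buffer, a 2M-aligned start gives 512M of PMD-mappable space
(256 huge pages), while a start that is, say, 4K past a 2M boundary gives
510M (255 huge pages), with the leading 2M - 4K left to base pages.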

> 
>>
>> cont-pte can get active if we're just lucky when allocating pages in the right
>> order, correct Ryan?
>>