Date: Tue, 7 May 2024 16:53:10 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: Kefeng Wang <wangkefeng.wang@...wei.com>,
 David Hildenbrand <david@...hat.com>, Yang Shi <shy828301@...il.com>
Cc: Matthew Wilcox <willy@...radead.org>,
 Yang Shi <yang@...amperecomputing.com>, riel@...riel.com, cl@...ux.com,
 akpm@...ux-foundation.org, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 Ze Zuo <zuoze1@...wei.com>
Subject: Re: [RESEND PATCH] mm: align larger anonymous mappings on THP
 boundaries

On 07/05/2024 14:53, Kefeng Wang wrote:
> 
> 
> On 2024/5/7 19:13, David Hildenbrand wrote:
>>
>>> https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
>>>
>>>> suggest. If you want to try something semi-randomly; it might be useful to rule
>>>> out the arm64 contpte feature. I don't see how that would be interacting
>>>> here if mTHP is disabled (is it?). But it's new for 6.9 and arm64 only. Disable
>>>> with ARM64_CONTPTE (needs EXPERT) at compile time.
>>> I haven't enabled mTHP, so it should not be related to ARM64_CONTPTE,
>>> but I will give it a try.
> 
> With ARM64_CONTPTE disabled, memory read latency is similar to that with
> ARM64_CONTPTE enabled (the default in 6.9-rc7), and still higher than with the
> anon alignment patch reverted.

OK thanks for trying.

Looking at the source for lmbench, it's malloc'ing (512M + 8K) up front and using
that for all sizes. That will presumably be considered "large" by malloc and
will be allocated using mmap. So with the patch, it will be 2M aligned. Without
it, it probably won't. I'm still struggling to understand why not aligning it in
virtual space would make it more performant though...
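
A minimal sketch for checking where such a mapping lands, assuming 2M PMD-sized THP and calling mmap directly rather than going through malloc (malloc would normally return a pointer slightly past the start of the mapping because of its own header). The helper names are hypothetical and are not part of lmbench:

```python
import ctypes
import mmap

PMD_SIZE = 2 * 1024 * 1024  # assumption: 2M PMD size (4K base pages)


def is_pmd_aligned(addr, pmd_size=PMD_SIZE):
    """Return True if addr sits exactly on a pmd_size boundary."""
    return addr % pmd_size == 0


def anon_mapping_address(length):
    """Create an anonymous private mapping; return (mapping, start address)."""
    m = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    # The temporary ctypes object only exists long enough to read the address.
    addr = ctypes.addressof(ctypes.c_char.from_buffer(m))
    return m, addr


if __name__ == "__main__":
    size = 512 * 1024 * 1024 + 8 * 1024  # same size as the lmbench buffer
    mapping, addr = anon_mapping_address(size)
    print(hex(addr), "2M aligned:", is_pmd_aligned(addr))
    mapping.close()
```

Running this on kernels with and without the patch should show whether the start address actually changes alignment on the system under test.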

Is it possible to provide the smaps output for at least that 512M+8K block for
both cases? It might give a bit of a clue.
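
In case it helps with collecting that, here is a rough sketch that pulls the smaps entry covering a given address out of the text of /proc/<pid>/smaps. It assumes the usual layout (an "start-end perms ..." header line followed by "Key: value" lines); the function name is hypothetical:

```python
import re

# Matches the VMA header line, e.g. "7f0000000000-7f0020002000 rw-p ..."
HEADER = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s")


def smaps_entry_for(addr, smaps_text):
    """Return the smaps fields of the VMA containing addr, or None."""
    entry = None
    for line in smaps_text.splitlines():
        m = HEADER.match(line)
        if m:
            if entry is not None:
                return entry  # reached the next VMA; we are done
            start, end = int(m.group(1), 16), int(m.group(2), 16)
            if start <= addr < end:
                entry = {"start": start, "end": end}
        elif entry is not None and ":" in line:
            key, _, value = line.partition(":")
            entry[key.strip()] = value.strip()
    return entry
```

Something like `smaps_entry_for(addr, open("/proc/self/smaps").read())` would then give the Size, AnonHugePages and VmFlags fields for the buffer, which are the ones most likely to differ between the two cases.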

Do you have traditional (PMD-sized) THP enabled? If it's enabled and the buffer is
unaligned, then the front of the buffer wouldn't be mapped with THP, but if it is
aligned, it would be. That could affect it.
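
To put numbers on that, a small sketch of how much of a buffer falls inside whole, naturally aligned 2M ranges, assuming a 2M PMD size. This only computes address arithmetic; it says nothing about whether the kernel actually installs huge pages there. The function name is hypothetical:

```python
PMD_SIZE = 2 * 1024 * 1024  # assumption: 2M PMD size (4K base pages)


def pmd_coverage(start, length, pmd_size=PMD_SIZE):
    """Split [start, start + length) into an unaligned head, whole
    PMD-aligned ranges, and an unaligned tail."""
    end = start + length
    first = -(-start // pmd_size) * pmd_size  # round start up to a boundary
    last = (end // pmd_size) * pmd_size       # round end down to a boundary
    if last <= first:
        # No whole aligned range fits inside the buffer.
        return {"head": length, "huge_pages": 0, "tail": 0}
    return {
        "head": first - start,
        "huge_pages": (last - first) // pmd_size,
        "tail": end - last,
    }
```

For a 512M + 8K buffer, a 2M-aligned start gives 256 whole ranges with an 8K tail, while a start that is only 4K past a 2M boundary gives 255 whole ranges, with just under 2M at the front that cannot be PMD-mapped.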

> 
>>
>> cont-pte can get active if we're just lucky when allocating pages in the right
>> order, correct Ryan?
>>

