linux-kernel - Re: [RESEND PATCH] mm: align larger anonymous mappings on THP boundaries


Message-ID: <6016c0e9-b567-4205-8368-1f1c76184a28@huawei.com>
Date: Tue, 7 May 2024 18:59:31 +0800
From: Kefeng Wang <wangkefeng.wang@...wei.com>
To: Ryan Roberts <ryan.roberts@....com>, Yang Shi <shy828301@...il.com>
CC: Matthew Wilcox <willy@...radead.org>, Yang Shi
	<yang@...amperecomputing.com>, <riel@...riel.com>, <cl@...ux.com>,
	<akpm@...ux-foundation.org>, <linux-kernel@...r.kernel.org>,
	<linux-mm@...ck.org>, Ze Zuo <zuoze1@...wei.com>
Subject: Re: [RESEND PATCH] mm: align larger anonymous mappings on THP
 boundaries



On 2024/5/7 18:08, Ryan Roberts wrote:
> On 07/05/2024 09:25, Kefeng Wang wrote:
>> Hi Ryan, Yang and all,
>>
>> We see another regression on arm64 (no issue on x86) when testing memory
>> latency with lmbench:
>>
>> ./lat_mem_rd -P 1 512M 128
> 
> Do you know exactly what this test is doing?

lat_mem_rd measures memory read latency for varying memory sizes and
strides, see https://lmbench.sourceforge.net/man/lat_mem_rd.8.html
> 
>>
>> memory latency(smaller is better)
>>
>> MiB     6.9-rc7    6.9-rc7+revert
> 
> And what exactly have you reverted? I'm guessing just commit efa7df3e3bb5 ("mm:
> align larger anonymous mappings on THP boundaries")?

Yes, I reverted just efa7df3e3bb5.
> 
>> 0.00049    1.539     1.539
>> 0.00098    1.539     1.539
>> 0.00195    1.539     1.539
>> 0.00293    1.539     1.539
>> 0.00391    1.539     1.539
>> 0.00586    1.539     1.539
>> 0.00781    1.539     1.539
>> 0.01172    1.539     1.539
>> 0.01562    1.539     1.539
>> 0.02344    1.539     1.539
>> 0.03125    1.539     1.539
>> 0.04688    1.539     1.539
>> 0.0625    1.540     1.540
>> 0.09375    3.634     3.086
> 
> So the first regression is for 96K - I'm guessing that's the mmap size? That
> size shouldn't even be affected by this patch, apart from a few adds and a
> compare which determines the size is too small to do PMD alignment for.

Yes, no anon THP.
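
The size check Ryan describes can be modelled roughly as below. This is a
simplified userspace sketch, not the kernel code: the function names
pmd_align_candidate() and pmd_round_up() are hypothetical, and the 2 MiB PMD
size used in the note afterwards assumes 4K base pages, which this thread
does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model (assumption): an anonymous mapping is only a candidate
 * for PMD alignment when it is at least one PMD long.
 */
static bool pmd_align_candidate(size_t len, size_t pmd_size)
{
    /* pmd_size must be a non-zero power of two. */
    assert(pmd_size != 0 && (pmd_size & (pmd_size - 1)) == 0);
    return len >= pmd_size;
}

/* Round an address up to the next multiple of pmd_size (a power of two). */
static size_t pmd_round_up(size_t addr, size_t pmd_size)
{
    return (addr + pmd_size - 1) & ~(pmd_size - 1);
}
```

Under that assumption, pmd_align_candidate(96 * 1024, 2UL << 20) is false,
so a 96K mapping would be left where it is, while a 512M mapping would
qualify for alignment.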
> 
>> 0.125   3.874     3.175
>> 0.1875  3.544     3.288
>> 0.25    3.556     3.461
>> 0.375   3.641     3.644
>> 0.5     4.125     3.851
>> 0.75    4.968     4.323
>> 1       5.143     4.686
>> 1.5     5.309     4.957
>> 2       5.370     5.116
>> 3       5.430     5.471
>> 4       5.457     5.671
>> 6       6.100     6.170
>> 8       6.496     6.468
>>
>> -----------------------s
>> * L1 cache = 8M; there are no big changes below 8M *
>> * but from L2 onward the latency drops a lot when this patch is reverted *
>>
>> 12      6.917     6.840
>> 16      7.268     7.077
>> 24      7.536     7.345
>> 32      10.723     9.421
>> 48      14.220     11.350
>> 64      16.253     12.189
>> 96      14.494     12.507
>> 128     14.630     12.560
>> 192     15.402     12.967
>> 256     16.178     12.957
>> 384     15.177     13.346
>> 512     15.235     13.233
>>
>> I quickly checked smaps but didn't find any clues. Any suggestions?
> 
> Without knowing exactly what the test does, it's difficult to know what to


The main operation (a chain of dependent memory reads) is shown below:

#define    ONE      p = (char **)*p;
#define    FIVE     ONE ONE ONE ONE ONE
#define    TEN      FIVE FIVE
#define    FIFTY    TEN TEN TEN TEN TEN
#define    HUNDRED  FIFTY FIFTY

     while (iterations-- > 0) {
         for (i = 0; i < count; ++i) {
             HUNDRED;
         }
     }

https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
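
For anyone who wants to reproduce the access pattern outside lmbench, here is
a minimal self-contained sketch of the same pointer-chasing idea. It is not
the lmbench code: build_chain() and chase() are hypothetical names, and the
forward, fixed-stride linking is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Link a buffer into a cycle: each slot stores the address of the slot
 * `stride` bytes later, and the last slot points back to the first.
 */
static char **build_chain(char *buf, size_t size, size_t stride)
{
    /* Each slot must hold an aligned pointer. */
    assert(stride >= sizeof(char *) && stride % sizeof(char *) == 0);
    assert(size >= stride);

    size_t n = size / stride;
    for (size_t i = 0; i < n; i++) {
        char **slot = (char **)(buf + i * stride);
        *slot = buf + ((i + 1) % n) * stride;
    }
    return (char **)buf;
}

/*
 * Follow the chain for `steps` dependent loads. Each load needs the result
 * of the previous one, so the loop time is dominated by load latency.
 */
static char **chase(char **p, size_t steps)
{
    while (steps-- > 0)
        p = (char **)*p;
    return p;
}
```

Timing chase() over a chain built with stride 128, for buffer sizes from a
few KiB up to 512M, should give roughly the kind of latency-versus-size curve
in the table above. The returned pointer has to be used by the caller so that
the compiler does not discard the loop.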

> suggest. If you want to try something semi-randomly; it might be useful to rule
> out the arm64 contpte feature. I don't see how that would be interacting here if
> mTHP is disabled (is it?). But it's new for 6.9 and arm64 only. Disable with
> ARM64_CONTPTE (needs EXPERT) at compile time.
I haven't enabled mTHP, so it should not be related to ARM64_CONTPTE,
but I will give it a try.