Message-ID: <6016c0e9-b567-4205-8368-1f1c76184a28@huawei.com>
Date: Tue, 7 May 2024 18:59:31 +0800
From: Kefeng Wang <wangkefeng.wang@...wei.com>
To: Ryan Roberts <ryan.roberts@....com>, Yang Shi <shy828301@...il.com>
CC: Matthew Wilcox <willy@...radead.org>, Yang Shi
<yang@...amperecomputing.com>, <riel@...riel.com>, <cl@...ux.com>,
<akpm@...ux-foundation.org>, <linux-kernel@...r.kernel.org>,
<linux-mm@...ck.org>, Ze Zuo <zuoze1@...wei.com>
Subject: Re: [RESEND PATCH] mm: align larger anonymous mappings on THP
boundaries
On 2024/5/7 18:08, Ryan Roberts wrote:
> On 07/05/2024 09:25, Kefeng Wang wrote:
>> Hi Ryan, Yang and all,
>>
>> We see another regression on arm64 (no issue on x86) when testing memory
>> latency with lmbench:
>>
>> ./lat_mem_rd -P 1 512M 128
>
> Do you know exactly what this test is doing?
lat_mem_rd measures memory read latency for varying memory sizes and
strides, see https://lmbench.sourceforge.net/man/lat_mem_rd.8.html
>
>>
>> memory latency(smaller is better)
>>
>> MiB 6.9-rc7 6.9-rc7+revert
>
> And what exactly have you reverted? I'm guessing just commit efa7df3e3bb5 ("mm:
> align larger anonymous mappings on THP boundaries")?
Yes, I reverted only efa7df3e3bb5.
>
>> 0.00049 1.539 1.539
>> 0.00098 1.539 1.539
>> 0.00195 1.539 1.539
>> 0.00293 1.539 1.539
>> 0.00391 1.539 1.539
>> 0.00586 1.539 1.539
>> 0.00781 1.539 1.539
>> 0.01172 1.539 1.539
>> 0.01562 1.539 1.539
>> 0.02344 1.539 1.539
>> 0.03125 1.539 1.539
>> 0.04688 1.539 1.539
>> 0.0625 1.540 1.540
>> 0.09375 3.634 3.086
>
> So the first regression is for 96K - I'm guessing that's the mmap size? That
> size shouldn't even be affected by this patch, apart from a few adds and a
> compare which determines the size is too small to do PMD alignment for.
Yes, no anon thp.
>
>> 0.125 3.874 3.175
>> 0.1875 3.544 3.288
>> 0.25 3.556 3.461
>> 0.375 3.641 3.644
>> 0.5 4.125 3.851
>> 0.75 4.968 4.323
>> 1 5.143 4.686
>> 1.5 5.309 4.957
>> 2 5.370 5.116
>> 3 5.430 5.471
>> 4 5.457 5.671
>> 6 6.100 6.170
>> 8 6.496 6.468
>>
>> -----------------------
>> * L1 cache = 8M; there is no big change below 8M *
>> * but the latency drops a lot from L2 when this patch is reverted *
>>
>> 12 6.917 6.840
>> 16 7.268 7.077
>> 24 7.536 7.345
>> 32 10.723 9.421
>> 48 14.220 11.350
>> 64 16.253 12.189
>> 96 14.494 12.507
>> 128 14.630 12.560
>> 192 15.402 12.967
>> 256 16.178 12.957
>> 384 15.177 13.346
>> 512 15.235 13.233
>>
>> I quickly checked smaps but didn't find any clues. Any suggestions?
>
> Without knowing exactly what the test does, it's difficult to know what to
The main operation (memory read) is shown below:

  #define ONE p = (char **)*p;
  #define FIVE ONE ONE ONE ONE ONE
  #define TEN FIVE FIVE
  #define FIFTY TEN TEN TEN TEN TEN
  #define HUNDRED FIFTY FIFTY

  while (iterations-- > 0) {
          for (i = 0; i < count; ++i) {
                  HUNDRED;
          }
  }

https://github.com/intel/lmbench/blob/master/src/lat_mem_rd.c#L95
> suggest. If you want to try something semi-randomly; it might be useful to rule
> out the arm64 contpte feature. I don't see how that would be interacting here if
> mTHP is disabled (is it?). But it's new for 6.9 and arm64 only. Disable with
> ARM64_CONTPTE (needs EXPERT) at compile time.
I haven't enabled mTHP, so it should not be related to ARM64_CONTPTE,
but I will give it a try.