Message-ID: <1cd515972cbd4be9ac8b5abb635c052a@huawei.com>
Date: Tue, 28 Oct 2025 11:32:31 +0000
From: zhangqilong <zhangqilong3@...wei.com>
To: David Hildenbrand <david@...hat.com>, "akpm@...ux-foundation.org"
<akpm@...ux-foundation.org>, "lorenzo.stoakes@...cle.com"
<lorenzo.stoakes@...cle.com>, "Liam.Howlett@...cle.com"
<Liam.Howlett@...cle.com>, "vbabka@...e.cz" <vbabka@...e.cz>,
"rppt@...nel.org" <rppt@...nel.org>, "surenb@...gle.com" <surenb@...gle.com>,
"mhocko@...e.com" <mhocko@...e.com>, "jannh@...gle.com" <jannh@...gle.com>,
"pfalcato@...e.de" <pfalcato@...e.de>
CC: "linux-mm@...ck.org" <linux-mm@...ck.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "Wangkefeng (OS Kernel Lab)"
<wangkefeng.wang@...wei.com>, Sunnanyong <sunnanyong@...wei.com>
Subject: Re: [RFC PATCH 2/3] mm/mincore: Use can_pte_batch_count() in
mincore_pte_range() for pte batch mincore_pte_range()
> On 27.10.25 15:03, Zhang Qilong wrote:
> > In the current mincore_pte_range(), if pte_batch_hint() returns one pte,
> > it's not efficient; just call the newly added can_pte_batch_count().
> >
> > In ARM64 qemu, with 8 CPUs, 32G memory, a simple test demo like:
> > 1. mmap 1G anon memory
> > 2. write 1G data by 4k step
> > 3. mincore the mmaped 1G memory
> > 4. get the time consumed by mincore
> >
> > Tested the following cases:
> > - 4k: all hugepage settings disabled.
> > - 64k mTHP: only the 64k hugepage setting enabled.
> >
> > Before:
> >
> > Case status | Consumed time (us) |
> > ------------|--------------------|
> > 4k          | 7356               |
> > 64k mTHP    | 3670               |
> >
> > Patched:
> >
> > Case status | Consumed time (us) |
> > ------------|--------------------|
> > 4k          | 4419               |
> > 64k mTHP    | 3061               |
> >
>
> I assume you're only lucky in that benchmark because you got consecutive 4k
> pages / 64k mTHP from the buddy, right?
Yeah, the demo case is relatively simple, which likely results in more
physically contiguous page allocations.
This case primarily aims to validate the optimization's effectiveness when page
addresses are contiguous. We may also need to check for side effects when page
addresses are non-contiguous.
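
For reference, the demo steps quoted above (mmap anon memory, write it in
page-sized steps, mincore the range, report the time) can be sketched roughly
as below. This is a minimal user-space sketch, not the exact test program:
the helper name time_mincore_us() is hypothetical, the write step is the
system page size rather than a hard-coded 4k, and hugepage settings are
assumed to be configured separately before running it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): map `len` bytes of anonymous
 * memory, touch one byte per page, then time a single mincore() call over
 * the whole range. Returns the elapsed time in microseconds, or -1 on
 * error. If `resident` is non-NULL it receives the number of pages that
 * mincore() reported as resident.
 */
long time_mincore_us(size_t len, size_t *resident)
{
	long page = sysconf(_SC_PAGESIZE);
	assert(page > 0 && len > 0 && len % (size_t)page == 0);
	size_t npages = len / (size_t)page;

	unsigned char *vec = malloc(npages);	/* one status byte per page */
	if (!vec)
		return -1;

	/* Step 1: mmap anonymous memory. */
	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		free(vec);
		return -1;
	}

	/* Step 2: write in page-sized steps so every page is faulted in. */
	for (size_t off = 0; off < len; off += (size_t)page)
		((volatile unsigned char *)buf)[off] = 1;

	/* Steps 3 and 4: mincore the range and measure the time consumed. */
	struct timespec t0, t1;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	int ret = mincore(buf, len, vec);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* Bit 0 of each vec[] entry is set when the page is resident. */
	size_t n = 0;
	for (size_t i = 0; ret == 0 && i < npages; i++)
		n += vec[i] & 1;
	if (resident)
		*resident = n;

	munmap(buf, len);
	free(vec);
	if (ret != 0)
		return -1;
	return (t1.tv_sec - t0.tv_sec) * 1000000L +
	       (t1.tv_nsec - t0.tv_nsec) / 1000L;
}
```

Calling time_mincore_us((size_t)1 << 30, &n) would correspond to the 1G case.
Note that this only exercises freshly faulted memory, so it says nothing about
the non-contiguous case discussed here.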
>
> So I suspect that this will mostly just make a micro benchmark happy, because
> the reality where we allocate randomly over time, for the PCP, etc will look
> quite different.
>
> --
> Cheers
>
> David / dhildenb
>