Message-ID: <20241018063712.44028-1-lizhe.67@bytedance.com>
Date: Fri, 18 Oct 2024 14:37:12 +0800
From: lizhe.67@...edance.com
To: willy@...radead.org
Cc: akpm@...ux-foundation.org,
boqun.feng@...il.com,
linux-kernel@...r.kernel.org,
linux-mm@...ck.org,
lizhe.67@...edance.com,
longman@...hat.com,
mingo@...hat.com,
peterz@...radead.org,
will@...nel.org
Subject: Re: [RFC 2/2] khugepaged: use upgrade_read() to optimize collapse_huge_page
On Thu, 17 Oct 2024 14:20:12 +0100, willy@...radead.org wrote:
> On Thu, Oct 17, 2024 at 02:18:41PM +0800, lizhe.67@...edance.com wrote:
> > On Wed, 16 Oct 2024 12:53:15 +0100, willy@...radead.org wrote:
> >
> > >On Wed, Oct 16, 2024 at 12:36:00PM +0800, lizhe.67@...edance.com wrote:
> > >> From: Li Zhe <lizhe.67@...edance.com>
> > >>
> > >> In collapse_huge_page(), we drop the mmap read lock and then take
> > >> the mmap write lock to prevent most accesses to the page tables.
> > >> This leaves a small time window in which other tasks can acquire
> > >> the mmap lock. With upgrade_read(), we do not need to check the
> > >> vma and pmd again in most cases.
> > >
> > >This is clearly a performance optimisation. So you must have some
> > >numbers that justify this, please include them.
> >
> > Yes, I will add the relevant data to v2 patch.
>
> How about telling us all now so we know whether to continue discussing
> this?

In my test environment, collapse_huge_page() only achieved a performance
improvement of roughly 0.87% (about 14 us per call, going by the averages
below). I used ftrace to measure the execution time of
collapse_huge_page(). The test results, code, and commands are as follows.

(1) Test result:

Average execution time of collapse_huge_page():
  before this patch: 1611.06283 us
  after this patch:  1597.01474 us

(2) Test code:

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

#define MMAP_SIZE (2ul * 1024 * 1024)
#define ALIGN(x, mask) (((x) + ((mask) - 1)) & ~((mask) - 1))

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* not present in older uapi headers */
#endif

int main(void)
{
	int num = 100;
	size_t page_sz = getpagesize();

	while (num--) {
		size_t index;
		unsigned char *p_map;
		unsigned char *p_map_real;

		/* Map 4M so a 2M-aligned 2M range is guaranteed inside. */
		p_map = mmap(0, 2 * MMAP_SIZE, PROT_READ | PROT_WRITE,
			     MAP_PRIVATE | MAP_ANON, -1, 0);
		if (p_map == MAP_FAILED) {
			printf("mmap fail\n");
			return -1;
		}
		p_map_real = (unsigned char *)ALIGN((unsigned long)p_map, MMAP_SIZE);
		printf("mmap get %p, align to %p\n", p_map, p_map_real);

		/* Touch every base page so there is something to collapse. */
		for (index = 0; index < MMAP_SIZE; index += page_sz)
			p_map_real[index] = 6;

		/* MADV_COLLAPSE synchronously invokes collapse_huge_page(). */
		int ret = madvise(p_map_real, MMAP_SIZE, MADV_COLLAPSE);
		printf("ret is %d\n", ret);

		munmap(p_map, 2 * MMAP_SIZE);
	}
	return 0;
}

(3) Test commands:

# Disable khugepaged so that only the madvise() calls trigger
# collapse_huge_page(); MADV_COLLAPSE itself is not affected by this
# setting.
echo never > /sys/kernel/mm/transparent_hugepage/enabled
gcc test.c -o test
trace-cmd record -p function_graph -g collapse_huge_page --max-graph-depth 1 ./test
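
The per-call durations can then be pulled out of the recorded trace
with, for example:

trace-cmd report | grep collapse_huge_page

Each matching function_graph line carries the duration of one
collapse_huge_page() call; averaging those gives the numbers in (1).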

The optimization to collapse_huge_page() seems insignificant in this
test. I am not sure whether it would have a more noticeable effect in
other scenarios.
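
For context, the shape of the change is roughly the sketch below;
mmap_upgrade_read() is a placeholder for a helper built on the
upgrade_read() primitive proposed in patch 1/2, and the actual patch
differs in detail.

/* Sketch only: mmap_upgrade_read() is not an existing kernel API. */
static bool collapse_take_write_lock(struct mm_struct *mm)
{
	if (mmap_upgrade_read(mm)) {
		/*
		 * Upgraded atomically: no other task could have taken
		 * the mmap lock in between, so the vma and pmd checked
		 * under the read lock are still valid.
		 */
		return false;	/* no revalidation needed */
	}

	/* Upgrade failed: fall back to dropping and re-taking the lock. */
	mmap_read_unlock(mm);
	mmap_write_lock(mm);
	return true;		/* caller must revalidate the vma and pmd */
}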