Message-ID: <20241018063712.44028-1-lizhe.67@bytedance.com>
Date: Fri, 18 Oct 2024 14:37:12 +0800
From: lizhe.67@...edance.com
To: willy@...radead.org
Cc: akpm@...ux-foundation.org,
	boqun.feng@...il.com,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	lizhe.67@...edance.com,
	longman@...hat.com,
	mingo@...hat.com,
	peterz@...radead.org,
	will@...nel.org
Subject: Re: [RFC 2/2] khugepaged: use upgrade_read() to optimize collapse_huge_page

On Thu, 17 Oct 2024 14:20:12 +0100, willy@...radead.org wrote:

> On Thu, Oct 17, 2024 at 02:18:41PM +0800, lizhe.67@...edance.com wrote:
> > On Wed, 16 Oct 2024 12:53:15 +0100, willy@...radead.org wrote:
> > 
> > >On Wed, Oct 16, 2024 at 12:36:00PM +0800, lizhe.67@...edance.com wrote:
> > >> From: Li Zhe <lizhe.67@...edance.com>
> > >> 
> > >> In function collapse_huge_page(), we drop mmap read lock and get
> > >> mmap write lock to prevent most accesses to pagetables. There is
> > >> a small time window to allow other tasks to acquire the mmap lock.
> > >> With the use of upgrade_read(), we don't need to check vma and pmd
> > >> again in most cases.
> > >
> > >This is clearly a performance optimisation.  So you must have some
> > >numbers that justify this, please include them.
> > 
> > Yes, I will add the relevant data to v2 patch.
> 
> How about telling us all now so we know whether to continue discussing
> this?

In my test environment, the average execution time of collapse_huge_page()
only improved by roughly 0.87% (about 14 us out of ~1611 us, see the numbers
below). I used ftrace to measure the execution time of collapse_huge_page().
The test code and test command are as follows.

(1) Test result:

                        average execution time of collapse_huge_page()
before this patch:      1611.06283 us
after this patch:       1597.01474 us

(2) Test code:

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* from the uapi headers, in case libc's <sys/mman.h> is too old */
#endif

#define MMAP_SIZE (2ul*1024*1024)
#define ALIGN(x, align)  (((x) + ((align)-1)) & ~((align)-1))

int main(void)
{
	int num = 100;
	size_t page_sz = getpagesize();

	while (num--) {
		size_t index;
		unsigned char *p_map;
		unsigned char *p_map_real;

		/* Map 4M so a 2M-aligned, 2M-sized region is guaranteed to fit inside. */
		p_map = mmap(0, 2 * MMAP_SIZE, PROT_READ | PROT_WRITE,
			     MAP_PRIVATE | MAP_ANON, -1, 0);
		if (p_map == MAP_FAILED) {
			printf("mmap fail\n");
			return -1;
		}
		p_map_real = (unsigned char *)ALIGN((unsigned long)p_map, MMAP_SIZE);
		printf("mmap get %p, align to %p\n", p_map, p_map_real);

		/* Touch every base page so the whole 2M region is populated. */
		for (index = 0; index < MMAP_SIZE; index += page_sz)
			p_map_real[index] = 6;

		/* Trigger collapse_huge_page() for the aligned 2M region. */
		int ret = madvise(p_map_real, MMAP_SIZE, MADV_COLLAPSE);
		printf("ret is %d\n", ret);

		munmap(p_map, 2 * MMAP_SIZE);
	}
	return 0;
}

(3) Test command:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
gcc test.c -o test
trace-cmd record -p function_graph -g collapse_huge_page --max-graph-depth 1 ./test
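
The per-call durations reported by the function_graph tracer can then be
averaged along these lines (a rough sketch; the exact field layout of the
function_graph output from trace-cmd report may differ between versions):

trace-cmd report | grep -F 'collapse_huge_page' | grep -o '[0-9.]\+ us' | \
	awk '{ sum += $1; n++ } END { if (n) printf "avg %.5f us over %d calls\n", sum / n, n }'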

The improvement to collapse_huge_page() in this test seems insignificant. I
am not sure whether it would have a more noticeable effect in other
scenarios.
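
For anyone skimming the thread, the window referred to in the quoted
changelog ("There is a small time window to allow other tasks to acquire the
mmap lock") can be illustrated with a minimal userspace analogy. This is only
an illustration, not kernel code: in the real path the lock is the mmap lock
and the re-check is the vma/pmd revalidation, which the proposed
upgrade_read() is meant to make unnecessary when the upgrade succeeds.
pthread rwlocks, like the current code, have no atomic read-to-write upgrade,
so the unlock/relock pattern below is forced to re-check the shared state.

#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;
static int generation;	/* bumped by every writer; stands in for the vma/pmd state */

static void upgrade_by_relocking(void)
{
	pthread_rwlock_rdlock(&lock);
	int seen = generation;		/* state observed under the read lock */
	pthread_rwlock_unlock(&lock);	/* the window opens here ... */

	pthread_rwlock_wrlock(&lock);	/* ... and closes here */
	if (generation != seen)
		printf("state changed in the window, revalidation needed\n");
	generation++;
	pthread_rwlock_unlock(&lock);
}

int main(void)
{
	upgrade_by_relocking();
	return 0;
}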
