[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <D6EDEBF1F91015459DB866AC4EE162CC023F84C9@IRSMSX103.ger.corp.intel.com>
Date: Tue, 7 Jun 2016 09:02:02 +0000
From: "Odzioba, Lukasz" <lukasz.odzioba@...el.com>
To: Michal Hocko <mhocko@...nel.org>,
"Hansen, Dave" <dave.hansen@...el.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"Shutemov, Kirill" <kirill.shutemov@...el.com>,
"Anaczkowski, Lukasz" <lukasz.anaczkowski@...el.com>
Subject: RE: mm: pages are not freed from lru_add_pvecs after process
termination
On Wed 05-11-16 09:53:00, Michal Hocko wrote:
> Yes I think this makes sense. The only case where it would be suboptimal
> is when the pagevec was already full and then we just created a single
> page pvec to drain it. This can be handled better though by:
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 95916142fc46..3fe4f180e8bf 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -391,9 +391,8 @@ static void __lru_cache_add(struct page *page)
> struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
>
> get_page(page);
>- if (!pagevec_space(pvec))
>+ if (!pagevec_add(pvec, page) || PageCompound(page))
> __pagevec_lru_add(pvec);
>- pagevec_add(pvec, page);
> put_cpu_var(lru_add_pvec);
>}
It's been a while, but I am back with some results.
For 2M i 4K pages I wrote simple app which mmaps and unmaps a lot of memory (60GB/288CPU) in parallel and does it ten times to get rid of some os/threading overhead.
Then I created an app which mixes pages in sort of pseudo random random way.
I executed those 10 times under "time" (once with THP=on and once with THP=off) command and calculated sum, min, max, avg of sys, real, user time which was necessary due to significant bias in results.
In overall it seems that this change has no negative impact on performance:
4K THP=on,off -> no significant change
2M THP=on,off -> it might be a tiny bit slower, but still close to measurement error
MIX THP=on,off -> no significant change
If you have any concerns about test correctness please let me know.
Below I added test applications and test results.
Thanks,
Lukas
------------------------------------------------------------------
//compile with: gcc bench.c -o bench_2M -fopenmp
//compile with: gcc -D SMALL_PAGES bench.c -o bench_4K -fopenmp
#include <stdio.h>
#include <sys/mman.h>
#include <omp.h>
#define MAP_HUGE_SHIFT 26
#define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT)
#ifndef SMALL_PAGES
#define PAGE_SIZE (1024*1024*2)
#define MAP_PARAM (MAP_HUGE_2MB)
#else
#define PAGE_SIZE (1024*4)
#define MAP_PARAM (0)
#endif
void main() {
size_t size = ((60 * 1000 * 1000) / 288) * 1000; // 60GBs of memory 288 CPUs
#pragma omp parallel
{
unsigned int k;
for (k = 0; k < 10; k++) {
void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | MAP_PARAM, -1, 0);
if (p != MAP_FAILED) {
char *cp = (char*)p;
size_t i;
for (i = 0; i < size / PAGE_SIZE; i++) {
*cp = 0;
cp += PAGE_SIZE;
}
munmap(p, size);
}
}
}
}
//compile with: gcc bench_mixed.c -o bench_mixed -fopenmp
#include <stdio.h>
#include <sys/mman.h>
#include <omp.h>
#define SMALL_PAGE (1024*4)
#define HUGE_PAGE (1024*4)
#define MAP_HUGE_SHIFT 26
#define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT)
void main() {
size_t size = ((60 * 1000 * 1000) / 288) * 1000; // 60GBs of memory 288 CPUs
#pragma omp parallel
{
unsigned int k, MAP_PARAM = 0;
unsigned int PAGE_SIZE = SMALL_PAGE;
for (k = 0; k < 10; k++) {
if ((k + omp_get_thread_num()) % 2) {
MAP_PARAM = MAP_HUGE_2MB;
PAGE_SIZE = HUGE_PAGE;
}
void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | MAP_PARAM, -1, 0);
if (p != MAP_FAILED) {
char *cp = (char*)p;
size_t i;
for (i = 0; i < size / PAGE_SIZE; i++) {
*cp = 0;
cp += PAGE_SIZE;
}
munmap(p, size);
}
}
}
}
*******************************
######### 4K THP=ON############
###real unpatched patched###
sum = 428.737s sum = 421.339s
min = 41.187s min = 41.492s
max = 44.948s max = 42.822s
avg = 42.874s avg = 42.134s
###user unpatched patched###
sum = 145.241s sum = 147.283s
min = 13.760s min = 14.418s
max = 15.532s max = 15.201s
avg = 14.524s avg = 14.728s
###sys unpatched patched###
sum = 4882.708s sum = 5020.581s
min = 441.922s min = 490.516s
max = 535.294s max = 532.137s
avg = 488.271s avg = 502.058s
######### 4K THP=OFF###########
###real unpatched patched###
sum = 2149.288s sum = 2144.336s
min = 214.589s min = 212.642s
max = 215.937s max = 215.579s
avg = 214.929s avg = 214.434s
###user unpatched patched###
sum = 858.659s sum = 858.166s
min = 81.655s min = 82.084s
max = 87.790s max = 88.649s
avg = 85.866s avg = 85.817s
###sys unpatched patched###
sum = 32357.867s sum = 31126.183s
min = 2952.685s min = 2783.157s
max = 3442.004s max = 3406.730s
avg = 3235.787s avg = 3112.618s
*******************************
######### 2K THP=ON############
###real unpatched patched###
sum = 497.032s sum = 500.115s
min = 48.840s min = 49.529s
max = 50.731s max = 50.698s
avg = 49.703s avg = 50.011s
###real unpatched patched###
sum = 56.536s sum = 59.286s
min = 5.021s min = 5.014s
max = 7.465s max = 8.865s
avg = 5.654s avg = 5.929s
###real unpatched patched###
sum = 4187.996s sum = 4450.088s
min = 391.334s min = 406.223s
max = 453.087s max = 530.787s
avg = 418.800s avg = 445.009s
######### 2K THP=OFF###########
###real unpatched patched###
sum = 54.698s sum = 53.383s
min = 5.196s min = 4.802s
max = 5.707s max = 5.639s
avg = 5.470s avg = 5.338s
###real unpatched patched###
sum = 55.567s sum = 60.980s
min = 4.625s min = 4.745s
max = 6.860s max = 6.727s
avg = 5.557s avg = 6.098s
###real unpatched patched###
sum = 215.267s sum = 215.924s
min = 21.194s min = 20.139s
max = 21.946s max = 22.724s
avg = 21.527s avg = 21.592s
*******************************
#######MIXED THP=OFF###########
###real unpatched patched###
sum = 2146.501s sum = 2145.591s
min = 211.727s min = 211.757s
max = 216.011s max = 215.340s
avg = 214.650s avg = 214.559s
###user unpatched patched###
sum = 895.243s sum = 909.778s
min = 87.540s min = 87.862s
max = 91.340s max = 94.337s
avg = 89.524s avg = 90.978s
###sys unpatched patched###
sum = 31916.377s sum = 30965.023s
min = 2988.592s min = 2878.047s
max = 3581.066s max = 3270.986s
avg = 3191.638s avg = 3096.502s
#######MIXED THP=ON###########
###real unpatched patched###
sum = 440.068s sum = 431.539s
min = 41.317s min = 41.860s
max = 58.752s max = 47.080s
avg = 44.007s avg = 43.154s
###user unpatched patched###
sum = 153.703s sum = 151.004s
min = 14.395s min = 14.210s
max = 16.778s max = 16.484s
avg = 15.370s avg = 15.100s
###sys unpatched patched###
sum = 4945.824s sum = 4957.661s
min = 459.862s min = 469.810s
max = 514.161s max = 526.257s
avg = 494.582s avg = 495.766s
Powered by blists - more mailing lists