Date:	Tue, 7 Jun 2016 09:02:02 +0000
From:	"Odzioba, Lukasz" <lukasz.odzioba@...el.com>
To:	Michal Hocko <mhocko@...nel.org>,
	"Hansen, Dave" <dave.hansen@...el.com>
CC:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"Shutemov, Kirill" <kirill.shutemov@...el.com>,
	"Anaczkowski, Lukasz" <lukasz.anaczkowski@...el.com>
Subject: RE: mm: pages are not freed from lru_add_pvecs after process
 termination

On Wed 05-11-16 09:53:00, Michal Hocko wrote:
> Yes I think this makes sense. The only case where it would be suboptimal
> is when the pagevec was already full and then we just created a single
> page pvec to drain it. This can be handled better though by:
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 95916142fc46..3fe4f180e8bf 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -391,9 +391,8 @@ static void __lru_cache_add(struct page *page)
> 	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
> 
> 	get_page(page);
>-	if (!pagevec_space(pvec))
>+	if (!pagevec_add(pvec, page) || PageCompound(page))
> 		__pagevec_lru_add(pvec);
>-	pagevec_add(pvec, page);
> 	put_cpu_var(lru_add_pvec);
>}
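
Just to restate my understanding of the hunk above: pagevec_add() returns the space left in the pagevec after adding, so the drain now triggers either when the pagevec has just become full or when the page is compound (THP), which keeps huge pages from lingering in the per-CPU LRU caches. A rough user-space sketch of that rule (names mirror mm/swap.c, but PAGEVEC_SIZE, the helpers and main() here are illustrative only, not the kernel code):

//compile with: gcc pvec_model.c -o pvec_model
#include <stdio.h>

#define PAGEVEC_SIZE 14

struct pagevec { int nr; int pages[PAGEVEC_SIZE]; };

/* add a page and return the free slots remaining (0 means "now full") */
static int pagevec_add(struct pagevec *pvec, int page)
{
        pvec->pages[pvec->nr++] = page;
        return PAGEVEC_SIZE - pvec->nr;
}

static void pagevec_drain(struct pagevec *pvec)
{
        printf("draining %d pages\n", pvec->nr);
        pvec->nr = 0;
}

static void lru_cache_add(struct pagevec *pvec, int page, int is_compound)
{
        /* drain when the pagevec just filled up OR the page is huge,
         * so compound pages never sit in the per-CPU cache */
        if (!pagevec_add(pvec, page) || is_compound)
                pagevec_drain(pvec);
}

int main(void)
{
        struct pagevec pvec = { 0 };
        int page;

        for (page = 0; page < 40; page++)
                /* pretend every 10th "page" is a THP/compound page */
                lru_cache_add(&pvec, page, page % 10 == 9);
        return 0;
}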

It's been a while, but I am back with some results.
For 2M and 4K pages I wrote a simple app which mmaps and unmaps a lot of memory (60GB across 288 CPUs) in parallel and does it ten times to amortize some OS/threading overhead.
Then I created an app which mixes 4K and 2M pages in a pseudo-random way.
I executed each of those 10 times under the "time" command (once with THP=on and once with THP=off) and calculated the sum, min, max and avg of real, user and sys time, which was necessary due to significant variation in the results.
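
For reference, the per-run numbers could equally be collected with a small harness like the one below instead of the "time" command. This is just an illustrative sketch (the RUNS count, the wait4()-based accounting and the output format are choices made here for illustration, not what produced the tables below), but it shows how the sum/min/max/avg per metric are derived from the 10 runs.

//compile with: gcc harness.c -o harness
//usage: ./harness ./bench_2M
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/resource.h>

#define RUNS 10

static double tv_to_s(struct timeval tv)
{
        return tv.tv_sec + tv.tv_usec / 1e6;
}

static void report(const char *name, const double *v, int n)
{
        double sum = 0.0, min = v[0], max = v[0];
        int i;

        for (i = 0; i < n; i++) {
                sum += v[i];
                if (v[i] < min) min = v[i];
                if (v[i] > max) max = v[i];
        }
        printf("%s: sum=%.3fs min=%.3fs max=%.3fs avg=%.3fs\n",
               name, sum, min, max, sum / n);
}

int main(int argc, char **argv)
{
        double real[RUNS], user[RUNS], sys[RUNS];
        int i;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <benchmark> [args]\n", argv[0]);
                return 1;
        }

        for (i = 0; i < RUNS; i++) {
                struct timespec t0, t1;
                struct rusage ru;
                int status;
                pid_t pid;

                clock_gettime(CLOCK_MONOTONIC, &t0);
                pid = fork();
                if (pid == 0) {
                        execv(argv[1], &argv[1]);
                        _exit(127);
                }
                /* wait4() fills ru with the child's user/sys CPU time */
                wait4(pid, &status, 0, &ru);
                clock_gettime(CLOCK_MONOTONIC, &t1);

                real[i] = (t1.tv_sec - t0.tv_sec) +
                          (t1.tv_nsec - t0.tv_nsec) / 1e9;
                user[i] = tv_to_s(ru.ru_utime);
                sys[i]  = tv_to_s(ru.ru_stime);
        }
        report("real", real, RUNS);
        report("user", user, RUNS);
        report("sys",  sys,  RUNS);
        return 0;
}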

Overall it seems that this change has no negative impact on performance:
4K  THP=on,off -> no significant change
2M  THP=on,off -> might be a tiny bit slower, but still within measurement error
MIX THP=on,off -> no significant change

If you have any concerns about test correctness, please let me know.
The test applications and test results are below.

Thanks,
Lukas
	
------------------------------------------------------------------

//compile with: gcc bench.c -o bench_2M -fopenmp
//compile with: gcc -D SMALL_PAGES bench.c -o bench_4K -fopenmp
#include <stdio.h>
#include <sys/mman.h>
#include <omp.h>

#define MAP_HUGE_SHIFT  26
#define MAP_HUGE_2MB    (21 << MAP_HUGE_SHIFT)

#ifndef SMALL_PAGES
#define PAGE_SIZE (1024*1024*2)
#define MAP_PARAM (MAP_HUGE_2MB)
#else
#define PAGE_SIZE (1024*4)
#define MAP_PARAM (0)
#endif

int main(void)
{
        /* ~60GB of memory split across 288 CPUs/threads */
        size_t size = ((60 * 1000 * 1000) / 288) * 1000;
        #pragma omp parallel
        {
                unsigned int k;
                for (k = 0; k < 10; k++) {
                        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                                       MAP_PRIVATE | MAP_ANON | MAP_PARAM, -1, 0);
                        if (p != MAP_FAILED) {
                                char *cp = (char *)p;
                                size_t i;

                                /* touch one byte per page to fault the mapping in */
                                for (i = 0; i < size / PAGE_SIZE; i++) {
                                        *cp = 0;
                                        cp += PAGE_SIZE;
                                }
                                munmap(p, size);
                        }
                }
        }
        return 0;
}

//compile with: gcc bench_mixed.c -o bench_mixed -fopenmp
#include <stdio.h>
#include <sys/mman.h>
#include <omp.h>

#define SMALL_PAGE (1024*4)
#define HUGE_PAGE  (1024*1024*2)
#define MAP_HUGE_SHIFT  26
#define MAP_HUGE_2MB    (21 << MAP_HUGE_SHIFT)

int main(void)
{
        /* ~60GB of memory split across 288 CPUs/threads */
        size_t size = ((60 * 1000 * 1000) / 288) * 1000;
        #pragma omp parallel
        {
                unsigned int k, MAP_PARAM = 0;
                unsigned int PAGE_SIZE = SMALL_PAGE;
                for (k = 0; k < 10; k++) {
                        /* switch this thread to 2MB mappings on interleaved
                         * iterations; once switched it keeps using them */
                        if ((k + omp_get_thread_num()) % 2) {
                                MAP_PARAM = MAP_HUGE_2MB;
                                PAGE_SIZE = HUGE_PAGE;
                        }
                        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                                       MAP_PRIVATE | MAP_ANON | MAP_PARAM, -1, 0);
                        if (p != MAP_FAILED) {
                                char *cp = (char *)p;
                                size_t i;

                                /* touch one byte per page to fault the mapping in */
                                for (i = 0; i < size / PAGE_SIZE; i++) {
                                        *cp = 0;
                                        cp += PAGE_SIZE;
                                }
                                munmap(p, size);
                        }
                }
        }
        return 0;
}



*******************************

######### 4K THP=ON############
###real  unpatched   patched###
sum = 428.737s sum = 421.339s
min = 41.187s min = 41.492s
max = 44.948s max = 42.822s
avg = 42.874s avg = 42.134s

###user  unpatched   patched###
sum = 145.241s sum = 147.283s
min = 13.760s min = 14.418s
max = 15.532s max = 15.201s
avg = 14.524s avg = 14.728s

###sys  unpatched   patched###
sum = 4882.708s sum = 5020.581s
min = 441.922s min = 490.516s
max = 535.294s max = 532.137s
avg = 488.271s avg = 502.058s

######### 4K THP=OFF###########
###real  unpatched   patched###
sum = 2149.288s sum = 2144.336s
min = 214.589s min = 212.642s
max = 215.937s max = 215.579s
avg = 214.929s avg = 214.434s

###user  unpatched   patched###
sum = 858.659s sum = 858.166s
min = 81.655s min = 82.084s
max = 87.790s max = 88.649s
avg = 85.866s avg = 85.817s

###sys  unpatched   patched###
sum = 32357.867s sum = 31126.183s
min = 2952.685s min = 2783.157s
max = 3442.004s max = 3406.730s
avg = 3235.787s avg = 3112.618s

*******************************

######### 2M THP=ON############
###real  unpatched   patched###
sum = 497.032s sum = 500.115s
min = 48.840s min = 49.529s
max = 50.731s max = 50.698s
avg = 49.703s avg = 50.011s

###user  unpatched   patched###
sum = 56.536s sum = 59.286s
min = 5.021s min = 5.014s
max = 7.465s max = 8.865s
avg = 5.654s avg = 5.929s

###sys  unpatched   patched###
sum = 4187.996s sum = 4450.088s
min = 391.334s min = 406.223s
max = 453.087s max = 530.787s
avg = 418.800s avg = 445.009s

######### 2M THP=OFF###########
###real  unpatched   patched###
sum = 54.698s sum = 53.383s
min = 5.196s min = 4.802s
max = 5.707s max = 5.639s
avg = 5.470s avg = 5.338s

###user  unpatched   patched###
sum = 55.567s sum = 60.980s
min = 4.625s min = 4.745s
max = 6.860s max = 6.727s
avg = 5.557s avg = 6.098s

###sys  unpatched   patched###
sum = 215.267s sum = 215.924s
min = 21.194s min = 20.139s
max = 21.946s max = 22.724s
avg = 21.527s avg = 21.592s

*******************************

#######MIXED THP=OFF###########
###real  unpatched   patched###
sum = 2146.501s sum = 2145.591s
min = 211.727s min = 211.757s
max = 216.011s max = 215.340s
avg = 214.650s avg = 214.559s

###user  unpatched   patched###
sum = 895.243s sum = 909.778s
min = 87.540s min = 87.862s
max = 91.340s max = 94.337s
avg = 89.524s avg = 90.978s

###sys  unpatched   patched###
sum = 31916.377s sum = 30965.023s
min = 2988.592s min = 2878.047s
max = 3581.066s max = 3270.986s
avg = 3191.638s avg = 3096.502s
#######MIXED THP=ON###########
###real  unpatched   patched###
sum = 440.068s sum = 431.539s
min = 41.317s min = 41.860s
max = 58.752s max = 47.080s
avg = 44.007s avg = 43.154s

###user  unpatched   patched###
sum = 153.703s sum = 151.004s
min = 14.395s min = 14.210s
max = 16.778s max = 16.484s
avg = 15.370s avg = 15.100s

###sys  unpatched   patched###
sum = 4945.824s sum = 4957.661s
min = 459.862s min = 469.810s
max = 514.161s max = 526.257s
avg = 494.582s avg = 495.766s

