Message-Id: <20170224114036.15621-1-aaron.lu@intel.com>
Date:   Fri, 24 Feb 2017 19:40:31 +0800
From:   Aaron Lu <aaron.lu@...el.com>
To:     linux-mm@...ck.org, linux-kernel@...r.kernel.org
Cc:     Dave Hansen <dave.hansen@...el.com>,
        Tim Chen <tim.c.chen@...el.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Ying Huang <ying.huang@...el.com>,
        Aaron Lu <aaron.lu@...el.com>
Subject: [PATCH 0/5] mm: support parallel free of memory

For a regular process, the time taken in its exit() path to free its
used memory is not a problem. But there are heavy ones that consume
several terabytes of memory, and freeing that memory can take more
than ten minutes.

To optimize this use case, a parallel free method is proposed here.
For a detailed explanation, please refer to patch 2/5.

I'm not sure if we need patch 4/5, which can avoid page accumulation
being interrupted in some cases (the patch description has more
information). My test case, which only deals with anon memory, doesn't
get any help out of it, of course. It can be safely dropped if it is
deemed not useful.

A test program that does a single malloc() of 320G of memory is used to
measure how much the proposed parallel free helps; the time reported is
for the free() call. The test machine is a Haswell EX with
4 nodes/72 cores/144 threads and 512G of memory. All tests are done
with THP disabled.
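
For anyone who wants to reproduce the measurement, a minimal userspace
sketch of this kind of test might look like the following. It is not the
actual test program; the helper names time_free() and elapsed_sec() are
hypothetical, and touching every page before the free is an assumption
made here so that there is populated memory to release.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() and sysconf() */
#endif
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Seconds elapsed between two CLOCK_MONOTONIC samples. */
static double elapsed_sec(const struct timespec *a, const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) +
	       (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Hypothetical helper: allocate `bytes` with a single malloc(), write one
 * byte per page so the pages are actually faulted in (assumption: the
 * real test populated its memory too), then time only the free() call.
 * Returns the elapsed seconds, or -1.0 if the allocation failed.
 */
static double time_free(size_t bytes)
{
	struct timespec start, end;
	long page = sysconf(_SC_PAGESIZE);
	volatile char *buf;
	size_t off;

	if (page <= 0)
		page = 4096;	/* fallback assumption */

	buf = malloc(bytes);
	if (!buf)
		return -1.0;

	/* volatile stores keep the compiler from dropping the writes */
	for (off = 0; off < bytes; off += (size_t)page)
		buf[off] = 1;

	clock_gettime(CLOCK_MONOTONIC, &start);
	free((void *)buf);
	clock_gettime(CLOCK_MONOTONIC, &end);

	return elapsed_sec(&start, &end);
}
```

Calling time_free((size_t)320 << 30) from main() and printing the result
would approximate the 320G case on a machine with enough memory; smaller
sizes work for a quick sanity check. Whether free() returns the memory to
the kernel right away depends on the C library's allocator, so the
numbers are only comparable across kernels with the same userspace.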

kernel                              time
v4.10                               10.8s  ±2.8%
this patch (with default setting)   5.795s ±5.8%

Patch 3/5 introduces a dedicated workqueue for the free workers. Here
are more results with different values of max_active for this
workqueue:

max_active:   time
1             8.9s   ±0.5%
2             5.65s  ±5.5%
4             4.84s  ±0.16%
8             4.77s  ±0.97%
16            4.85s  ±0.77%
32            6.21s  ±0.46%

Comments are welcome.

Aaron Lu (5):
  mm: add tlb_flush_mmu_free_batches
  mm: parallel free pages
  mm: use a dedicated workqueue for the free workers
  mm: add force_free_pages in zap_pte_range
  mm: add debugfs interface for parallel free tuning

 include/asm-generic/tlb.h |  12 ++--
 mm/memory.c               | 138 +++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 122 insertions(+), 28 deletions(-)

-- 
2.9.3
