Date:   Tue, 22 Nov 2016 11:25:25 -0500
From:   Zi Yan <zi.yan@...t.com>
To:     linux-kernel@...r.kernel.org, linux-mm@...ck.org
Cc:     akpm@...ux-foundation.org, minchan@...nel.org, vbabka@...e.cz,
        mgorman@...hsingularity.net, kirill.shutemov@...ux.intel.com,
        n-horiguchi@...jp.nec.com, khandual@...ux.vnet.ibm.com,
        Zi Yan <zi.yan@...rutgers.edu>
Subject: [PATCH 0/5] Parallel hugepage migration optimization

From: Zi Yan <zi.yan@...rutgers.edu>

Hi all,

This patchset boosts hugepage migration throughput and complements the THP
migration support added by Naoya's patches: https://lwn.net/Articles/705879/.

Motivation
===============================

On x86, 4KB page migrations underutilize the memory bandwidth compared to
2MB THP migrations. I benchmarked page migration on a two-socket Intel Xeon
E5-2640v3 box, which has 23.4GB/s memory bandwidth, and found a large
throughput gap, ~3x, between 4KB and 2MB page migrations.

Here are the throughput numbers for different page sizes and page numbers:
        | 512 4KB pages | 1 2MB THP  |  1 4KB page
x86_64  |  0.98GB/s     |  2.97GB/s  |   0.06GB/s

Since Linux currently uses single-threaded page migration, the throughput is
still much lower than the hardware bandwidth, 2.97GB/s vs 23.4GB/s. So I
parallelized the copy_page() part of THP migration with a workqueue and
achieved a 2.8x throughput improvement.

Here are the throughput numbers of 2MB page migration:
           |  single-threaded   | 8-thread
x86_64 2MB |    2.97GB/s        | 8.58GB/s

Here is the benchmark you can use to compare page migration time:
https://github.com/x-y-z/thp-migration-bench

As this patchset requires Naoya's patches, this repo has both patchsets applied:
https://github.com/x-y-z/linux-thp-migration/tree/page_migration_opt_upstream


Patchset description
===============================

This patchset adds a new migrate_mode, MIGRATE_MT, which selects a
parallelized page migration routine. Only copy_huge_page() is parallelized.
MIGRATE_MT is enabled either via a sysctl knob, vm.accel_page_copy, or via an
additional flag, MPOL_MF_MOVE_MT, passed to the move_pages() system call, as
in the example below.
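
For illustration, here is a minimal userspace sketch of requesting the
multi-threaded copy through move_pages(). MPOL_MF_MOVE_MT comes from this
patchset rather than mainline headers, so the fallback definition below is
only a placeholder value:

	/* Sketch only: MPOL_MF_MOVE_MT is defined by this patchset, not by
	 * mainline headers; the value below is a placeholder. */
	#define _GNU_SOURCE
	#include <numaif.h>      /* move_pages(), MPOL_MF_MOVE */
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/mman.h>

	#ifndef MPOL_MF_MOVE_MT
	#define MPOL_MF_MOVE_MT (1 << 6)   /* placeholder bit, see the patchset */
	#endif

	int main(void)
	{
		size_t len = 2UL << 20;                  /* one 2MB region */
		void *buf = aligned_alloc(len, len);
		int status = -1, target_node = 1;

		if (!buf)
			return 1;
		madvise(buf, len, MADV_HUGEPAGE);        /* ask for a THP */
		memset(buf, 0, len);                     /* fault it in */

		/* Move the page to node 1, requesting multi-threaded copy. */
		void *pages[1] = { buf };
		int nodes[1] = { target_node };
		if (move_pages(0, 1, pages, nodes, &status,
			       MPOL_MF_MOVE | MPOL_MF_MOVE_MT))
			perror("move_pages");
		printf("page now on node %d\n", status);
		return 0;
	}

Compile with -lnuma. The sysctl knob, vm.accel_page_copy, is the alternative
way to enable MIGRATE_MT without modifying applications.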

The parallelized copy routine splits a single huge page across 4 workqueue
threads and waits until they finish; a rough sketch of the idea follows.
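
The sketch below is illustrative only; the names (copy_page_work,
copy_huge_page_multithread, NR_COPY_THREADS) are hypothetical and not the
actual mm/copy_page.c code in this patchset:

	/* Illustrative sketch, not the actual mm/copy_page.c implementation. */
	#include <linux/workqueue.h>
	#include <linux/string.h>

	#define NR_COPY_THREADS 4

	struct copy_page_work {
		struct work_struct work;
		char *dst;
		const char *src;
		unsigned long chunk_size;
	};

	static void copy_page_worker(struct work_struct *work)
	{
		struct copy_page_work *w =
			container_of(work, struct copy_page_work, work);

		memcpy(w->dst, w->src, w->chunk_size);
	}

	/* Split one huge page into NR_COPY_THREADS chunks, queue each chunk
	 * on the system unbound workqueue, then wait for all of them. */
	static void copy_huge_page_multithread(char *dst, const char *src,
					       unsigned long size)
	{
		struct copy_page_work w[NR_COPY_THREADS];
		unsigned long chunk = size / NR_COPY_THREADS;
		int i;

		for (i = 0; i < NR_COPY_THREADS; i++) {
			w[i].dst = dst + i * chunk;
			w[i].src = src + i * chunk;
			w[i].chunk_size = chunk;
			INIT_WORK_ONSTACK(&w[i].work, copy_page_worker);
			queue_work(system_unbound_wq, &w[i].work);
		}

		for (i = 0; i < NR_COPY_THREADS; i++) {
			flush_work(&w[i].work);
			destroy_work_on_stack(&w[i].work);
		}
	}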

Discussion
===============================
1. For testing purposes, I chose a sysctl to enable and disable parallel huge
page migration. I would like comments on how best to enable and disable it,
or whether to simply enable it for all huge page migrations.

2. The hard-coded count of 4 workqueue threads is not adaptive; any
suggestions? For example, a boot-time benchmark to find an appropriate number?

3. Parallel huge page migration works best when the threads run on different
physical cores, not all on sibling hyper-threads of the same core. Is there
an easy way to discover the core topology? (One possible approach is sketched
below.)
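
One possible in-kernel approach, just as a sketch (select_copy_cpus() is a
hypothetical helper, not part of this patchset): iterate online CPUs and skip
SMT siblings via topology_sibling_cpumask():

	/* Hypothetical helper, not part of this patchset: pick up to
	 * max_cpus CPUs, at most one per physical core, by skipping
	 * SMT siblings of CPUs already chosen. */
	#include <linux/cpumask.h>
	#include <linux/topology.h>

	static int select_copy_cpus(struct cpumask *out, int max_cpus)
	{
		int cpu, picked = 0;

		cpumask_clear(out);
		for_each_online_cpu(cpu) {
			/* Skip CPUs sharing a core with one already picked. */
			if (cpumask_intersects(out, topology_sibling_cpumask(cpu)))
				continue;
			cpumask_set_cpu(cpu, out);
			if (++picked >= max_cpus)
				break;
		}
		return picked;
	}

The copy work items could then be queued with queue_work_on() on the selected
CPUs instead of the unbound workqueue.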


Any comments are welcome. Thanks.

--
Best Regards,
Zi Yan


Zi Yan (5):
  mm: migrate: Add mode parameter to support additional page copy
    routines.
  mm: migrate: Change migrate_mode to support combination migration
    modes.
  migrate: Add copy_page_mt to use multi-threaded page migration.
  mm: migrate: Add copy_page_mt into migrate_pages.
  mm: migrate: Add vm.accel_page_copy in sysfs to control whether to use
    multi-threaded to accelerate page copy.

 fs/aio.c                       |  2 +-
 fs/hugetlbfs/inode.c           |  2 +-
 fs/ubifs/file.c                |  2 +-
 include/linux/highmem.h        |  2 +
 include/linux/migrate.h        |  6 ++-
 include/linux/migrate_mode.h   |  7 +--
 include/uapi/linux/mempolicy.h |  2 +
 kernel/sysctl.c                | 12 ++++++
 mm/Makefile                    |  2 +
 mm/compaction.c                | 20 ++++-----
 mm/copy_page.c                 | 96 ++++++++++++++++++++++++++++++++++++++++++
 mm/migrate.c                   | 61 ++++++++++++++++++---------
 12 files changed, 175 insertions(+), 39 deletions(-)
 create mode 100644 mm/copy_page.c

-- 
2.10.2
