lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20210521102826.28552-1-mgorman@techsingularity.net>
Date:   Fri, 21 May 2021 11:28:20 +0100
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Linux-MM <linux-mm@...ck.org>
Cc:     Dave Hansen <dave.hansen@...ux.intel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Michal Hocko <mhocko@...nel.org>,
        Nicholas Piggin <npiggin@...il.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Mel Gorman <mgorman@...hsingularity.net>
Subject: [RFC PATCH 0/6] Calculate pcp->high based on zone sizes and active CPUs

The per-cpu page allocator (PCP) is meant to reduce contention on the zone
lock but the sizing of batch and high is archaic and neither takes the zone
size into account or the number of CPUs local to a zone. Furthermore, the
fact that vm.percpu_pagelist_fraction adjusts both batch and high values
means that the sysctl can reduce zone lock contention but also increase
allocation latencies.

This series disassociates pcp->high from pcp->batch and then scales
pcp->high based on the size of the local zone with limited impact to
reclaim and accounting for active CPUs but leaves pcp->batch static.
It also adapts the number of pages that can be on the pcp list based on
recent freeing patterns.

The motivation is partially to adjust to larger memory sizes but
is also driven by the fact that large batches of page freeing via
release_pages() often shows zone contention as a major part of the
problem. Another is a bug report based on an older kernel where a
multi-terabyte process can takes several minutes to exit. A workaround
was to use vm.percpu_pagelist_fraction to increase the pcp->high value
but testing indicated that a production workload could not use the same
values because of an increase in allocation latencies. Unfortunately,
I cannot reproduce this test case myself as the multi-terabyte machines
are in active use but it should alleviate the problem.

The series aims to address both and partially acts as a pre-requisite. pcp
only works with order-0 which is useless for SLUB (when using high orders)
and THP (unconditionally). To store high-order pages on PCP, the pcp->high
values need to be increased first.

 Documentation/admin-guide/sysctl/vm.rst |  19 +--
 include/linux/cpuhotplug.h              |   2 +-
 include/linux/mmzone.h                  |   8 +-
 kernel/sysctl.c                         |   8 +-
 mm/internal.h                           |   2 +-
 mm/memory_hotplug.c                     |   4 +-
 mm/page_alloc.c                         | 166 +++++++++++++++++-------
 mm/vmscan.c                             |  35 +++++
 8 files changed, 179 insertions(+), 65 deletions(-)

-- 
2.26.2

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ