Message-ID: <20240412092441.3112481-1-zhangpeng362@huawei.com>
Date: Fri, 12 Apr 2024 17:24:38 +0800
From: Peng Zhang <zhangpeng362@...wei.com>
To: <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>
CC: <akpm@...ux-foundation.org>, <dennisszhou@...il.com>,
	<shakeelb@...gle.com>, <jack@...e.cz>, <surenb@...gle.com>,
	<kent.overstreet@...ux.dev>, <mhocko@...e.cz>, <vbabka@...e.cz>,
	<yuzhao@...gle.com>, <yu.ma@...el.com>, <wangkefeng.wang@...wei.com>,
	<sunnanyong@...wei.com>, <zhangpeng362@...wei.com>
Subject: [RFC PATCH 0/3] mm: convert mm's rss stats into lazy_percpu_counter

From: ZhangPeng <zhangpeng362@...wei.com>

Since commit f1a7941243c1 ("mm: convert mm's rss stats into
percpu_counter"), the rss_stats have been converted to percpu_counter,
which changes the error margin from (nr_threads * 64) to approximately
(nr_cpus ^ 2). However, the new percpu allocation in mm_init() causes a
performance regression on fork/exec/shell. Even after commit
14ef95be6f55 ("kernel/fork: group allocation/free of per-cpu counters
for mm struct"), the performance of fork/exec/shell is still poor
compared to previous kernel versions.
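
To make the two error margins concrete, here is a minimal sketch that
evaluates both expressions exactly as written above. The function names
and the sample values in the usage note are hypothetical; they are not
kernel code or measurements.

```c
#include <assert.h>

/* Error margin before f1a7941243c1, as stated above: nr_threads * 64. */
static long old_error_margin(long nr_threads)
{
	assert(nr_threads >= 0);
	return nr_threads * 64;
}

/* Approximate error margin with percpu_counter, as stated above: nr_cpus ^ 2. */
static long new_error_margin(long nr_cpus)
{
	assert(nr_cpus >= 0);
	return nr_cpus * nr_cpus;
}
```

For a hypothetical process with 1000 threads on a 16-CPU machine,
old_error_margin(1000) is 64000 while new_error_margin(16) is 256.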

To mitigate the performance regression, we use lazy_percpu_counter[1] to
delay the allocation of percpu memory for rss_stats. In lmbench testing,
the conversion improves fork_proc/exec_proc/shell_proc performance by
3% ~ 6%.
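
The idea of delaying the allocation can be illustrated with a small,
single-threaded userspace sketch. This is not the code from the series:
struct lazy_counter, NR_SLOTS, UPGRADE_THRESHOLD and the "upgrade after
N updates" trigger are all assumptions made for illustration, and a real
implementation needs atomics and per-CPU storage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_SLOTS          4	/* hypothetical stand-in for the number of CPUs */
#define UPGRADE_THRESHOLD 8	/* hypothetical: updates seen before allocating */

struct lazy_counter {
	int64_t  count;		/* single shared value, used before the upgrade */
	int64_t *slots;		/* NULL until per-slot storage is allocated */
	unsigned updates;	/* updates seen while in single-value mode */
};

/* Initialization is cheap: no per-slot memory is allocated here. */
static void lazy_counter_init(struct lazy_counter *c)
{
	c->count = 0;
	c->slots = NULL;
	c->updates = 0;
}

static bool lazy_counter_is_upgraded(const struct lazy_counter *c)
{
	return c->slots != NULL;
}

static void lazy_counter_add(struct lazy_counter *c, unsigned slot, int64_t v)
{
	assert(slot < NR_SLOTS);

	if (c->slots) {
		/* Upgraded: each slot is updated independently. */
		c->slots[slot] += v;
		return;
	}

	c->count += v;
	if (++c->updates >= UPGRADE_THRESHOLD) {
		/* Allocate only once the counter is updated often enough. */
		int64_t *s = calloc(NR_SLOTS, sizeof(*s));

		if (s)		/* on failure, stay in single-value mode */
			c->slots = s;
	}
}

/* The total is the pre-upgrade value plus all per-slot deltas. */
static int64_t lazy_counter_sum(const struct lazy_counter *c)
{
	int64_t sum = c->count;

	if (c->slots)
		for (unsigned i = 0; i < NR_SLOTS; i++)
			sum += c->slots[i];
	return sum;
}

static void lazy_counter_destroy(struct lazy_counter *c)
{
	free(c->slots);
	c->slots = NULL;
}
```

In this sketch a short-lived user of the counter that performs only a few
updates never pays for the per-slot allocation, which is the effect the
conversion aims for on the fork/exec path.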

The test results are as follows (percentages are the improvement
relative to base):

             base           base+revert        base+lazy_percpu_counter

fork_proc    427.4ms        394.1ms  (7.8%)    413.9ms  (3.2%)
exec_proc    2205.1ms       2042.2ms (7.4%)    2072.0ms (6.0%)
shell_proc   3180.9ms       2963.7ms (6.8%)    3010.7ms (5.4%)
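
As a check on the table above, the percentages are consistent with the
reduction in time relative to the base column. A minimal sketch of that
arithmetic follows; improvement_pct is a hypothetical helper name, not
part of lmbench or the kernel.

```c
#include <assert.h>

/*
 * Percentage reduction of a measured time relative to the base time.
 * A positive result means the measured configuration is faster than base.
 */
static double improvement_pct(double base_ms, double measured_ms)
{
	assert(base_ms > 0.0);
	return (base_ms - measured_ms) / base_ms * 100.0;
}
```

For example, improvement_pct(427.4, 413.9) evaluates to roughly 3.2,
matching the fork_proc entry for base+lazy_percpu_counter.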

This solution has not been fully evaluated and tested. The main idea of
this RFC patch series is to get the community's opinion on this approach.

[1] https://lore.kernel.org/linux-iommu/20230501165450.15352-8-surenb@google.com/

Kent Overstreet (1):
  Lazy percpu counters

ZhangPeng (2):
  lazy_percpu_counter: include struct percpu_counter in struct
    lazy_percpu_counter
  mm: convert mm's rss stats into lazy_percpu_counter

 include/linux/lazy-percpu-counter.h |  88 +++++++++++++++++++
 include/linux/mm.h                  |   8 +-
 include/linux/mm_types.h            |   4 +-
 include/trace/events/kmem.h         |   4 +-
 kernel/fork.c                       |  12 +--
 lib/Makefile                        |   2 +-
 lib/lazy-percpu-counter.c           | 131 ++++++++++++++++++++++++++++
 7 files changed, 232 insertions(+), 17 deletions(-)
 create mode 100644 include/linux/lazy-percpu-counter.h
 create mode 100644 lib/lazy-percpu-counter.c

-- 
2.25.1

