Date:   Sat, 10 Sep 2022 04:36:34 +0800
From:   Jiebin Sun <jiebin.sun@...el.com>
To:     akpm@...ux-foundation.org, vasily.averin@...ux.dev,
        shakeelb@...gle.com, dennis@...nel.org, tj@...nel.org,
        cl@...ux.com, ebiederm@...ssion.com, legion@...nel.org,
        manfred@...orfullife.com, alexander.mikhalitsyn@...tuozzo.com,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Cc:     tim.c.chen@...el.com, feng.tang@...el.com, ying.huang@...el.com,
        tianyou.li@...el.com, wangyang.guo@...el.com, jiebin.sun@...el.com
Subject: [PATCH v5 0/2] ipc/msg: mitigate the lock contention in ipc/msg


Hi,

Here are two patches to mitigate the lock contention in ipc/msg.

The 1st patch adds two new interfaces, percpu_counter_add_local and
percpu_counter_sub_local. For a counter that is written heavily but read
rarely, the batch size passed to percpu_counter_add_batch should be very
large. The "_local" variants use such a batch, so most updates stay in the
local per-CPU counter. This reduces updates to the global count and
mitigates lock contention on the write path.
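To illustrate the batching idea, here is a small userspace model in C. It is
not kernel code: struct model_percpu_counter and the model_* functions are
hypothetical, NR_CPUS_MODEL is an arbitrary CPU count, and lock_acquisitions
only counts how often the shared lock would be taken. The model assumes the
batch rule of percpu_counter_add_batch (fold the per-CPU delta into the
global count once its magnitude reaches the batch) and INT_MAX as the
"_local" batch, as described for v5.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4 /* hypothetical CPU count for this model */

/* Userspace model of a percpu_counter (hypothetical, not the kernel struct). */
struct model_percpu_counter {
	int64_t count;                /* global value, like fbc->count */
	int32_t local[NR_CPUS_MODEL]; /* per-CPU deltas, like fbc->counters */
	long lock_acquisitions;       /* how often the shared lock would be taken */
};

static void model_init(struct model_percpu_counter *c)
{
	c->count = 0;
	for (int i = 0; i < NR_CPUS_MODEL; i++)
		c->local[i] = 0;
	c->lock_acquisitions = 0;
}

/* Keep the delta in this CPU's slot until its magnitude reaches batch,
 * then fold it into the global count "under the lock". */
static void model_add_batch(struct model_percpu_counter *c, int cpu,
			    int64_t amount, int32_t batch)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	int64_t v = (int64_t)c->local[cpu] + amount;

	if (v >= batch || v <= -(int64_t)batch) {
		c->lock_acquisitions++;     /* slow path: shared lock */
		c->count += v;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = (int32_t)v; /* fast path: CPU-local only */
	}
}

/* "_local" variants: an INT_MAX batch keeps nearly every update local. */
static void model_add_local(struct model_percpu_counter *c, int cpu,
			    int64_t amount)
{
	model_add_batch(c, cpu, amount, INT_MAX);
}

static void model_sub_local(struct model_percpu_counter *c, int cpu,
			    int64_t amount)
{
	model_add_batch(c, cpu, -amount, INT_MAX);
}

/* Cheap, approximate read: ignores unfolded per-CPU deltas. */
static int64_t model_read(const struct model_percpu_counter *c)
{
	return c->count;
}

/* Exact read: takes the lock and adds every CPU's delta. */
static int64_t model_sum(struct model_percpu_counter *c)
{
	int64_t s;

	c->lock_acquisitions++;
	s = c->count;
	for (int i = 0; i < NR_CPUS_MODEL; i++)
		s += c->local[i];
	return s;
}
```

In this model, 1000 local adds of 64 spread over four CPUs never touch the
lock, and the cheap read still reports 0 while the exact sum reports 64000.
That lag is why rare readers of such a counter need the summing read, and
why the "_local" variants only suit write-heavy, read-rarely counters.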

The 2nd patch uses percpu_counter instead of atomic updates in ipc/msg.
The msg_bytes and msg_hdrs atomic counters are updated frequently when an
IPC message queue is in heavy use, causing heavy cache-line bouncing and
overhead. Changing them to percpu_counter greatly improves performance.
Since there is one percpu struct per namespace, the additional memory cost
is minimal. The counts are read only in the msgctl call, which is
infrequent, so summing up the per-CPU counts is also infrequent.
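The per-CPU bookkeeping for the two message counters can be sketched as
follows. This is again a hypothetical userspace model: struct model_msg_ns
and the model_* functions are invented for illustration. It assumes that
sending a message adds to msg_bytes and msg_hdrs while receiving one
subtracts from them, and it ignores the rare fold into the global count that
an INT_MAX batch would trigger. In the kernel each CPU's slot lives in that
CPU's own per-CPU area; the arrays here only model the arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4 /* hypothetical CPU count for this model */

/* Hypothetical per-namespace state: one slot per CPU for each counter. */
struct model_msg_ns {
	int64_t msg_bytes[MODEL_NR_CPUS];
	int64_t msg_hdrs[MODEL_NR_CPUS];
};

/* Send path: local adds on the sending CPU only. */
static void model_msgsnd(struct model_msg_ns *ns, int cpu, size_t size)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	ns->msg_bytes[cpu] += (int64_t)size;
	ns->msg_hdrs[cpu] += 1;
}

/* Receive path: local subtracts on the receiving CPU only. */
static void model_msgrcv(struct model_msg_ns *ns, int cpu, size_t size)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	ns->msg_bytes[cpu] -= (int64_t)size;
	ns->msg_hdrs[cpu] -= 1;
}

/* Infrequent read path (the msgctl case): add up every CPU's slot. */
static int64_t model_sum(const int64_t slots[MODEL_NR_CPUS])
{
	int64_t s = 0;

	for (int i = 0; i < MODEL_NR_CPUS; i++)
		s += slots[i];
	return s;
}
```

A message sent on one CPU and received on another leaves a positive slot
and a negative slot behind, so an individual slot means nothing on its own;
only the sum taken on the read path is meaningful.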

Changes in v5:
1. Use INT_MAX as the large batch size in percpu_counter_add_local and
percpu_counter_sub_local.
2. Use the latest kernel, 6.0-rc4, as the baseline for the performance test.
3. Move percpu_counter_add_local and percpu_counter_sub_local from
percpu_counter.c to percpu_counter.h.

Changes in v3:
1. Add a comment and changelog for the new function percpu_counter_add_local,
describing who should and shouldn't use it.

Changes in v2:
1. Separate the original patch into two patches.
2. Add error handling for percpu_counter_init.

The performance gain increases with the number of workload threads.
Performance gain: 3.99x

CPU: ICX 8380 x 2 sockets
Core number: 40 x 2 physical cores
Benchmark: pts/stress-ng-1.4.0
-- system v message passing (160 threads)


Regards
Jiebin
