linux-kernel - [PATCH v3 0/2] ipc/msg: mitigate the lock contention in ipc/msg


Message-Id: <20220906165430.851424-1-jiebin.sun@intel.com>
Date:   Wed,  7 Sep 2022 00:54:28 +0800
From:   Jiebin Sun <jiebin.sun@...el.com>
To:     akpm@...ux-foundation.org, vasily.averin@...ux.dev,
        shakeelb@...gle.com, dennis@...nel.org, tj@...nel.org,
        cl@...ux.com, ebiederm@...ssion.com, legion@...nel.org,
        manfred@...orfullife.com, alexander.mikhalitsyn@...tuozzo.com,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Cc:     tim.c.chen@...el.com, feng.tang@...el.com, ying.huang@...el.com,
        tianyou.li@...el.com, wangyang.guo@...el.com, jiebin.sun@...el.com
Subject: [PATCH v3 0/2] ipc/msg: mitigate the lock contention in ipc/msg 

Hi,

Here are two patches to mitigate the lock contention in ipc/msg.

The 1st patch adds a new function, percpu_counter_add_local, which updates only the local (per-CPU) counter without aggregating into the global counter. It can be used together with percpu_counter_sum when a highly accurate count is needed. Compared with percpu_counter_add_batch, this combination gives a clear performance improvement when percpu_counter_add is called frequently and percpu_counter_sum is not in the critical path.
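
To illustrate the trade-off described above, here is a minimal userspace sketch, not the kernel implementation. All model_* names, MODEL_NR_CPUS and the global_writes field are hypothetical and exist only for this illustration; "CPUs" are simulated as array slots and there is no locking. It contrasts a batched add, which folds the local delta into the shared count once it reaches the batch size, with a local-only add, which leaves all aggregation to the sum.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4 /* hypothetical number of simulated CPUs */

/* Userspace model only; this is not the kernel's struct percpu_counter. */
struct model_counter {
	int64_t count;                 /* shared global part */
	int32_t local[MODEL_NR_CPUS];  /* one private slot per simulated CPU */
	unsigned long global_writes;   /* how often the shared part was written */
};

/* Batched add: fold the local delta into the shared count once it reaches batch. */
void model_add_batch(struct model_counter *c, int cpu,
		     int32_t amount, int32_t batch)
{
	int64_t v;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	v = (int64_t)c->local[cpu] + amount;
	if (v >= batch || v <= -(int64_t)batch) {
		c->count += v;          /* this is the contended write */
		c->local[cpu] = 0;
		c->global_writes++;
	} else {
		c->local[cpu] = (int32_t)v;
	}
}

/* Local-only add: never touches the shared count. */
void model_add_local(struct model_counter *c, int cpu, int32_t amount)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	c->local[cpu] += amount;
}

/* Accurate read: shared count plus every per-CPU delta. */
int64_t model_sum(const struct model_counter *c)
{
	int64_t total = c->count;

	for (int i = 0; i < MODEL_NR_CPUS; i++)
		total += c->local[i];
	return total;
}
```

In this model both add variants give the same result from model_sum, but only the batched variant ever writes the shared count. That shared write is the one that would bounce between CPUs under contention, which is why a local-only add pays off when sums are rare.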

The 2nd patch uses percpu_counter instead of atomic updates in ipc/msg.
The msg_bytes and msg_hdrs atomic counters are updated frequently when the IPC msg queue is in heavy use, causing heavy cache bouncing and overhead. Changing them to percpu_counter greatly improves performance. Since there is one percpu struct per namespace, the additional memory cost is minimal. The counts are read only in the msgctl call, which is infrequent, so the need to sum the per-CPU counts is also infrequent.
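
The access pattern described above can be sketched in userspace as follows. This is an illustrative model under stated assumptions, not the patch itself: struct model_ipc_ns, the model_msg_* functions and MODEL_NR_CPUS are hypothetical names, only the field names msg_bytes and msg_hdrs come from the text, and simulated CPUs are plain array slots with no locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4 /* hypothetical number of simulated CPUs */

/* Hypothetical per-namespace state: one slot per simulated CPU. */
struct model_ipc_ns {
	int64_t msg_bytes[MODEL_NR_CPUS];
	int64_t msg_hdrs[MODEL_NR_CPUS];
};

/* Hot path: a send touches only the slot of the CPU it runs on. */
void model_msg_send(struct model_ipc_ns *ns, int cpu, size_t len)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	ns->msg_bytes[cpu] += (int64_t)len;
	ns->msg_hdrs[cpu] += 1;
}

/* Hot path: a receive may run on a different CPU than the send did. */
void model_msg_receive(struct model_ipc_ns *ns, int cpu, size_t len)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	ns->msg_bytes[cpu] -= (int64_t)len;
	ns->msg_hdrs[cpu] -= 1;
}

/* Cold path (msgctl-style query): walk every slot and add them up. */
void model_msg_info(const struct model_ipc_ns *ns,
		    int64_t *bytes, int64_t *hdrs)
{
	*bytes = 0;
	*hdrs = 0;
	for (int i = 0; i < MODEL_NR_CPUS; i++) {
		*bytes += ns->msg_bytes[i];
		*hdrs += ns->msg_hdrs[i];
	}
}
```

Because a message can be sent on one CPU and received on another, an individual slot may go negative; only the sum over all slots is meaningful. That is why the query path has to walk every slot, which is acceptable here since it runs rarely.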

Changes in v3:
1. Add a comment and change log for the new function percpu_counter_add_local,
describing who should use it and who shouldn't.

Changes in v2:
1. Separate the original patch into two patches.
2. Add error handling for percpu_counter_init.

The performance gain grows as the number of workload threads increases.
Performance gain: 3.38x

CPU: ICX 8380 x 2 sockets
Core number: 40 x 2 physical cores
Benchmark: pts/stress-ng-1.4.0
-- system v message passing (160 threads)

Regards
Jiebin