linux-kernel - Re: [PATCH] ipc/msg.c: mitigate the lock contention with percpu counter

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a4d10f9e-ab8f-ffad-5eea-48316c436f41@intel.com>
Date:   Wed, 7 Sep 2022 17:39:47 +0800
From:   "Sun, Jiebin" <jiebin.sun@...el.com>
To:     Tim Chen <tim.c.chen@...ux.intel.com>,
        Shakeel Butt <shakeelb@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, vasily.averin@...ux.dev,
        Dennis Zhou <dennis@...nel.org>, Tejun Heo <tj@...nel.org>,
        Christoph Lameter <cl@...ux.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Alexey Gladkov <legion@...nel.org>,
        Manfred Spraul <manfred@...orfullife.com>,
        alexander.mikhalitsyn@...tuozzo.com, Linux MM <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        "Chen, Tim C" <tim.c.chen@...el.com>,
        Feng Tang <feng.tang@...el.com>,
        Huang Ying <ying.huang@...el.com>, tianyou.li@...el.com,
        wangyang.guo@...el.com, jiebin.sun@...el.com
Subject: Re: [PATCH] ipc/msg.c: mitigate the lock contention with percpu
 counter


On 9/7/2022 2:44 AM, Tim Chen wrote:
> On Fri, 2022-09-02 at 09:27 -0700, Shakeel Butt wrote:
>> On Fri, Sep 2, 2022 at 12:04 AM Jiebin Sun <jiebin.sun@...el.com> wrote:
>>> The msg_bytes and msg_hdrs atomic counters are frequently
>>> updated when IPC msg queue is in heavy use, causing heavy
>>> cache bounce and overhead. Change them to percpu_counters
>>> greatly improve the performance. Since there is one unique
>>> ipc namespace, additional memory cost is minimal. Reading
>>> of the count done in msgctl call, which is infrequent. So
>>> the need to sum up the counts in each CPU is infrequent.
>>>
>>> Apply the patch and test the pts/stress-ng-1.4.0
>>> -- system v message passing (160 threads).
>>>
>>> Score gain: 3.38x
>>>
>>> CPU: ICX 8380 x 2 sockets
>>> Core number: 40 x 2 physical cores
>>> Benchmark: pts/stress-ng-1.4.0
>>> -- system v message passing (160 threads)
>>>
>>> Signed-off-by: Jiebin Sun <jiebin.sun@...el.com>
>> [...]
>>> +void percpu_counter_add_local(struct percpu_counter *fbc, s64 amount)
>>> +{
>>> +       this_cpu_add(*fbc->counters, amount);
>>> +}
>>> +EXPORT_SYMBOL(percpu_counter_add_local);
>> Why not percpu_counter_add()? This may drift the fbc->count more than
>> batch*nr_cpus. I am assuming that is not the issue for you as you
>> always do an expensive sum in the slow path. As Andrew asked, this
>> should be a separate patch.
> In the IPC case, the read is always done with the accurate read using
> percpu_counter_sum() gathering all the counts and
> never with percpu_counter_read() that only read global count.
> So Jiebin was not worry about accuracy.
>
> However, the counter is s64 and the local per cpu counter is S32.
> So the counter size has shrunk if we only keep the count in local per
> cpu counter, which can overflow a lot sooner and is not okay.
>
> Jiebin, can you try to use percpu_counter_add_batch, but using a large
> batch size.  That should achieve what you want without needing
> to create a percpu_counter_add_local() function, and also the overflow
> problem.
>
> Tim
>
I have sent out the patch v4 which use percpu_counter_add_batch. If we use
a tuned large batch size (1024), the performance gain is 3.17x (patch v4)
vs 3.38x (patch v3) previously in stress-ng -- message. It still has
significant performance improvement and also good balance between
performance gain and overflow issue.

Jiebin

>