linux-kernel - Re: [PATCH] ipc/msg.c: mitigate the lock contention with percpu counter


Message-ID: <da91f763-b74b-68d9-312b-1bc86179273f@intel.com>
Date:   Mon, 5 Sep 2022 19:54:35 +0800
From:   "Sun, Jiebin" <jiebin.sun@...el.com>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     vasily.averin@...ux.dev, shakeelb@...gle.com, dennis@...nel.org,
        tj@...nel.org, cl@...ux.com, ebiederm@...ssion.com,
        legion@...nel.org, manfred@...orfullife.com,
        alexander.mikhalitsyn@...tuozzo.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, tim.c.chen@...el.com,
        feng.tang@...el.com, ying.huang@...el.com, tianyou.li@...el.com,
        wangyang.guo@...el.com
Subject: Re: [PATCH] ipc/msg.c: mitigate the lock contention with percpu
 counter


On 9/3/2022 12:06 AM, Andrew Morton wrote:
> On Fri,  2 Sep 2022 23:22:43 +0800 Jiebin Sun <jiebin.sun@...el.com> wrote:
>
>> The msg_bytes and msg_hdrs atomic counters are frequently
>> updated when IPC msg queue is in heavy use, causing heavy
>> cache bounce and overhead. Change them to percpu_counters
>> greatly improve the performance. Since there is one unique
>> ipc namespace, additional memory cost is minimal. Reading
>> of the count done in msgctl call, which is infrequent. So
>> the need to sum up the counts in each CPU is infrequent.
>>
>> Apply the patch and test the pts/stress-ng-1.4.0
>> -- system v message passing (160 threads).
>>
>> Score gain: 3.38x
> So this test became 3x faster?

Yes. The result is from the Phoronix Test Suite, pts/stress-ng-1.4.0
(System V message passing), run on a dual-socket ICX server. The benchmark
runs 160 pairs of threads that do msgsnd and msgrcv. The patch helps more as
the number of workload threads increases.

>
>> CPU: ICX 8380 x 2 sockets
>> Core number: 40 x 2 physical cores
>> Benchmark: pts/stress-ng-1.4.0
>> -- system v message passing (160 threads)
>>
>> ...
>>
>> @@ -138,6 +139,14 @@ percpu_counter_add(struct percpu_counter *fbc, s64 amount)
>>   	preempt_enable();
>>   }
>>   
>> +static inline void
>> +percpu_counter_add_local(struct percpu_counter *fbc, s64 amount)
>> +{
>> +	preempt_disable();
>> +	fbc->count += amount;
>> +	preempt_enable();
>> +}
> What's this and why is it added?
>
> It would be best to propose this as a separate preparatory patch.
> Fully changelogged and perhaps even with a code comment explaining why
> and when it should be used.
>
> Thanks.

Since msgctl_info always sums the counter, there is no need to use
percpu_counter_add_batch, which updates the global count whenever the
per-CPU count reaches the batch size. So we add percpu_counter_add_local for
both SMP and non-SMP, which only adds to the local per-CPU counter.
I have split the original patch into two patches.
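
To illustrate the difference, here is a minimal userspace sketch. It is a
simplified model, not the kernel implementation: the struct, the model_*
function names, MODEL_NR_CPUS and MODEL_BATCH are all hypothetical, and
locking and preemption handling are left out. It only shows that batched
adds touch the shared count periodically, while local-only adds never do,
and that an exact sum gives the same total either way.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4   /* hypothetical CPU count for this model */
#define MODEL_BATCH   8   /* hypothetical batch size for this model */

/* Simplified stand-in for a per-CPU counter (not struct percpu_counter). */
struct pcpu_model {
	int64_t count;                 /* shared part, contended in real life */
	int64_t local[MODEL_NR_CPUS];  /* per-CPU deltas, cheap to update */
	unsigned long global_updates;  /* how often the shared part was written */
};

/* Batched add: fold the per-CPU delta into the shared count once its
 * magnitude reaches the batch size. */
static void model_add_batch(struct pcpu_model *c, int cpu,
			    int64_t amount, int64_t batch)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	int64_t v = c->local[cpu] + amount;

	if (v >= batch || v <= -batch) {
		c->count += v;          /* the write that bounces cache lines */
		c->local[cpu] = 0;
		c->global_updates++;
	} else {
		c->local[cpu] = v;
	}
}

/* Local-only add: never writes the shared count. */
static void model_add_local(struct pcpu_model *c, int cpu, int64_t amount)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	c->local[cpu] += amount;
}

/* Exact read: shared count plus every per-CPU delta. */
static int64_t model_sum(const struct pcpu_model *c)
{
	int64_t sum = c->count;

	for (int i = 0; i < MODEL_NR_CPUS; i++)
		sum += c->local[i];
	return sum;
}
```

In this model the reader pays for walking all per-CPU slots, which is
acceptable here because the read happens only in the infrequent msgctl path.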

Thanks.