Date:   Fri, 2 Jul 2021 16:40:47 +0800
From:   Yunsheng Lin <linyunsheng@...wei.com>
To:     Jason Wang <jasowang@...hat.com>, <davem@...emloft.net>,
        <kuba@...nel.org>, <mst@...hat.com>
CC:     <brouer@...hat.com>, <paulmck@...nel.org>, <peterz@...radead.org>,
        <will@...nel.org>, <shuah@...nel.org>,
        <linux-kernel@...r.kernel.org>, <netdev@...r.kernel.org>,
        <linux-kselftest@...r.kernel.org>, <linuxarm@...neuler.org>
Subject: Re: [Linuxarm] Re: [PATCH net-next v3 2/3] ptr_ring: move r->queue[]
 clearing after r->consumer_head updating

On 2021/7/2 14:45, Jason Wang wrote:
> 
> On 2021/7/1 8:26 PM, Yunsheng Lin wrote:
>> Currently r->queue[] is cleared before r->consumer_head is
>> updated, which makes __ptr_ring_empty() return a false
>> positive (the ring is non-empty, but __ptr_ring_empty()
>> suggests that it is empty) if the check is done after
>> r->queue[] is cleared and before consumer_head moves forward.
>>
>> Move the r->queue[] clearing to after consumer_head moves
>> forward to avoid the above case.
>>
>> As a side effect of the above change, a consumer_head check is
>> avoided in the likely case, which gives a noticeable performance
>> improvement when tested with the ptr_ring_test selftest
>> added in the previous patch.
>>
>> Tested using the "perf stat -r 1000 ./ptr_ring_test -s 1000 -m 1
>> -N 100000000", comparing the elapsed time:
>>
>>   arch     unpatched        patched        improvement
>>   arm64    2.087205 sec     1.888224 sec   +9.5%
>>   X86      2.6538 sec       2.5422 sec     +4.2%
> 
> 
> I think we need numbers from real workloads here.

As this is a low-level optimization, and the overhead of
enqueuing and dequeuing is small for any real workload, the
performance improvement could be buried in the noise.
That is why ptr_ring_test was added. The roughly 10%
improvement for arm64 seems big, but note that it was
tested using taskset to avoid NUMA effects on arm64.

Anyway, here is the performance data for pktgen in
queue_xmit mode + dummy netdev with pfifo_fast (which
uses ptr_ring too); the difference is less obvious than
in the data above:

 threads    unpatched        patched          delta
    1       3.21Mpps         3.23Mpps         +0.6%
    2       5.56Mpps         5.59Mpps         +0.5%
    4       5.58Mpps         5.61Mpps         +0.5%
    8       2.76Mpps         2.75Mpps         -0.3%
   16       2.23Mpps         2.22Mpps         -0.4%
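
For readers following the thread, the window described in the quoted
commit message can be sketched in plain userspace C. This is a
simplified, single-threaded model under stated assumptions, not the
kernel's ptr_ring code: struct toy_ring, toy_ring_empty() and
toy_consume_observe() are hypothetical names, batching and memory
barriers are left out, and a concurrent observer is stood in for by
sampling the emptiness check between the two steps of a consume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RING_SIZE 4

/* Hypothetical model of a pointer ring; not the kernel's struct ptr_ring. */
struct toy_ring {
	void *queue[TOY_RING_SIZE];
	int consumer_head;
};

/* Models a lockless emptiness check: a NULL slot at consumer_head means empty. */
static bool toy_ring_empty(const struct toy_ring *r)
{
	return r->queue[r->consumer_head] == NULL;
}

enum toy_order {
	TOY_CLEAR_THEN_ADVANCE,	/* ordering described as the current behaviour */
	TOY_ADVANCE_THEN_CLEAR,	/* ordering proposed by the patch */
};

/*
 * Consume one entry in two steps and return what toy_ring_empty()
 * reports in between, i.e. what an observer running at that moment
 * would see.
 */
static bool toy_consume_observe(struct toy_ring *r, enum toy_order order)
{
	int head = r->consumer_head;
	int next = (head + 1) % TOY_RING_SIZE;
	bool seen_empty;

	assert(r->queue[head] != NULL);	/* caller must not consume from an empty ring */

	if (order == TOY_CLEAR_THEN_ADVANCE) {
		r->queue[head] = NULL;
		seen_empty = toy_ring_empty(r);	/* looks at the slot just cleared */
		r->consumer_head = next;
	} else {
		r->consumer_head = next;
		seen_empty = toy_ring_empty(r);	/* looks at the next slot */
		r->queue[head] = NULL;
	}
	return seen_empty;
}
```

With two entries queued, consuming one with TOY_CLEAR_THEN_ADVANCE
makes the intermediate check report "empty" even though one entry is
still pending, while TOY_ADVANCE_THEN_CLEAR reports "non-empty" at the
same point.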

> 
> Thanks
> 
> 
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@...wei.com>
