netdev - Re: [RFC v2] net/core: add optional threading for rps backlog processing


Message-ID: <e317d5bc-cc26-8b1b-ca4b-66b5328683c4@nbd.name>
Date:   Fri, 17 Feb 2023 14:40:38 +0100
From:   Felix Fietkau <nbd@....name>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [RFC v2] net/core: add optional threading for rps backlog
 processing

On 17.02.23 13:57, Eric Dumazet wrote:
> On Fri, Feb 17, 2023 at 1:35 PM Felix Fietkau <nbd@....name> wrote:
>>
>> On 17.02.23 13:23, Eric Dumazet wrote:
>> > On Fri, Feb 17, 2023 at 11:06 AM Felix Fietkau <nbd@....name> wrote:
>> >>
>> >> When dealing with few flows or an imbalance on CPU utilization, static RPS
>> >> CPU assignment can be too inflexible. Add support for enabling threaded NAPI
>> >> for RPS backlog processing in order to allow the scheduler to better balance
>> >> processing. This helps better spread the load across idle CPUs.
>> >>
>> >> Signed-off-by: Felix Fietkau <nbd@....name>
>> >> ---
>> >>
>> >> RFC v2:
>> >>  - fix rebase error in rps locking
>> >
>> > Why only deal with RPS ?
>> >
>> > It seems you propose the softnet_data backlog be processed by a thread,
>> > rather than from softirq ?
>> Right. I originally wanted to mainly improve RPS, but my patch does
>> cover backlog in general. I will update the description in the next
>> version. Does the approach in general make sense to you?
>>
> 
> I do not know, this seems to lack some (perf) numbers, and
> descriptions of added max latencies and stuff like that :)
I just ran some tests using an MT7621 device (dual-core 800 MHz
MIPS, 4 threads) as a router doing NAT without flow offloading.

Using the flent RRUL test between 2 PCs connected through the router,
I get these results:

rps_threaded=0: (combined CPU idle time around 27%)
                              avg       median       99th %          # data pts
  Ping (ms) ICMP   :        26.08        28.70        54.74 ms              199
  Ping (ms) UDP BE :         1.96        24.12        37.28 ms              200
  Ping (ms) UDP BK :         1.88        15.86        27.30 ms              200
  Ping (ms) UDP EF :         1.98        31.77        54.10 ms              200
  Ping (ms) avg    :         1.94          N/A          N/A ms              200
  TCP download BE  :        69.25        70.20       139.55 Mbits/s         200
  TCP download BK  :        95.15        92.51       163.93 Mbits/s         200
  TCP download CS5 :       133.64       129.10       292.46 Mbits/s         200
  TCP download EF  :       129.86       127.70       254.47 Mbits/s         200
  TCP download avg :       106.97          N/A          N/A Mbits/s         200
  TCP download sum :       427.90          N/A          N/A Mbits/s         200
  TCP totals       :       864.43          N/A          N/A Mbits/s         200
  TCP upload BE    :        97.54        96.67       163.99 Mbits/s         200
  TCP upload BK    :       139.76       143.88       190.37 Mbits/s         200
  TCP upload CS5   :        97.52        94.70       206.60 Mbits/s         200
  TCP upload EF    :       101.71       106.72       147.88 Mbits/s         200
  TCP upload avg   :       109.13          N/A          N/A Mbits/s         200
  TCP upload sum   :       436.53          N/A          N/A Mbits/s         200

rps_threaded=1: (combined CPU idle time around 16%)
                              avg       median       99th %          # data pts
  Ping (ms) ICMP   :        13.70        16.10        27.60 ms              199
  Ping (ms) UDP BE :         2.03        18.35        24.16 ms              200
  Ping (ms) UDP BK :         2.03        18.36        29.13 ms              200
  Ping (ms) UDP EF :         2.36        25.20        41.50 ms              200
  Ping (ms) avg    :         2.14          N/A          N/A ms              200
  TCP download BE  :       118.69       120.94       160.12 Mbits/s         200
  TCP download BK  :       134.67       137.81       177.14 Mbits/s         200
  TCP download CS5 :       126.15       127.81       174.84 Mbits/s         200
  TCP download EF  :        78.36        79.41       143.31 Mbits/s         200
  TCP download avg :       114.47          N/A          N/A Mbits/s         200
  TCP download sum :       457.87          N/A          N/A Mbits/s         200
  TCP totals       :       918.19          N/A          N/A Mbits/s         200
  TCP upload BE    :       112.20       111.55       164.38 Mbits/s         200
  TCP upload BK    :       144.99       139.24       205.12 Mbits/s         200
  TCP upload CS5   :        93.09        95.50       132.39 Mbits/s         200
  TCP upload EF    :       110.04       108.21       207.00 Mbits/s         200
  TCP upload avg   :       115.08          N/A          N/A Mbits/s         200
  TCP upload sum   :       460.32          N/A          N/A Mbits/s         200

As you can see, both throughput and latency improve because load can be
better distributed across CPU cores.
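
For a quick sense of scale, the relative change between the two runs can
be computed from the averages in the tables above. This is only a minimal
sketch: `pct_change` and the `results` dict are hypothetical helper names
for illustration, not part of the patch or of flent, and the numbers are
copied by hand from the "avg" column.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (rps_threaded=0, rps_threaded=1) averages copied from the tables above.
results = {
    "TCP totals (Mbits/s)": (864.43, 918.19),
    "TCP download sum (Mbits/s)": (427.90, 457.87),
    "TCP upload sum (Mbits/s)": (436.53, 460.32),
    "Ping ICMP avg (ms)": (26.08, 13.70),
}

for name, (off, on) in results.items():
    print(f"{name:28s} {off:8.2f} -> {on:8.2f}  ({pct_change(off, on):+.1f}%)")
```

With these inputs, TCP totals come out roughly 6.2% higher and the ICMP
ping average roughly 47.5% lower with rps_threaded=1, while combined CPU
idle time in the same runs drops from around 27% to around 16%.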

- Felix