Date:   Sun, 26 Jul 2020 20:24:00 +0200
From:   Felix Fietkau <nbd@....name>
To:     Eric Dumazet <erdnetdev@...il.com>,
        Eric Dumazet <eric.dumazet@...il.com>, netdev@...r.kernel.org
Cc:     Hillf Danton <hdanton@...a.com>
Subject: Re: [RFC] net: add support for threaded NAPI polling

On 2020-07-26 19:58, Eric Dumazet wrote:
> 
> 
> On 7/26/20 10:19 AM, Felix Fietkau wrote:
>> On 2020-07-26 18:49, Eric Dumazet wrote:
>>> On 7/26/20 9:31 AM, Felix Fietkau wrote:
>>>> For some drivers (especially 802.11 drivers), doing a lot of work in the NAPI
>>>> poll function does not perform well. Since NAPI poll is bound to the CPU it
>>>> was scheduled from, we can easily end up with a few very busy CPUs spending
>>>> most of their time in softirq/ksoftirqd and some idle ones.
>>>>
>>>> Introduce threaded NAPI for such drivers based on a workqueue. The API is the
>>>> same except for using netif_threaded_napi_add instead of netif_napi_add.
>>>>
>>>> In my tests with mt76 on MT7621 using threaded NAPI + a thread for tx scheduling
>>>> improves LAN->WLAN bridging throughput by 10-50%. Throughput without threaded
>>>> NAPI is wildly inconsistent, depending on the CPU that runs the tx scheduling
>>>> thread.
>>>>
>>>> With threaded NAPI, throughput seems stable and consistent (and higher than
>>>> the best results I got without it).
>>>
>>> Note that even with a threaded NAPI, you will not be able to use more than one cpu
>>> to process the traffic.
>> For a single threaded NAPI user that's correct. The main difference here
>> is that the CPU running the poll function does not have to be the same
>> as the CPU that scheduled it, and it can change based on the load.
>> That makes a huge difference in my tests.
> 
> This really looks like there is a problem in the driver itself.
> 
> Have you first checked that this patch was not hurting your use case ?
> 
> commit 4cd13c21b207e80ddb1144c576500098f2d5f882    softirq: Let ksoftirqd do its job
> 
> If yes, your proposal would again possibly hurt user space threads competing
> with a high priority workqueue, and packets would not be consumed by user applications.
> Having cpus burning 100% of cycles in kernel space is useless.
I already checked that a while back, and that commit is not hurting my
use case. I think it is actually helping, since I'm generating enough
load to keep most softirq activity in ksoftirqd.

One thing to consider about my use case is that I'm bridging traffic
between an Ethernet interface and an 802.11 interface. Those packets do
not go through user space. If I push enough traffic, the ksoftirqd
instance running the 802.11 device's NAPI poll keeps its CPU 100% busy.
I do not have any significant user space activity during my tests.

Since tx and rx NAPI are scheduled from the same IRQ, which only fires on
one CPU, they end up in the same ksoftirqd instance.

Considering that one CPU is not enough to handle the entire NAPI workload
for the device, threaded NAPI helps by letting other (otherwise mostly
idle) CPUs pick up some of the workload.
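
To illustrate the driver-side change: only the registration call differs,
the poll callback and weight stay the same. A rough sketch (mydrv_* names
are placeholders, not real driver code):

        #include <linux/netdevice.h>

        /* Sketch only: placeholder poll function, processes up to 'budget'
         * completions and re-arms when done. */
        static int mydrv_poll(struct napi_struct *napi, int budget)
        {
                int done = 0;

                /* process up to 'budget' rx/tx completions here */
                if (done < budget)
                        napi_complete_done(napi, done);
                return done;
        }

        static void mydrv_setup_napi(struct net_device *dev,
                                     struct napi_struct *napi)
        {
                /* conventional softirq-based NAPI:
                 *   netif_napi_add(dev, napi, mydrv_poll, NAPI_POLL_WEIGHT);
                 * threaded NAPI as proposed in this RFC: */
                netif_threaded_napi_add(dev, napi, mydrv_poll, NAPI_POLL_WEIGHT);
                napi_enable(napi);
        }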

> It seems you need two cpus per queue, I guess this might be because
> you use a single NAPI for both tx and rx ?
> 
> Have you simply tried to use two NAPI, as some Ethernet drivers do ?
I'm already doing that.
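
Roughly like this (simplified, not actual mt76 code, mydev_* names are
placeholders). The key point is that both napi_schedule() calls run on
whichever CPU took the interrupt, so with conventional NAPI both poll
loops end up in that CPU's ksoftirqd:

        #include <linux/netdevice.h>
        #include <linux/interrupt.h>

        struct mydev {
                struct net_device *ndev;
                struct napi_struct rx_napi;
                struct napi_struct tx_napi;
        };

        static int mydev_rx_poll(struct napi_struct *napi, int budget);
        static int mydev_tx_poll(struct napi_struct *napi, int budget);

        static irqreturn_t mydev_irq(int irq, void *data)
        {
                struct mydev *md = data;

                /* both schedules happen on the CPU that took the IRQ */
                napi_schedule(&md->rx_napi);
                napi_schedule(&md->tx_napi);
                return IRQ_HANDLED;
        }

        static void mydev_init_napi(struct mydev *md)
        {
                netif_napi_add(md->ndev, &md->rx_napi, mydev_rx_poll,
                               NAPI_POLL_WEIGHT);
                netif_napi_add(md->ndev, &md->tx_napi, mydev_tx_poll,
                               NAPI_POLL_WEIGHT);
        }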

> Do not get me wrong, but scheduling a thread only to process one packet at a time
> will hurt common cases.
> 
> Really I do not mind if you add a threaded NAPI, but it seems you missed
> a lot of NAPI requirements in the proposed patch.
> 
> For instance, many ->poll() handlers assume BH are disabled.
> 
> Also part of RPS logic depends on net_rx_action() calling net_rps_action_and_irq_enable()
I will look into that, thanks.
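
For the BH assumption, I guess the worker would need to wrap the ->poll()
call roughly like this (sketch only, assuming the RFC's work_struct lives
in struct napi_struct and is called 'work'; it does not yet address the
RPS point):

        #include <linux/kernel.h>
        #include <linux/netdevice.h>
        #include <linux/workqueue.h>
        #include <linux/bottom_half.h>

        /* Sketch, not the actual patch. */
        static void napi_workfn(struct work_struct *work)
        {
                struct napi_struct *napi =
                        container_of(work, struct napi_struct, work);
                int done;

                local_bh_disable();     /* ->poll() handlers assume BH are off */
                done = napi->poll(napi, NAPI_POLL_WEIGHT);
                local_bh_enable();      /* runs softirqs raised by the poll */

                /* Budget exhausted: more work pending, reschedule the worker.
                 * Note this does not replicate the RPS flush that
                 * net_rx_action() performs via net_rps_action_and_irq_enable(). */
                if (done >= NAPI_POLL_WEIGHT)
                        schedule_work(work);
        }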

- Felix
