Message-ID: <5cbf7d05-8427-9f92-7b79-e3ddcf420b18@nbd.name>
Date: Sun, 26 Jul 2020 20:24:00 +0200
From: Felix Fietkau <nbd@....name>
To: Eric Dumazet <erdnetdev@...il.com>,
Eric Dumazet <eric.dumazet@...il.com>, netdev@...r.kernel.org
Cc: Hillf Danton <hdanton@...a.com>
Subject: Re: [RFC] net: add support for threaded NAPI polling
On 2020-07-26 19:58, Eric Dumazet wrote:
>
>
> On 7/26/20 10:19 AM, Felix Fietkau wrote:
>> On 2020-07-26 18:49, Eric Dumazet wrote:
>>> On 7/26/20 9:31 AM, Felix Fietkau wrote:
>>>> For some drivers (especially 802.11 drivers), doing a lot of work in the NAPI
>>>> poll function does not perform well. Since NAPI poll is bound to the CPU it
>>>> was scheduled from, we can easily end up with a few very busy CPUs spending
>>>> most of their time in softirq/ksoftirqd and some idle ones.
>>>>
>>>> Introduce threaded NAPI for such drivers based on a workqueue. The API is the
>>>> same except for using netif_threaded_napi_add instead of netif_napi_add.
>>>>
>>>> In my tests with mt76 on MT7621, using threaded NAPI + a thread for tx scheduling
>>>> improves LAN->WLAN bridging throughput by 10-50%. Throughput without threaded
>>>> NAPI is wildly inconsistent, depending on the CPU that runs the tx scheduling
>>>> thread.
>>>>
>>>> With threaded NAPI, throughput seems stable and consistent (and higher than
>>>> the best results I got without it).
>>>
>>> Note that even with a threaded NAPI, you will not be able to use more than one cpu
>>> to process the traffic.
>> For a single threaded NAPI user that's correct. The main difference here
>> is that the CPU running the poll function does not have to be the same
>> as the CPU that scheduled it, and it can change based on the load.
>> That makes a huge difference in my tests.
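
The likely reason the poll CPU can float is that the NAPI work is queued on
an unbound workqueue, leaving CPU placement to the scheduler. A sketch of
that idea; napi_workq is my guess at the patch internals, not confirmed:

#include <linux/workqueue.h>

static struct workqueue_struct *napi_workq;

static int __init napi_threaded_init(void)
{
	/* WQ_UNBOUND detaches the worker from the submitting CPU, so
	 * the scheduler can migrate the poll work to a less loaded CPU.
	 */
	napi_workq = alloc_workqueue("napi_workq",
				     WQ_UNBOUND | WQ_HIGHPRI, 0);
	return napi_workq ? 0 : -ENOMEM;
}
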
>
> This really looks like there is a problem in the driver itself.
>
> Have you first checked that this patch was not hurting your use case ?
>
> commit 4cd13c21b207e80ddb1144c576500098f2d5f882 ("softirq: Let ksoftirqd do its job")
>
> If yes, your proposal would again possibly hurt user space threads competing
> with a high priority workqueue, and packets would not be consumed by user applications.
> Having CPUs burning 100% of cycles in kernel space is useless.
I already checked that a while back, and this patch is not hurting my
use case. I think it is actually helping, since I'm putting on enough
load to keep most softirq activity in ksoftirqd.

One thing to consider about my use case is that I'm bridging traffic
between an Ethernet interface and an 802.11 interface. Those packets do
not go through user space. If I push enough traffic, the ksoftirqd
instance running the 802.11 device's NAPI keeps its CPU 100% busy. I do
not have any significant user space activity during my tests.

Since tx and rx NAPI are scheduled from the same IRQ, which only fires on
one CPU, they end up in the same ksoftirqd instance.
Considering that one CPU is not enough to handle the entire NAPI workload
for the device, threaded NAPI helps by letting other (otherwise mostly
idle) CPUs pick up some of the workload.
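
For illustration, the interrupt path described above looks roughly like
this (hypothetical mydev_* names, reusing the two-NAPI private struct
sketched earlier):

#include <linux/interrupt.h>

static irqreturn_t mydev_irq(int irq, void *dev_id)
{
	struct mydev_priv *priv = dev_id;

	/* ... ack and mask device interrupts here ... */

	/* Both NAPIs are raised from the same CPU. With softirq NAPI,
	 * both poll functions then run on this CPU (or its ksoftirqd);
	 * with threaded NAPI they can migrate to idle CPUs instead.
	 */
	napi_schedule(&priv->rx_napi);
	napi_schedule(&priv->tx_napi);

	return IRQ_HANDLED;
}
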
> It seems you need two cpus per queue, I guess this might be because
> you use a single NAPI for both tx and rx ?
>
> Have you simply tried to use two NAPI, as some Ethernet drivers do ?
I'm already doing that.
> Do not get me wrong, but scheduling a thread only to process one packet at a time
> will hurt common cases.
>
> Really I do not mind if you add a threaded NAPI, but it seems you missed
> a lot of NAPI requirements in the proposed patch.
>
> For instance, many ->poll() handlers assume BH are disabled.
>
> Also part of RPS logic depends on net_rx_action() calling net_rps_action_and_irq_enable().
I will look into that, thanks.
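A sketch of how the worker might address the BH expectation, assuming the
patch embeds a work member in napi_struct; napi_workfn is a hypothetical
name, not necessarily what the final version will do:

#include <linux/netdevice.h>
#include <linux/workqueue.h>

static void napi_workfn(struct work_struct *work)
{
	/* Assumes a struct work_struct member named "work" in napi_struct. */
	struct napi_struct *napi = container_of(work, struct napi_struct, work);

	for (;;) {
		if (!test_bit(NAPI_STATE_SCHED, &napi->state))
			return;

		/* Run ->poll() with BH disabled, matching what the
		 * handlers expect from softirq context.
		 */
		local_bh_disable();
		if (napi->poll(napi, napi->weight) < napi->weight) {
			local_bh_enable();
			return;
		}
		local_bh_enable();

		/* Budget exhausted: yield before polling again. */
		cond_resched();
	}
}
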
- Felix