Date:   Sun, 30 Aug 2020 10:46:09 +0200
From:   Sebastian Gottschall <>
To:     Jakub Kicinski <>, Felix Fietkau <>
Cc:     Eric Dumazet <>,
        Hillf Danton <>
Subject: Re: [PATCH v3 1/2] net: add support for threaded NAPI polling

Am 22.08.2020 um 03:49 schrieb Jakub Kicinski:
> On Fri, 21 Aug 2020 21:01:50 +0200 Felix Fietkau wrote:
>> For some drivers (especially 802.11 drivers), doing a lot of work in the NAPI
>> poll function does not perform well. Since NAPI poll is bound to the CPU it
>> was scheduled from, we can easily end up with a few very busy CPUs spending
>> most of their time in softirq/ksoftirqd and some idle ones.
>> Introduce threaded NAPI for such drivers based on a workqueue. The API is the
>> same except for using netif_threaded_napi_add instead of netif_napi_add.
>> In my tests with mt76 on MT7621, using threaded NAPI + a thread for tx scheduling
>> improves LAN->WLAN bridging throughput by 10-50%. Throughput without threaded
>> NAPI is wildly inconsistent, depending on the CPU that runs the tx scheduling
>> thread.
>> With threaded NAPI, throughput seems stable and consistent (and higher than
>> the best results I got without it).
>> Based on a patch by Hillf Danton
> I've tested this patch on a non-NUMA system with a moderately
> high networking workload (roughly 1:6 network-to-compute cycles),
> and it provides a ~2.5% speedup in RPS but 1%/6%/10% worse
> P50/P99/P999 latency.
> I started working on a counter-proposal which uses a pool of threads
> dedicated to NAPI polling. It's not unlike the workqueue code, but it
> tries to be a little more clever. It gives me ~6.5% more RPS and at
> the same time lowers the P99 latency by 35% without impacting other
> percentiles. (I only started testing this afternoon, so hopefully the
> numbers will improve further.)
> I'm happy for this patch to be merged; it's quite nice, but I wanted
> to give a heads-up that I may have something that would replace it...
> The extremely rough PoC is less than half-implemented code which is really
> too broken to share.

Looks interesting. Keep going.
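
For anyone wanting to try this in a driver, here is a minimal sketch of what
the opt-in would look like, assuming netif_threaded_napi_add() takes the same
arguments as netif_napi_add() as the patch description states; the poll
callback and helper names are only illustrative:

#include <linux/netdevice.h>

/* Illustrative poll callback: processes up to 'budget' packets and
 * completes NAPI when the queue is drained, exactly as with softirq NAPI. */
static int my_napi_poll(struct napi_struct *napi, int budget)
{
	int done = 0;

	/* ... receive and process up to 'budget' frames, counting them in 'done' ... */

	if (done < budget)
		napi_complete_done(napi, done);
	return done;
}

static void my_driver_init_napi(struct net_device *dev, struct napi_struct *napi)
{
	/* Before: poll runs in softirq context on whichever CPU scheduled it */
	/* netif_napi_add(dev, napi, my_napi_poll, NAPI_POLL_WEIGHT); */

	/* After: poll runs in a workqueue thread that the scheduler can
	 * migrate to an idle CPU, which is what helps the 802.11 case above */
	netif_threaded_napi_add(dev, napi, my_napi_poll, NAPI_POLL_WEIGHT);
}

Everything else (napi_schedule(), napi_complete_done(), the budget handling)
stays the same, which is what makes the conversion cheap to test.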


