Date:   Wed, 22 Jul 2020 14:27:42 +0200
From:   Felix Fietkau <nbd@....name>
To:     Rajkumar Manoharan <rmanohar@...eaurora.org>,
        Rakesh Pillai <pillair@...eaurora.org>
Cc:     ath10k@...ts.infradead.org, linux-wireless@...r.kernel.org,
        linux-kernel@...r.kernel.org, kvalo@...eaurora.org,
        johannes@...solutions.net, davem@...emloft.net, kuba@...nel.org,
        netdev@...r.kernel.org, dianders@...omium.org, evgreen@...omium.org
Subject: Re: [RFC 2/7] ath10k: Add support to process rx packet in thread

On 2020-07-21 23:53, Rajkumar Manoharan wrote:
> On 2020-07-21 10:14, Rakesh Pillai wrote:
>> The NAPI instance gets scheduled on the CPU core on which
>> the IRQ was triggered. The processing of rx packets
>> can be CPU intensive, and since NAPI cannot be moved
>> to a different CPU core, it is better for performance
>> to move the bulk of rx packet processing
>> to a high priority thread.
>> 
>> Add the init/deinit part for a thread to process the
>> receive packets.
>> 
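(To make the discussion concrete: the init/deinit part presumably
amounts to something like the sketch below. This is untested, every
name in it is invented, and I'm assuming a kthread raised to
SCHED_FIFO -- here via sched_set_fifo() -- is what "high priority
thread" means. The IRQ handler would do skb_queue_tail() plus
wake_up() on the wait queue.)

#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/skbuff.h>
#include <linux/wait.h>

struct ar_rx_thread {
	struct task_struct *task;
	struct sk_buff_head queue;	/* raw rx skbs from the IRQ handler */
	wait_queue_head_t wq;
};

static int ar_rx_thread_fn(void *data)
{
	struct ar_rx_thread *t = data;
	struct sk_buff *skb;

	while (!kthread_should_stop()) {
		wait_event_interruptible(t->wq,
					 !skb_queue_empty(&t->queue) ||
					 kthread_should_stop());
		/* Drain the queue; this is where the CPU-heavy rx
		 * processing would live. */
		while ((skb = skb_dequeue(&t->queue)))
			dev_kfree_skb(skb);	/* placeholder for real rx work */
	}
	return 0;
}

static int ar_rx_thread_init(struct ar_rx_thread *t)
{
	skb_queue_head_init(&t->queue);
	init_waitqueue_head(&t->wq);

	t->task = kthread_run(ar_rx_thread_fn, t, "ath10k_rx");
	if (IS_ERR(t->task))
		return PTR_ERR(t->task);

	sched_set_fifo(t->task);	/* the "high priority" part */
	return 0;
}

static void ar_rx_thread_deinit(struct ar_rx_thread *t)
{
	kthread_stop(t->task);
	skb_queue_purge(&t->queue);
}
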
> IMHO this defeats the whole purpose of NAPI. Originally in ath10k,
> IRQ processing happened in tasklet (high priority) context, which in
> turn pushed more data into the net core even when the network stack
> was unable to process the driver's data, since the two run in
> different contexts (a fast producer / slow consumer issue).
> Why can't the CPU governor schedule the interrupts on a less loaded
> CPU core? Otherwise you can play with different RPS and affinity
> settings to meet your requirement.
> 
> IMO introducing high priority tasklets/threads is not a viable solution.
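(For reference, "play with RPS and affinity settings" boils down to
writing CPU masks into sysfs/procfs. Normally you'd just echo the
masks from a shell; the tiny userspace snippet below does the same
thing. The device name wlan0 and IRQ number 63 are placeholders.)

#include <stdio.h>

static int write_mask(const char *path, const char *mask)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		return -1;
	}
	fprintf(f, "%s\n", mask);
	return fclose(f);
}

int main(void)
{
	/* Steer rx protocol processing (RPS) to CPUs 1-3 (mask 0xe)... */
	write_mask("/sys/class/net/wlan0/queues/rx-0/rps_cpus", "e");
	/* ...while the device IRQ itself stays pinned to CPU 0 (mask 0x1). */
	write_mask("/proc/irq/63/smp_affinity", "1");
	return 0;
}
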
I'm beginning to think that the main problem with NAPI here is that the
work done by the poll functions of 802.11 drivers is significantly more
CPU intensive than that of ethernet drivers, possibly more than what
NAPI was designed for.

I'm considering testing a different approach (with mt76 initially):
- Add a mac80211 rx function that puts processed skbs into a list
instead of handing them to the network stack directly.
- Move all rx processing to a high priority thread, keeping a
driver-internal queue for fully processed packets.
- Schedule NAPI poll on completion.
- The NAPI poll function pulls from the internal queue and passes the
packets to the network stack (see the sketch after this list).
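
Roughly, the driver side of steps 2-4 could look like the sketch below
(untested, and all identifiers are made up rather than actual mt76
code). The point is that the poll function no longer does any heavy
work, but the NAPI budget still paces how fast packets enter the stack:

#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

struct dev_rx {
	struct napi_struct napi;
	struct sk_buff_head rx_done;	/* fully processed packets */
};

/* Called from the high priority rx thread once a batch of skbs has
 * been fully processed. */
static void dev_rx_complete(struct dev_rx *dr, struct sk_buff_head *batch)
{
	struct sk_buff *skb;

	while ((skb = __skb_dequeue(batch)))	/* batch is thread-local */
		skb_queue_tail(&dr->rx_done, skb);

	local_bh_disable();
	napi_schedule(&dr->napi);
	local_bh_enable();	/* run the pending NET_RX softirq promptly */
}

/* NAPI poll: just hand over up to budget packets; no heavy work left. */
static int dev_rx_poll(struct napi_struct *napi, int budget)
{
	struct dev_rx *dr = container_of(napi, struct dev_rx, napi);
	struct sk_buff *skb;
	int done = 0;

	while (done < budget && (skb = skb_dequeue(&dr->rx_done))) {
		napi_gro_receive(napi, skb);
		done++;
	}

	if (done < budget)
		napi_complete_done(napi, done);

	return done;
}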

With this approach, the network stack retains some control over the
processing rate of rx packets, while the scheduler can move the CPU
intensive processing around to where it fits best.

What do you think?

- Felix
