Date:   Thu, 25 Mar 2021 11:33:53 +0100
From:   Felix Fietkau <nbd@....name>
To:     Rakesh Pillai <pillair@...eaurora.org>,
        'Ben Greear' <greearb@...delatech.com>,
        'Brian Norris' <briannorris@...omium.org>
Cc:     'Johannes Berg' <johannes@...solutions.net>,
        'Rajkumar Manoharan' <rmanohar@...eaurora.org>,
        'ath10k' <ath10k@...ts.infradead.org>,
        'linux-wireless' <linux-wireless@...r.kernel.org>,
        'Linux Kernel' <linux-kernel@...r.kernel.org>,
        'Kalle Valo' <kvalo@...eaurora.org>,
        "'David S. Miller'" <davem@...emloft.net>,
        'Jakub Kicinski' <kuba@...nel.org>, netdev@...r.kernel.org,
        'Doug Anderson' <dianders@...omium.org>,
        'Evan Green' <evgreen@...omium.org>
Subject: Re: [RFC 2/7] ath10k: Add support to process rx packet in thread


On 2021-03-25 10:45, Rakesh Pillai wrote:
> Hi Felix / Ben,
> 
> In case of ath10k (snoc based targets), we have a lot of processing in the NAPI context.
> Even moving this to threaded NAPI is not helping much due to the load.
> 
> Breaking the tasks into multiple contexts (with the patch series I posted) is helping to improve the throughput.
> With the current rx_thread based approach, the rx processing is broken into two parallel contexts:
> 1) reaping the packets from the HW
> 2) processing this list of packets and handing it over to mac80211 (and later to the network stack)
> 
> This is the primary reason for choosing the rx thread approach.
Have you considered the possibility that the problem may be that the
driver is doing too much work?
One example is that you could take advantage of the new 802.3 decap
offload to simplify rx processing. That worked for me on mt76, where a
dual-core 1.3 GHz A64 can easily handle >1.8 Gbps of local TCP rx on a
single card without the rx NAPI thread being the biggest consumer of
CPU cycles.

And if you can't do that and still consider all of the metric tons of
processing work necessary, you could still do this:
On interrupts, spawn a processing thread that traverses the ring and
does the preparation work (instead of NAPI).
From that thread you schedule the threaded NAPI handler that processes
these packets further and hands them to mac80211.
To keep the load somewhat balanced, you can limit the number of
pre-processed packets in the ring.
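A minimal userspace sketch of the two-stage scheme described above, assuming POSIX threads stand in for the kernel pieces: `prep_thread` plays the processing thread that would be woken on interrupt (here it simply starts when `run_pipeline` is called), `napi_thread` plays the threaded NAPI handler, and a condition variable models "schedule the NAPI handler". All names (`prep_thread`, `napi_thread`, `run_pipeline`, `RING_SIZE`, `MAX_PREPARED`) are hypothetical and are not ath10k, mac80211 or kernel NAPI APIs. The `MAX_PREPARED` cap is the limit on pre-processed packets in the ring.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical sizes for this model; not values taken from ath10k. */
#define RING_SIZE    256  /* descriptors in the simulated HW rx ring */
#define MAX_PREPARED 32   /* cap on pre-processed packets awaiting NAPI */

struct rx_desc {
	int seq;        /* stands in for the packet the HW wrote */
	bool prepared;  /* set once the preparation stage is done */
};

struct pipeline_stats {
	int delivered;        /* packets handed to the mock mac80211 stage */
	int max_outstanding;  /* peak prepared-but-undelivered count */
	bool in_order;        /* delivery order matched ring order */
};

struct pipeline {
	struct rx_desc ring[RING_SIZE];
	int total;                /* packets the simulated HW delivers */
	int prep_idx;             /* packets prepared so far */
	int napi_idx;             /* packets delivered so far */
	pthread_mutex_t lock;     /* protects prep_idx and napi_idx */
	pthread_cond_t can_prep;  /* room below MAX_PREPARED again */
	pthread_cond_t can_napi;  /* models scheduling the NAPI handler */
	struct pipeline_stats stats;
};

/* Stage 1: traverse the ring and do the preparation work. */
static void *prep_thread(void *arg)
{
	struct pipeline *p = arg;
	for (int i = 0; i < p->total; i++) {
		pthread_mutex_lock(&p->lock);
		while (i - p->napi_idx >= MAX_PREPARED)  /* back-pressure */
			pthread_cond_wait(&p->can_prep, &p->lock);
		pthread_mutex_unlock(&p->lock);

		/* NAPI stage never reads slot i before prep_idx passes it. */
		struct rx_desc *d = &p->ring[i % RING_SIZE];
		d->seq = i;
		d->prepared = true;

		pthread_mutex_lock(&p->lock);
		p->prep_idx = i + 1;
		if (p->prep_idx - p->napi_idx > p->stats.max_outstanding)
			p->stats.max_outstanding = p->prep_idx - p->napi_idx;
		pthread_cond_signal(&p->can_napi);  /* "schedule NAPI" */
		pthread_mutex_unlock(&p->lock);
	}
	return NULL;
}

/* Stage 2: consume prepared packets and hand them on in ring order. */
static void *napi_thread(void *arg)
{
	struct pipeline *p = arg;
	for (int i = 0; i < p->total; i++) {
		pthread_mutex_lock(&p->lock);
		while (p->prep_idx <= i)
			pthread_cond_wait(&p->can_napi, &p->lock);
		pthread_mutex_unlock(&p->lock);

		struct rx_desc *d = &p->ring[i % RING_SIZE];
		assert(d->prepared);
		if (d->seq != i)
			p->stats.in_order = false;
		d->prepared = false;
		p->stats.delivered++;  /* stands in for passing to mac80211 */

		pthread_mutex_lock(&p->lock);
		p->napi_idx = i + 1;  /* frees one slot for the prep stage */
		pthread_cond_signal(&p->can_prep);
		pthread_mutex_unlock(&p->lock);
	}
	return NULL;
}

/* Run both stages concurrently over `total` simulated packets. */
struct pipeline_stats run_pipeline(int total)
{
	struct pipeline p = { .total = total, .stats = { .in_order = true } };
	pthread_t prep, napi;
	pthread_mutex_init(&p.lock, NULL);
	pthread_cond_init(&p.can_prep, NULL);
	pthread_cond_init(&p.can_napi, NULL);
	pthread_create(&napi, NULL, napi_thread, &p);
	pthread_create(&prep, NULL, prep_thread, &p);
	pthread_join(prep, NULL);
	pthread_join(napi, NULL);
	pthread_mutex_destroy(&p.lock);
	pthread_cond_destroy(&p.can_prep);
	pthread_cond_destroy(&p.can_napi);
	return p.stats;
}
```

A `main()` that calls `run_pipeline(1000)` should see `delivered == 1000` and a `max_outstanding` no larger than `MAX_PREPARED`. The cap is what stops the preparation stage from running arbitrarily far ahead of the NAPI stage; a real driver would instead use kernel primitives and give the NAPI poll a budget rather than handling one packet per wakeup.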

- Felix
