Message-ID: <d36c51ae-1832-effa-ee5c-fdebdeec5c5c@redhat.com>
Date: Thu, 9 Sep 2021 13:49:55 +0200
From: Jesper Dangaard Brouer <jbrouer@...hat.com>
To: Íñigo Huguet <ihuguet@...hat.com>,
ecree.xilinx@...il.com, habetsm.xilinx@...il.com,
davem@...emloft.net, kuba@...nel.org,
Ivan Babrou <ivan@...udflare.com>,
Marek Majkowski <marek@...udflare.com>,
Jakub Sitnicki <jakub@...udflare.com>,
Toke Hoiland Jorgensen <toke@...hat.com>,
Freysteinn Alfredsson <Freysteinn.Alfredsson@....se>
Cc: brouer@...hat.com, ast@...nel.org, daniel@...earbox.net,
hawk@...nel.org, john.fastabend@...il.com, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH net 0/2] sfc: fallback for lack of xdp tx queues
Great work, Huguet. The patches LGTM; I would have ACKed them, but they
have already been applied:
Here is the summary with links:
- [net,1/2] sfc: fallback for lack of xdp tx queues
https://git.kernel.org/netdev/net/c/415446185b93
- [net,2/2] sfc: last resort fallback for lack of xdp tx queues
https://git.kernel.org/netdev/net/c/6215b608a8c4
Heads-up to Cloudflare (Cc'ed) about these improvements.
And heads-up to Toke and Frey on patch 2/2, as it creates push-back via
TX queue stop/restart logic (see kernel API netif_tx_queue_stopped).
XDP currently doesn't handle this well, but I hope to see XDP queueing
work from your side to improve the situation ;-)
On 09/09/2021 11.28, Íñigo Huguet wrote:
> If there are not enough hardware resources to allocate one tx queue per
> CPU for XDP, the XDP_TX and XDP_REDIRECT actions were unavailable, and
> using them resulted each time in the packet being dropped and this
> message in the logs: XDP TX failed (-22)
>
> These patches implement 2 fallback solutions for 2 different situations
> that might happen:
> 1. There are not enough free resources for all the tx queues, but there
> are some free resources available
> 2. There are not enough free resources at all for tx queues.
>
> Both solutions are based on sharing tx queues, using __netif_tx_lock for
> synchronization. In the second case, as there are no XDP TX queues to
> share, network stack queues are used instead, but since we're taking
> __netif_tx_lock, concurrent access to the queues is correctly protected.
>
> The solution for this second case might affect the performance of both
> XDP traffic and normal traffic due to lock contention if both are used
> intensively. That's why I call it a "last resort" fallback: it's not a
> desirable situation, but at least we have XDP TX working.
>
> Some tests have shown good results and indicate that the non-fallback
> case is not harmed by these changes. They are also promising for the
> fallback cases. This is the test:
> 1. From another machine, send a high volume of packets with pktgen,
> using the script samples/pktgen/pktgen_sample04_many_flows.sh
> 2. In the tested machine, run samples/bpf/xdp_rxq_info with arguments
> "-a XDP_TX --swapmac" and see the results
> 3. In the tested machine, also run pktgen_sample04 to generate heavy
> normal TX traffic, and see how the xdp_rxq_info results vary
>
> Note that this test doesn't check the worst situations for the fallback
> solutions, because XDP_TX will only be executed from the CPUs that
> process sfc's queues, and not from every CPU in the system, so the
> performance drop due to the highest locking contention doesn't happen.
> I'd like to test that, as well, but I don't have access right now to a
> proper environment.
>
> Test results:
>
> Without doing TX:
> Before changes: ~2,900,000 pps
> After changes, 1 queue/core: ~2,900,000 pps
> After changes, 2 queues/core: ~2,900,000 pps
> After changes, 8 queues/core: ~2,900,000 pps
> After changes, borrowing from network stack: ~2,900,000 pps
>
> With multiflow TX at the same time:
> Before changes: ~1,700,000 - 2,900,000 pps
> After changes, 1 queue/core: ~1,700,000 - 2,900,000 pps
> After changes, 2 queues/core: ~1,700,000 pps
> After changes, 8 queues/core: ~1,700,000 pps
> After changes, borrowing from network stack: 1,150,000 pps
>
> Sporadic "XDP TX failed (-5)" warnings are shown when running the xdp
> program and pktgen simultaneously. This was expected, because XDP
> doesn't have any buffering system to absorb bursts if the NIC is under
> very high pressure. Thousands of these warnings are shown in the case
> of borrowing net stack queues. As I said before, this was also expected.
>
>
> Íñigo Huguet (2):
> sfc: fallback for lack of xdp tx queues
> sfc: last resort fallback for lack of xdp tx queues
>
> drivers/net/ethernet/sfc/efx_channels.c | 98 ++++++++++++++++++-------
> drivers/net/ethernet/sfc/net_driver.h | 8 ++
> drivers/net/ethernet/sfc/tx.c | 29 ++++++--
> 3 files changed, 99 insertions(+), 36 deletions(-)
>