netdev - Re: [PATCH net-next v5] net: axienet: Use NAPI for TX completion path

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <f3868a5f9abe263f4ebebd21382cd022afa6a029.camel@calian.com>
Date:   Wed, 11 May 2022 17:16:55 +0000
From:   Robert Hancock <robert.hancock@...ian.com>
To:     "kuba@...nel.org" <kuba@...nel.org>
CC:     "pabeni@...hat.com" <pabeni@...hat.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "michal.simek@...inx.com" <michal.simek@...inx.com>,
        "radhey.shyam.pandey@...inx.com" <radhey.shyam.pandey@...inx.com>,
        "edumazet@...gle.com" <edumazet@...gle.com>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH net-next v5] net: axienet: Use NAPI for TX completion path

On Tue, 2022-05-10 at 18:56 -0700, Jakub Kicinski wrote:
> On Mon,  9 May 2022 11:30:39 -0600 Robert Hancock wrote:
> > This driver was using the TX IRQ handler to perform all TX completion
> > tasks. Under heavy TX network load, this can cause significant irqs-off
> > latencies (found to be in the hundreds of microseconds using ftrace).
> > This can cause other issues, such as overrunning serial UART FIFOs when
> > using high baud rates with limited UART FIFO sizes.
> > 
> > Switch to using a NAPI poll handler to perform the TX completion work
> > to get this out of hard IRQ context and avoid the IRQ latency impact.
> > A separate poll handler is used for TX and RX since they have separate
> > IRQs on this controller, so that the completion work for each of them
> > stays on the same CPU as the interrupt.
> > 
> > Testing on a Xilinx MPSoC ZU9EG platform using iperf3 from a Linux PC
> > through a switch at 1G link speed showed no significant change in TX or
> > RX throughput, with approximately 941 Mbps before and after. Hard IRQ
> > time in the TX throughput test was significantly reduced from 12% to
> > below 1% on the CPU handling TX interrupts, with total hard+soft IRQ CPU
> > usage dropping from about 56% down to 48%.
> > 
> > Signed-off-by: Robert Hancock <robert.hancock@...ian.com>
> > ---
> > 
> > Changed since v4: Added locking to protect TX ring tail pointer against
> > concurrent access by TX transmit and TX poll paths.
> 
> Hi, sorry for a late reply there's just too many patches to look at
> lately.
> 
> The lock is slightly concerning, the driver follows the usual wake up 
> scheme based on memory barriers. If we add the lock we should probably
> take the barriers out.

So there's basically two places where there is contention, axienet_start_xmit
where it is moving the tail pointer down after adding more entries to the TX
ring, and the TX poll function calling axienet_check_tx_bd_space where it is
using the tail pointer to see if there is enough space in the TX ring to wake
the queue. I suppose barriers are likely sufficient if the code updating the
ring pointer is more careful about how it is done - for example in the snippet
quoted below, it's moving the pointer down and then moving it back to 0 if it
is past the end of the ring; this would need to change to only update the
pointer once and not have the intermediate state where it is at an invalid
position.

I think the stability issue I saw earlier was not actually due to these changes
however, but to similar changes in v1 of the "net: macb: use NAPI for TX
completion path" patch. In the case of that driver, it was previously relying
on the TX completion path being protected by a spinlock in the IRQ handler,
which was lost when the TX completion was moved to a poll function.

> 
> We can also try to avoid the lock and drill into what the issue is.
> At a quick look it seems like there is a barrier missing between setup
> of the descriptors and kicking the transfer off:
> 
> diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> index d6fc3f7acdf0..9e244b73a0ca 100644
> --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> @@ -878,10 +878,11 @@ axienet_start_xmit(struct sk_buff *skb, struct
> net_device *ndev)
>  	cur_p->skb = skb;
>  
>  	tail_p = lp->tx_bd_p + sizeof(*lp->tx_bd_v) * lp->tx_bd_tail;
> -	/* Start the transfer */
> -	axienet_dma_out_addr(lp, XAXIDMA_TX_TDESC_OFFSET, tail_p);
>  	if (++lp->tx_bd_tail >= lp->tx_bd_num)
>  		lp->tx_bd_tail = 0;
> +	wmb(); // possibly dma_wmb()

I think the MMIO write in axienet_dma_out_addr is supposed to be an implicit
barrier, so that shouldn't be needed?

> +	/* Start the transfer */
> +	axienet_dma_out_addr(lp, XAXIDMA_TX_TDESC_OFFSET, tail_p);
>  
>  	/* Stop queue if next transmit may not have space */
>  	if (axienet_check_tx_bd_space(lp, MAX_SKB_FRAGS + 1)) {
-- 
Robert Hancock
Senior Hardware Designer, Calian Advanced Technologies
www.calian.com