Message-ID: <CH0PR18MB4339E84983E30BCA8F6B52BCCD0CA@CH0PR18MB4339.namprd18.prod.outlook.com>
Date: Mon, 8 Sep 2025 11:15:06 +0000
From: Geethasowjanya Akula <gakula@...vell.com>
To: Venkat Venkatsubra <venkat.x.venkatsubra@...cle.com>,
	netdev <netdev@...r.kernel.org>
CC: Sunil Kovvuri Goutham <sgoutham@...vell.com>,
	Subbaraya Sundeep Bhatta <sbhatta@...vell.com>,
	Hariprasad Kelam <hkelam@...vell.com>,
	Bharat Bhushan <bbhushan2@...vell.com>,
	Elijah Craig <elijah.craig@...cle.com>,
	Jeff Warren <jeffrey.warren@...cle.com>
Subject: RE: octeontx2 (rvu_nicvf) NETDEV_TX_BUSY state handling
>-----Original Message-----
>From: Venkat Venkatsubra <venkat.x.venkatsubra@...cle.com>
>Sent: Sunday, September 7, 2025 7:13 AM
>To: netdev <netdev@...r.kernel.org>
>Cc: Sunil Kovvuri Goutham <sgoutham@...vell.com>; Geethasowjanya Akula
><gakula@...vell.com>; Subbaraya Sundeep Bhatta <sbhatta@...vell.com>;
>Hariprasad Kelam <hkelam@...vell.com>; Bharat Bhushan
><bbhushan2@...vell.com>; Elijah Craig <elijah.craig@...cle.com>; Jeff
>Warren <jeffrey.warren@...cle.com>
>Subject: [EXTERNAL] Fw: octeontx2 (rvu_nicvf) NETDEV_TX_BUSY state
>handling
>
>3rd Try...Sorry again.
>
>Thanks,
>Venkat
>
>________________________________________
>From: Venkat Venkatsubra <venkat.x.venkatsubra@...cle.com>
>Sent: Saturday, September 6, 2025 8:34 PM
>To: netdev <netdev@...r.kernel.org>
>Cc: Elijah Craig <elijah.craig@...cle.com>
>Subject: octeontx2 (rvu_nicvf) NETDEV_TX_BUSY state handling
>
>Hello All,
>
>Would you be able to help us understand the following behavior with
>octeontx2 driver ?
>
>otx2_sq_append_skb():
>
>        /* Check if there is enough room between producer
>         * and consumer index.
>         */
>        free_desc = otx2_get_free_sqe(sq);
>        if (free_desc < sq->sqe_thresh)
>                return false;
>
>We get into a situation where free_desc goes below sq->sqe_thresh.
>And remains stuck there. The reason for that is still under investigation.

Hi Venkat,

Thanks for reaching out. This issue should occur only when the NIC is unable
to transmit packets from the Send Queues (SQs). When the queue is in this
state, do you still see any packets being transmitted on the interfaces?

>
>The help we needed was with how that state is handled below.
>
>otx2vf_xmit():
>
>        if (!otx2_sq_append_skb(vf, txq, sq, skb, qidx)) {
>                netif_tx_stop_queue(txq);
>
>                /* Check again, incase SQBs got freed up */
>                smp_mb();
>                if (((sq->num_sqbs - *sq->aura_fc_addr) * sq->sqe_per_sqb)
>                                                        > sq->sqe_thresh)
>                        netif_tx_wake_queue(txq);
>
>                return NETDEV_TX_BUSY;
>        }
>
>With ((sq->num_sqbs - *sq->aura_fc_addr) * sq->sqe_per_sqb) > sq->sqe_thresh
>remaining true txq is kept awake and NETDEV_TX_BUSY returned.
>qdisc resends the packet again and the same sequence repeats (forever).

Such behavior should not occur unless the SQ is stuck, meaning that packets
are not being transmitted. In that case, there won't be enough free
descriptors available to handle new packets.
Could you please share the SQ size configured in your setup?

>
>This gets us into
>i) high cpu usage by ksoftirqd
>ii) the tx timeout watchdog timer expiry doesn't trigger a NIC reset
> since txq continues to remain active.
>
>Pasting some values we had gathered with a trace in the hung state.
>
> otx2_sq_append_skb cons_head 0x890 head 0x6f4 sqe_cnt 0x1000 free_desc 411 sqe_thresh 412
> otx2_sq_append_skb num_sqbs 0x85 aura_fc_addr 0x2 sqe_per_sqb 0x1f
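
For reference, the two checks can be compared directly using the trace values quoted above. The following is only a minimal userspace sketch, not driver code: struct sq_snapshot and the helper names are hypothetical, and free_sqe() assumes that otx2_get_free_sqe() computes the ring distance as (cons_head - head - 1 + sqe_cnt) & (sqe_cnt - 1). That assumption does reproduce the traced free_desc value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the send-queue fields seen in the trace;
 * this is not the driver's struct otx2_snd_queue.
 */
struct sq_snapshot {
	uint32_t cons_head;
	uint32_t head;
	uint32_t sqe_cnt;	/* ring size, a power of two */
	uint32_t sqe_thresh;
	uint32_t num_sqbs;
	uint64_t aura_fc;	/* value read through sq->aura_fc_addr */
	uint32_t sqe_per_sqb;
};

/* Assumed to mirror otx2_get_free_sqe(): ring distance modulo sqe_cnt. */
static uint32_t free_sqe(const struct sq_snapshot *sq)
{
	return (sq->cons_head - sq->head - 1 + sq->sqe_cnt) &
	       (sq->sqe_cnt - 1);
}

/* Check made in otx2_sq_append_skb(): true means the skb is rejected. */
static bool append_rejects(const struct sq_snapshot *sq)
{
	return free_sqe(sq) < sq->sqe_thresh;
}

/* Re-check made in otx2vf_xmit(): true means the queue is woken again. */
static bool xmit_wakes(const struct sq_snapshot *sq)
{
	return (sq->num_sqbs - sq->aura_fc) * sq->sqe_per_sqb >
	       sq->sqe_thresh;
}

/* Values copied from the trace in the hung state. */
static const struct sq_snapshot trace = {
	.cons_head = 0x890,
	.head = 0x6f4,
	.sqe_cnt = 0x1000,
	.sqe_thresh = 412,
	.num_sqbs = 0x85,
	.aura_fc = 0x2,
	.sqe_per_sqb = 0x1f,
};
```

With these values free_sqe() gives 411, one below the threshold of 412, so the append is rejected. The SQB-based estimate is (133 - 2) * 31 = 4061, well above 412, so the re-check wakes the queue immediately. The two conditions measure different things and disagree in this state, which matches the repeated NETDEV_TX_BUSY loop described in the report.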
>
>While you are there if you can assist us with the watchdog timer value that is
>chosen.
>
>/* Time to wait before watchdog kicks off */
>#define OTX2_TX_TIMEOUT (100 * HZ)
>
>Why is it kept so high compared to other drivers ?
>
>We encountered this problem with Oracle Linux.
>Looking at the latest upstream octeontx2 code it seemed to function the same
>way.
>
>We don't have a way to install the latest upstream kernel on the SmartNIC.
>Currently we hit this problem once every 2 weeks or even less.
>Pretty much random time it takes.
>
>Thanks for your help.
>
>Thanks,
>Venkat