Message-ID: <bd01e1544e388eb71b8713e94ea2165d1a805b54.camel@codeconstruct.com.au>
Date: Fri, 17 Nov 2023 15:29:05 +0800
From: Jeremy Kerr <jk@...econstruct.com.au>
To: Jinliang Wang <jinliangw@...gle.com>,
Matt Johnston <matt@...econstruct.com.au>
Cc: William Kennington <wak@...gle.com>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mctp-i2c: increase the MCTP_I2C_TX_WORK_LEN to 500
Hi Jinliang,
> Tested:
> Before the fix, we will see below message in kernel log when
> concurrently sending namespace create commands to the 4 NVMe-MI
> devices on the same i2c bus:
> kernel: i2c i2c-6 mctpi2c6: BUG! Tx Ring full when queue awake!
>
> After the fix, the error message is gone.
Thanks for the report, but I don't think this is the correct fix: you
should not hit that error even if > TX_WORK_LEN packets need to be sent.
The net core should not be attempting to queue more skbs after
netif_stop_queue(), which we do in the conditional below the warning:
	spin_lock_irqsave(&midev->tx_queue.lock, flags);
	if (skb_queue_len(&midev->tx_queue) >= MCTP_I2C_TX_WORK_LEN) {
		netif_stop_queue(dev);
		spin_unlock_irqrestore(&midev->tx_queue.lock, flags);
		netdev_err(dev, "BUG! Tx Ring full when queue awake!\n");
		return NETDEV_TX_BUSY;
	}
	__skb_queue_tail(&midev->tx_queue, skb);
	if (skb_queue_len(&midev->tx_queue) == MCTP_I2C_TX_WORK_LEN)
		netif_stop_queue(dev);
	spin_unlock_irqrestore(&midev->tx_queue.lock, flags);
What it looks like has happened here:
1) we have TX_WORK_LEN-1 packets queued
2) we release a flow, which queues the "marker" skb. The tx_queue now
has TX_WORK_LEN items
3) we queue another packet, ending up with TX_WORK_LEN+1 in the queue
4) the == TX_WORK_LEN test fails, so we don't do a netif_stop_queue()
A couple of potential fixes:
* We do the length check and conditional netif_stop_queue() in (2), when
the marker skb is queued
* We change the `== MCTP_I2C_TX_WORK_LEN` check from (4) to be
`>= MCTP_I2C_TX_WORK_LEN`
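To make the sequence concrete, here is a rough userspace model of the queue accounting, covering only the first option. It is a sketch, not driver code: `struct model_queue`, `model_xmit()`, `model_release_flow()` and `model_hits_bug()` are hypothetical names, `TX_WORK_LEN` is set to an arbitrary small value, and I am assuming the marker skb is currently appended without any length check. In this model the packet in step (3) is the one that finds the ring full while the queue is still awake.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_WORK_LEN 4	/* arbitrary stand-in for MCTP_I2C_TX_WORK_LEN */

/* Hypothetical model: only the queue length and stopped state matter. */
struct model_queue {
	unsigned int len;
	bool stopped;
};

/* Models the xmit path quoted above; returns false for NETDEV_TX_BUSY. */
static bool model_xmit(struct model_queue *q)
{
	/* The net core must not call us once the queue is stopped. */
	assert(!q->stopped);

	if (q->len >= TX_WORK_LEN) {
		q->stopped = true;
		return false;	/* "BUG! Tx Ring full when queue awake!" */
	}
	q->len++;
	if (q->len == TX_WORK_LEN)
		q->stopped = true;
	return true;
}

/* Step (2): queue the marker skb, optionally with the proposed check. */
static void model_release_flow(struct model_queue *q, bool check_on_release)
{
	q->len++;
	if (check_on_release && q->len >= TX_WORK_LEN)
		q->stopped = true;
}

/* Replays steps (1) to (3); returns true if the BUG path is reached. */
static bool model_hits_bug(bool check_on_release)
{
	struct model_queue q = { .len = 0, .stopped = false };
	int i;

	for (i = 0; i < TX_WORK_LEN - 1; i++)	/* (1) */
		model_xmit(&q);
	model_release_flow(&q, check_on_release);	/* (2) */
	if (q.stopped)	/* stopped queue: no further xmit calls */
		return false;
	return !model_xmit(&q);	/* (3) */
}
```

With these assumptions, `model_hits_bug(false)` reaches the error path and `model_hits_bug(true)` does not, since the queue is already stopped by the time the marker has been queued.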
Matt, any preferences?
Cheers,
Jeremy