[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <277ca0b3-32df-532f-e82f-a7d42dce54ab@linux.vnet.ibm.com>
Date: Mon, 26 Jun 2023 12:06:10 -0700
From: Rick Lindsley <ricklind@...ux.vnet.ibm.com>
To: Jakub Kicinski <kuba@...nel.org>, Nick Child <nnac123@...ux.ibm.com>
Cc: netdev@...r.kernel.org, haren@...ux.ibm.com, ricklind@...ibm.com
Subject: RE: [PATCH net] ibmvnic: Do not reset dql stats on NON_FATAL err
On 6/24/23 15:19, Jakub Kicinski wrote:
> Why are you trying to clear this bit?
>
> If the completions will still come the bit will be cleared (or not)
> during completion handling (netdev_tx_completed_queue() et al.)
>
> Drivers shouldn't be poking into queue state bits directly.
+ if (adapter->reset_reason == VNIC_RESET_NON_FATAL)
+ clear_bit(__QUEUE_STATE_STACK_XOFF,
+ &netdev_get_tx_queue(netdev, i)->state);
+ else
+ netdev_tx_reset_queue(netdev_get_tx_queue(netdev, i));
The problem is the call to dql_reset() in netdev_tx_reset_queue(). If we reset dql stats, we risk num_completed being greater than num_queued, which will cause a BUG_ON to fire in dql_completed().
static inline void netdev_tx_reset_queue(struct netdev_queue *q)
{
#ifdef CONFIG_BQL
clear_bit(__QUEUE_STATE_STACK_XOFF, &q->state);
dql_reset(&q->dql);
#endif
}
Existing code calls netdev_tx_reset_queue() unconditionally. When the error is non-fatal, though, we were tripping over the BUG_ON on those occasions when a few skbs were already submitted. The patch here is more about NOT calling dql_reset() than it is about clearing __QUEUE_STATE_STACK_XOFF. That was only included because it-was-the-other-thing-the-function-did. So ... maybe we don't care about __QUEUE_STATE_STACK_XOFF?
Nick, do we *need* to have __QUEUE_STATE_STACK_XOFF cleared? If not, then do we need to call or emulate netdev_tx_reset_queue() at all on a non-fatal error?
Powered by blists - more mailing lists