[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <73a55db3-1298-9db7-3a81-f34f86587bfe@linux.ibm.com>
Date: Mon, 26 Jun 2023 14:23:49 -0500
From: Nick Child <nnac123@...ux.ibm.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, haren@...ux.ibm.com, ricklind@...ibm.com
Subject: Re: [PATCH net] ibmvnic: Do not reset dql stats on NON_FATAL err
On 6/26/23 12:33, Jakub Kicinski wrote:
>
> If packets can get stuck in the driver that needs a dedicated fix.
>
> You should only be doing the batching if xmit_more is set, really.
> And xmit_more will not be set if core is about to set
> __QUEUE_STATE_STACK_XOFF.
>
I am not saying that packets can get stuck in the driver, if xmit_more
is false or if the driver is hard resetting then the batch gets sent.
During nonfatal reset, we are simply reestablishing communications with
the NIC and it is okay to keep the batch around. It is possible that
there are batched skb's because xmit_more and the non fatal reset work
independently of each-other and are not mutually exclusive. The upper
level functions have no way of knowing when an issue from the NIC will
occur.
> With a correctly written driver STACK_XOFF can only be set if we're
> expecting a completion.
>
> AFAIU you shouldn't have to clear this flag. We need to reset DQL when
> hard resetting the queue because we may lose completions. But if no
> completions are lost, STACK_XOFF should be left alone.
>
> Just clearing that flag is not valid either because qdiscs will not be
> rescheduled, so until some socket tries to xmit next packet the data
> already queued will not move.
This addresses my concern. So if STACK_XOFF gets set, then xmit_more
will be false and our batch will get sent. This will eventually lead to
a completion which will clear STACK_XOFF. Meaning the reset should not
have to worry about clearing STACK_XOFF.
My worry was that STACK_XOFF would get set when the batch was only
partially full and xmit_more was true. If a non fatal reset occurred
here then there would be no expected completions and a partially filled
batch. So I thought we would have to clear STACK_XOFF in order to get
the queue to be usable again. Sounds like this is not possible due to
xmit_more being false if STACK_XOFF gets set.
On 6/26/23 14:06, Rick Lindsley wrote:
>
> Nick, do we *need* to have __QUEUE_STATE_STACK_XOFF cleared? If not,
> then do we need to call or emulate netdev_tx_reset_queue() at all on a
> non-fatal error?
After reading Jakubs review. I believe we should remove the part where
we clear STACK_XOFF.
If there are no objections, I will test these changes and send a v2.
Thanks
Powered by blists - more mailing lists