Message-ID: <73a55db3-1298-9db7-3a81-f34f86587bfe@linux.ibm.com>
Date: Mon, 26 Jun 2023 14:23:49 -0500
From: Nick Child <nnac123@...ux.ibm.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, haren@...ux.ibm.com, ricklind@...ibm.com
Subject: Re: [PATCH net] ibmvnic: Do not reset dql stats on NON_FATAL err



On 6/26/23 12:33, Jakub Kicinski wrote:
> 
> If packets can get stuck in the driver that needs a dedicated fix.
> 
> You should only be doing the batching if xmit_more is set, really.
> And xmit_more will not be set if core is about to set
> __QUEUE_STATE_STACK_XOFF.
> 

I am not saying that packets can get stuck in the driver; if xmit_more 
is false, or if the driver is hard resetting, then the batch gets sent.

During a non-fatal reset, we are simply reestablishing communications 
with the NIC, and it is okay to keep the batch around. There may be 
batched skbs at that point because xmit_more and the non-fatal reset 
operate independently of each other and are not mutually exclusive. The 
upper-level functions have no way of knowing when an issue with the NIC 
will occur.
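
For reference, a minimal sketch of the batching behaviour I am 
describing. The example_* helpers are hypothetical stand-ins, not the 
actual ibmvnic symbols; only the core netdev helpers are real:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

static void example_queue_to_batch(struct sk_buff *skb);   /* hypothetical */
static void example_flush_batch(struct net_device *dev);   /* hypothetical */

static netdev_tx_t example_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct netdev_queue *txq;

        txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

        example_queue_to_batch(skb);          /* append descriptor to the batch */
        netdev_tx_sent_queue(txq, skb->len);  /* BQL (dql) accounting */

        /* Flush when the stack says nothing more is coming (xmit_more
         * false) or the queue has been stopped.
         */
        if (!netdev_xmit_more() || netif_xmit_stopped(txq))
                example_flush_batch(dev);

        return NETDEV_TX_OK;
}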

> With a correctly written driver STACK_XOFF can only be set if we're
> expecting a completion.
> 
> AFAIU you shouldn't have to clear this flag. We need to reset DQL when
> hard resetting the queue because we may lose completions. But if no
> completions are lost, STACK_XOFF should be left alone.
> 
> Just clearing that flag is not valid either because qdiscs will not be
> rescheduled, so until some socket tries to xmit next packet the data
> already queued will not move.

This addresses my concern. So if STACK_XOFF gets set, then xmit_more 
will be false and our batch will get sent. That will eventually lead to 
a completion, which will clear STACK_XOFF, meaning the reset should not 
have to worry about clearing STACK_XOFF.
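
In other words, the completion side already does the right thing. A 
sketch, assuming one completion per flushed batch (example_tx_complete 
is hypothetical; netdev_tx_completed_queue() is the real core helper):

static void example_tx_complete(struct net_device *dev, unsigned int qidx,
                                unsigned int pkts, unsigned int bytes)
{
        struct netdev_queue *txq = netdev_get_tx_queue(dev, qidx);

        /* Releases dql space; if the stack set __QUEUE_STATE_STACK_XOFF
         * while waiting for this completion, the bit is cleared and the
         * qdisc is rescheduled here, so the reset path never needs to
         * touch the flag itself.
         */
        netdev_tx_completed_queue(txq, pkts, bytes);
}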

My worry was that STACK_XOFF would get set while the batch was only 
partially full and xmit_more was true. If a non-fatal reset occurred at 
that point, there would be no expected completions and a partially 
filled batch, so I thought we would have to clear STACK_XOFF to make 
the queue usable again. It sounds like this is not possible, since 
xmit_more is false whenever STACK_XOFF gets set.

On 6/26/23 14:06, Rick Lindsley wrote:
>
> Nick, do we *need* to have __QUEUE_STATE_STACK_XOFF cleared?  If not,
> then do we need to call or emulate netdev_tx_reset_queue() at all on a
> non-fatal error?

After reading Jakub's review, I believe we should remove the part where 
we clear STACK_XOFF.
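
Roughly, the v2 direction would be something like the sketch below 
(untested, so take it as a sketch of intent rather than the final 
patch):

/* Keep netdev_tx_reset_queue() for hard resets, where completions
 * can be lost, but leave the dql state and __QUEUE_STATE_STACK_XOFF
 * alone on a NON_FATAL reset, where no completions are lost.
 */
if (adapter->reset_reason != VNIC_RESET_NON_FATAL)
        netdev_tx_reset_queue(netdev_get_tx_queue(adapter->netdev, i));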

If there are no objections, I will test these changes and send a v2.

Thanks
