Date:   Tue, 12 Apr 2022 11:08:02 -0700
From:   Ray Jui <ray.jui@...adcom.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     "David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org
Subject: Re: [RFC] Applicability of using 'txq_trans_update' during ring
 recovery

Hi Jakub,

On 4/12/2022 10:37 AM, Jakub Kicinski wrote:
> On Tue, 12 Apr 2022 10:01:02 -0700 Ray Jui wrote:
>> Hi David/Jakub,
>>
>> I'd like to run by you the idea of invoking 'txq_trans_update'
>> to update the last TX timestamp in the scenario where we temporarily
>> stop the TX queue to do some recovery work. Is it considered an
>> acceptable approach to prevent false positive triggering of TX timeout
>> during the recovery process?
>>
>> I know in general people use 'netif_carrier_off' during the process when
>> they reset/change the entire TX/RX ring set and/or other resources on
>> the Ethernet card. But in our particular case, we have another driver
>> running (i.e., RoCE) on top and setting 'netif_carrier_off' will cause a
>> significant side effect on the other driver (e.g., all RoCE QPs will be
>> terminated). In addition, for this special recovery work on our driver,
>> we are doing it on a per NAPI ring set basis while keeping the traffic
>> on other queues running. Using 'netif_carrier_off' will prevent traffic
>> running from all other queues that are not going through recovery.
> 
> Can you use netif_device_detach() to mark the device as not present?

It seems 'netif_device_detach' marks the netif device as removed
(by clearing __LINK_STATE_PRESENT) and stops all TX queues.

It also seems the core infiniband subsystem mainly relies on
'netif_carrier_ok' and 'netif_running', so 'netif_device_detach' might
work. I also need to check with our internal RoCE driver team to
confirm.

One drawback with 'netif_device_detach' compared to the current solution
is that we will have to stop all TX queues during the entire duration of
the recovery process (instead of on a per NAPI ring set basis).

Can you please also comment on whether 'txq_trans_update' is considered
an acceptable approach in this particular scenario? And if not, is there
another mechanism in the kernel net subsystem that allows one to quiesce
traffic on a per NAPI ring set basis?
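
To make the question concrete, here is a minimal user-space sketch of the
idea, not kernel code. It assumes a simplified watchdog that flags a queue
only when the queue is stopped and its last-transmit timestamp is older
than the timeout. All 'model_*' names are hypothetical and exist only for
this illustration; the real watchdog and 'txq_trans_update' carry details
that are left out here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of one TX queue; not a kernel structure. */
struct model_txq {
    unsigned long trans_start; /* tick of the last recorded transmit */
    bool stopped;              /* true while the driver keeps the queue stopped */
};

/* Models the effect relied on in this thread: refresh the queue timestamp. */
static void model_trans_update(struct model_txq *q, unsigned long now)
{
    q->trans_start = now;
}

/* Wraparound-safe "a is after b" comparison on tick counters. */
static bool model_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Simplified watchdog pass (assumption): a queue counts as timed out only
 * if it is stopped and its timestamp is older than 'timeo' ticks.
 * Returns the index of the first timed-out queue, or -1 if there is none.
 */
static int model_watchdog(const struct model_txq *qs, size_t n,
                          unsigned long now, unsigned long timeo)
{
    for (size_t i = 0; i < n; i++) {
        if (qs[i].stopped &&
            model_time_after(now, qs[i].trans_start + timeo))
            return (int)i;
    }
    return -1;
}
```

In this model, stopping only the queue under recovery and refreshing its
timestamp periodically keeps the watchdog quiet for that queue, while the
other queues are never stopped and so are never considered.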

Thanks,

Ray
