netdev - Re: [RFC] Applicability of using 'txq_trans

Message-ID: <040cd578-946f-0141-c28a-2f04d00d9790@broadcom.com>
Date:   Tue, 12 Apr 2022 12:34:23 -0700
From:   Ray Jui <ray.jui@...adcom.com>
To:     Michael Chan <michael.chan@...adcom.com>
Cc:     Jakub Kicinski <kuba@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Netdev <netdev@...r.kernel.org>
Subject: Re: [RFC] Applicability of using 'txq_trans_update' during ring
 recovery

On 4/12/2022 12:19 PM, Michael Chan wrote:
> On Tue, Apr 12, 2022 at 11:36 AM Ray Jui <ray.jui@...adcom.com> wrote:
>> On 4/12/22 11:24, Michael Chan wrote:
>>> On Tue, Apr 12, 2022 at 11:08 AM Ray Jui <ray.jui@...adcom.com> wrote:
>>>
>>>> Can you please also comment on whether 'txq_trans_update' is considered
>>>> an acceptable approach in this particular scenario?
>>>
>>> In my opinion, updating trans_start to the current jiffies to prevent
>>> TX timeout is not a good solution.  It just buys you the arbitrary TX
>>> timeout period before the next TX timeout.  If you take more than this
>>> time to restart the TX queue, you will still get TX timeout.
>>
>> However, one can argue that the recovery work is expected to be finished
>> in much less time than any arbitrary TX timeout period. If the recovery
>> of the particular NAPI ring set is taking more than an arbitrary TX
>> timeout period, then something is wrong and we should really TX timeout.
> 
> Even if it should work in a specific case, you are still expanding the
> definition of TX timeout to be no shorter than this specific recovery
> time.
> 
> Our general error recovery time that includes firmware and chip reset
> can take longer than the TX timeout period.  And we call
> netif_carrier_off() for the whole duration.

Sure, but that is the general error recovery case, which is very different
from the specific recovery case we are discussing here. This recovery is
performed solely by the driver (without resetting the firmware or the
chip) on a per NAPI ring set basis. While one NAPI ring set is being
recovered, traffic keeps flowing on the remaining NAPI ring sets. The
average recovery time for this type of recovery is in the 1 - 2 ms
range.

Also, as I already said, 'netif_carrier_off' is not an option, given that
the RoCE/InfiniBand subsystem depends on 'netif_carrier_status' for many
of its operations.

Basically, I'm looking for a solution that allows one to:
1) quiesce traffic and perform recovery on a per NAPI ring set basis
2) avoid any drastic effect on RoCE during recovery

'txq_trans_update' may not be the optimal solution, but it is a
solution that satisfies the two requirements above. If there is any
other option that is considered better than 'txq_trans_update' and
can satisfy the two requirements, please let me know.
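
To make the trade-off under discussion concrete, here is a minimal
user-space sketch of the TX watchdog check. It is not kernel or driver
code: the 'model_*' names are hypothetical, and the logic only assumes
that a TX timeout is reported when a stopped queue has seen no TX
activity for longer than the timeout period.

```c
#include <stdbool.h>

/* Hypothetical user-space model of one TX queue; not kernel code. */
struct model_txq {
    unsigned long trans_start; /* tick of the last recorded TX activity */
    bool stopped;              /* true while the driver has quiesced the queue */
};

/* Wrap-safe "a is later than b", in the spirit of the kernel's time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Models the effect of 'txq_trans_update': restart the timeout window. */
static void model_trans_update(struct model_txq *q, unsigned long now)
{
    q->trans_start = now;
}

/* Watchdog check: stopped queue with no TX activity for more than timeo ticks. */
static bool model_tx_timed_out(const struct model_txq *q, unsigned long now,
                               unsigned long timeo)
{
    return q->stopped && model_time_after(now, q->trans_start + timeo);
}
```

In this model, a queue that last transmitted at tick 0 and is stopped
for a short recovery near the end of a 5000-tick window times out
unless 'model_trans_update' is called when recovery starts. After that
call the window restarts, so a recovery that takes longer than one full
timeout period would still be reported as a TX timeout.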

Thanks.
