[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <41428323-618b-4d54-899a-b2a5eafb6a03@nvidia.com>
Date: Thu, 23 Oct 2025 13:44:54 +0300
From: Shahar Shitrit <shshitrit@...dia.com>
To: Sabrina Dubroca <sd@...asysnail.net>
Cc: Tariq Toukan <tariqt@...dia.com>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller"
<davem@...emloft.net>, Saeed Mahameed <saeedm@...dia.com>,
Leon Romanovsky <leon@...nel.org>, Mark Bloch <mbloch@...dia.com>,
John Fastabend <john.fastabend@...il.com>, netdev@...r.kernel.org,
linux-rdma@...r.kernel.org, linux-kernel@...r.kernel.org,
Gal Pressman <gal@...dia.com>
Subject: Re: [PATCH net V2 2/3] net: tls: Cancel RX async resync request on
rdc_delta overflow
On 22/10/2025 15:47, Sabrina Dubroca wrote:
> 2025-10-22, 14:38:17 +0300, Shahar Shitrit wrote:
>>
>>
>> On 21/10/2025 18:28, Sabrina Dubroca wrote:
>>> nit if you end up respinning, there's a typo in the subject:
>>> s/rdc_delta/rcd_delta/
>>>
>>>
>>> 2025-10-20, 10:05:53 +0300, Tariq Toukan wrote:
>>>> From: Shahar Shitrit <shshitrit@...dia.com>
>>>>
>>>> When a netdev issues a RX async resync request for a TLS connection,
>>>> the TLS module handles it by logging record headers and attempting to
>>>> match them to the tcp_sn provided by the device. If a match is found,
>>>> the TLS module approves the tcp_sn for resynchronization.
>>>>
>>>> While waiting for a device response, the TLS module also increments
>>>> rcd_delta each time a new TLS record is received, tracking the distance
>>>> from the original resync request.
>>>>
>>>> However, if the device response is delayed or fails (e.g due to
>>>> unstable connection and device getting out of tracking, hardware
>>>> errors, resource exhaustion etc.), the TLS module keeps logging and
>>>> incrementing, which can lead to a WARN() when rcd_delta exceeds the
>>>> threshold.
>>>>
>>>> To address this, introduce tls_offload_rx_resync_async_request_cancel()
>>>> to explicitly cancel resync requests when a device response failure is
>>>> detected. Call this helper also as a final safeguard when rcd_delta
>>>> crosses its threshold, as reaching this point implies that earlier
>>>> cancellation did not occur.
>>>>
>>>> Fixes: 138559b9f99d ("net/tls: Fix wrong record sn in async mode of device resync")
>>>
>>> The patch itself looks good, but what issue is fixed within this
>>> patch? The helper will be useful in the next patch, but right now
>>> we're only resetting the resync_async status. The only change I see
>>> (without patch 3) is that we won't call tls_device_rx_resync_async()
>>> next time we decrypt a record in SW, but it wouldn't have done
>>> anything.
>>>
>>> Actually, also in patch 1/3, there is no "fix" is in that patch.
>>>
>>
>> I agree about patch 1/3 so I'll remove the fixes tag.
>>
>> For this patch, indeed at this point the WARN() was already fired,
>> however, the bug being addressed is the unnecessary work the TLS module
>> continues to do. For my liking, the wasted CPU cycles and resources
>> alone justify the fix, even if we've already issued a warning.
>> What do you think?
>
> Is there any work being done/avoided other than calling
> tls_device_rx_resync_async and returning immediately?
>
> With or without the patch, tls_device_rx_resync_new_rec will be called
> during stream parsing.
>
> Currently, resync_async->req doesn't get reset so we'll call
> tls_device_rx_resync_async. We're still in async phase, rcd_delta is
> still USHRT_MAX, and we're done, tls_device_rx_resync_new_rec returns.
>
> With the patch, we'll see that resync_async->req is 0 and avoid
> calling tls_device_rx_resync_async.
>
> Did I miss something else?
>
My bad, you are right. The unnecessary work the invocation of
tls_device_rx_resync_async.
OK so there are some options; I can either simply remove the fixes tag
and leave the patch as is, or I can also remove the call to
tls_offload_rx_resync_async_request_cancel() at that point so the patch
only introduces the helper (and then submit a patch to net-next that
adds the call to tls_offload_rx_resync_async_request_cancel when
rcd_delta == USHRT_MAX to improve the behavior).
what do you think it's the best to do?
Powered by blists - more mailing lists