netdev - Re: [net-next 10/11] net/mlx5e: kTLS, Add kTLS RX resync support

Message-ID: <3895f115-6a0b-29ff-83b9-7e099819a570@mellanox.com>
Date:   Wed, 3 Jun 2020 10:02:33 +0300
From:   Tariq Toukan <tariqt@...lanox.com>
To:     Jakub Kicinski <kuba@...nel.org>,
        Tariq Toukan <tariqt@...lanox.com>
Cc:     Boris Pismenny <borisp@...lanox.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [net-next 10/11] net/mlx5e: kTLS, Add kTLS RX resync support



On 6/2/2020 9:31 PM, Jakub Kicinski wrote:
> On Tue, 2 Jun 2020 14:32:44 +0300 Tariq Toukan wrote:
>> On 6/2/2020 1:12 AM, Jakub Kicinski wrote:
>>>>>> This is a rare corner case anyway, where more than 1k tcp
>>>>>> connections sharing the same RX ring will request resync at the
>>>>>> same exact moment.
>>>>>
>>>>> IDK about that. Certain applications are architected for max
>>>>> capacity, not efficiency under steady load. So it matters a lot how
>>>>> the system behaves under stress. What if this is the chain of
>>>>> events:
>>>>>
>>>>> overload -> drops -> TLS streams go out of sync -> all try to resync
>>>>>
>>>>
>>>> I agree that this is not that rare, and it may be improved both in
>>>> future patches and hardware. Do you think it is critical to improve
>>>> it now, and not in a follow-up series?
>>>
>>> It's not a blocker for me, although if this makes it into 5.8 there
>>> will not be a chance to improve it before net-next closes, so it depends
>>> on whether you want to risk it and support the code as is...
>>>
>>
>> Hi Jakub,
>> Thanks for your comments.
>>
>> This is just the beginning of this driver's offload support. I will
>> continue working on enhancements and improvements in the next kernels.
>> We have several enhancements planned.
>>
>> For now, if there are no real blockers, I think it's in good shape to
>> start with and make it into the kernel.
>>
>> IMHO, this specific issue of better handling resync failures in the
>> driver can be addressed in stages:
>>
>> 1. As a fix: stop asking the stack for resync re-calls. If a resync
>> attempt fails, terminate any further resync attempts for that connection.
>> If there's room for a re-spin, I can provide it today. Otherwise, it is a
>> simple fix that can be addressed in the early rc's in -net.
>> What do you think?
>>
>> 2. Recover: this is an enhancement for future kernels, where
>> the driver internally and independently recovers from failed attempts
>> and makes sure they are processed when there's enough room on the SQ
>> again, without the stack being engaged.
> 
> IIUC the HW asks for a resync at the first record after a specific seq
> (the record header is in the frame that carried the OOO marking, right?)
> 
> Can we make the core understand those semantics and avoid trying to
> resync at the wrong record?
> 

The HW asks for a resync when it is in tracking mode and identifies the
magic, from which it calculates the expected seq of the next record.
This seq is not part of the completion (for now; this is a planned
enhancement), so the device driver posts a request to the device to get
the seq, and then, after comparing it to the stack's SW seq, the driver
hopefully approves it (by another post to the HW).

As long as the device driver does not know the HW's expected seq, it
cannot provide a seq to the stack, so force resync is used.

We can think of an optimization here; it is doable.