netdev - Re: [net-next 10/11] net/mlx5e: kTLS, Add kTLS RX resync support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <27149ee9-0483-ecff-a4ec-477c8c03d4dd@mellanox.com>
Date:   Sun, 31 May 2020 15:06:28 +0300
From:   Boris Pismenny <borisp@...lanox.com>
To:     Jakub Kicinski <kuba@...nel.org>,
        Saeed Mahameed <saeedm@...lanox.com>
Cc:     "davem@...emloft.net" <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Tariq Toukan <tariqt@...lanox.com>
Subject: Re: [net-next 10/11] net/mlx5e: kTLS, Add kTLS RX resync support

On 30/05/2020 0:50, Jakub Kicinski wrote:
> On Fri, 29 May 2020 20:44:29 +0000 Saeed Mahameed wrote:
>>> I thought you said that resync requests are guaranteed to never fail?
>>
>> I didn't say that :),  maybe tariq did say this before my review,
> 
> Boris ;)
> 

I didn't say we are perfect, just that we can make a trade-off here, and currently this is the simplest version that our team came up with for this series.
As a first step, I think it is reasonable. But, I expect that we will improve it in the future.

>> but basically with the current mlx5 arch, it is impossible to guarantee
>> this unless we open 1 service queue per ktls offloads and that is going
>> to be an overkill!

I disagree, there are many ways to guarantee reliability here. For example, we can sleep/spin until there is space in the queue or rely on work stealing to let a later operation execute this one.

> 
> IIUC every ooo packet causes a resync request in your implementation -
> is that true?
> 

No, only header loss. We never required a resync per OOO packet. I'm not sure why would you think that.

> It'd be great to have more information about the operation of the
> device in the commit message..
> 

I'll try to clarify the resync flow here.
As always, the packet that requires resync is marked as such in the CQE.
However, unlike previous devices, the TCP sequence (tcpsn) where the HW found a header is not provided in the CQE.
Instead, tcpsn is queried from HW asynchronously by the driver.
We employ the force resync approach to guarantee that we can log all resync locations between the received packet and the HW query response.
We check the asynchronous HW query response against all resync values between the packet that triggered the resync and now. If one of them matches, then resync can be completed immediately.
Otherwise, the driver keeps waiting for the correct resync.

>> This is a rare corner case anyway, where more than 1k tcp connections
>> sharing the same RX ring will request resync at the same exact moment. 
> 
> IDK about that. Certain applications are architected for max capacity,
> not efficiency under steady load. So it matters a lot how the system
> behaves under stress. What if this is the chain of events:
> 
> overload -> drops -> TLS steams go out of sync -> all try to resync

I agree that this is not that rare, and it may be improved both in future patches and hardware.
Do you think it is critical to improve it now, and not in a follow-up series?

> 
> We don't want to add extra load on every record if HW offload is
> enabled. That's why the next record hint backs off, checks socket 
> state etc.
> 
> BTW I also don't understand why mlx5e_ktls_rx_resync() has a
> tls_offload_rx_force_resync_request(sk) at the end. If the update 
> from the NIC comes with a later seq than current, request the sync 
> for _that_ seq. I don't understand the need to force a call back on
> every record here. 
> 

The extra load here is minimal, i.e. a single call from TLS to the driver, which usually just logs the information.

> Also if the sync failed because queue was full, I don't see how forcing 
> another sync attempt for the next record is going to match?
> 

It doesn't, and if sync failed then we should stop trying to force resync.