netdev - Re: [net-next 10/11] net/mlx5e: kTLS, Add kTLS RX resync support


Message-ID: <cd4b35f3a998d4f3b98b0f7681b90fe2a99a311f.camel@mellanox.com>
Date:   Sat, 30 May 2020 04:07:18 +0000
From:   Saeed Mahameed <saeedm@...lanox.com>
To:     Boris Pismenny <borisp@...lanox.com>,
        "kuba@...nel.org" <kuba@...nel.org>
CC:     "davem@...emloft.net" <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Tariq Toukan <tariqt@...lanox.com>
Subject: Re: [net-next 10/11] net/mlx5e: kTLS, Add kTLS RX resync support

On Fri, 2020-05-29 at 22:47 +0000, Saeed Mahameed wrote:
> On Fri, 2020-05-29 at 14:50 -0700, Jakub Kicinski wrote:
> > On Fri, 29 May 2020 20:44:29 +0000 Saeed Mahameed wrote:
> > > > I thought you said that resync requests are guaranteed to never
> > > > fail?
> > > 
> > > > I didn't say that :), maybe Tariq said this before my review,
> > 
> > Boris ;)
> > 
> > > but basically with the current mlx5 arch, it is impossible to
> > > guarantee this unless we open 1 service queue per ktls offload,
> > > and that is going to be overkill!
> > 
> > IIUC every ooo packet causes a resync request in your
> > implementation - is that true?
> > 
> 
> For TX yes; for RX I am not sure, this is a hw flow that I am not
> fully familiar with.
> 
> Anyway, according to Tariq, the hw might generate more than one resync
> request on the same flow, and this is all being handled by the driver
> correctly. I am not sure if this is what you are looking for.
> 
> Maybe Tariq/Boris can elaborate more on the hw resync mechanism.
> 
> > It'd be great to have more information about the operation of the
> > device in the commit message..
> > 
> 
> How about:
> 
> Resync flow occurs when packets have been lost and the device lost
> track of TLS records. The device attempts to resync by tracking TLS
> records, and sends a resync request to the driver. The TLS Progress
> Params Context holds the TCP-SN of the record where the device began
> tracking records and counting them. The driver will acknowledge the
> TCP-SN if it matches a legal record by setting the TLS Static Params
> Context.
> 
> ? 
> We can elaborate with a step-by-step procedure if you think it is
> required.
> 
> > > This is a rare corner case anyway, where more than 1k TCP
> > > connections sharing the same RX ring will request resync at the
> > > same exact moment.
> > 
> > IDK about that. Certain applications are architected for max
> > capacity, not efficiency under steady load. So it matters a lot how
> > the system behaves under stress. What if this is the chain of events:
> > 
> > overload -> drops -> TLS streams go out of sync -> all try to resync
> > 
> > We don't want to add extra load on every record if HW offload is
> > enabled. That's why the next record hint backs off, checks socket 
> > state etc.
> > 

What we can do here: instead of failing when the queue is full, a
resync request keeps retrying with exponential backoff, up to once per
second. That way the system will not overload if the hw queue can't
keep up, and eventually the latest hw resync request will be handled.
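
To make the backoff idea concrete, here is a minimal userspace sketch of
how the retry delay could be computed. It is only an illustration, not
driver code: the helper name resync_backoff_ms() and the 1 ms starting
delay are assumptions; only the one-second cap comes from the proposal
above.

```c
#include <stdint.h>

/* Assumed initial retry delay; not taken from the driver. */
#define RESYNC_BACKOFF_MIN_MS 1u
/* Cap from the proposal above: retry at most once per second. */
#define RESYNC_BACKOFF_MAX_MS 1000u

/*
 * Hypothetical helper: delay in milliseconds before retry number
 * `attempt` (0-based). The delay doubles on every failed attempt and
 * is clamped to RESYNC_BACKOFF_MAX_MS.
 */
static uint32_t resync_backoff_ms(unsigned int attempt)
{
	uint32_t delay = RESYNC_BACKOFF_MIN_MS;

	/* Stop doubling once the cap is reached, so delay cannot overflow. */
	while (attempt-- > 0 && delay < RESYNC_BACKOFF_MAX_MS)
		delay *= 2;

	return delay > RESYNC_BACKOFF_MAX_MS ? RESYNC_BACKOFF_MAX_MS : delay;
}
```

With these assumed constants the delays go 1, 2, 4, ... ms and settle at
1000 ms from the tenth retry on, so a persistently full queue costs at
most one retry per second per flow.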

> > BTW I also don't understand why mlx5e_ktls_rx_resync() has a
> > tls_offload_rx_force_resync_request(sk) at the end. If the update 
> > from the NIC comes with a later seq than current, request the sync 
> > for _that_ seq. I don't understand the need to force a call back on
> > every record here. 
> 
> Good point, theoretically it should work, unless we have some
> limitations that I am not seeing. I will let Tariq comment on this.
> 

Same as above, I think: we can hint _that_ new seq to the hw, and back
off until the hw catches up with sw and issues a new valid resync
request.
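
As a rough sketch of the "later seq than current" check, assuming plain
32-bit TCP sequence numbers that may wrap: seq_after() below mirrors the
signed-difference comparison the kernel's after() helper uses, and
resync_target_seq() is a hypothetical name for illustration, not
something in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is later than b" for 32-bit TCP sequence numbers. */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/*
 * Hypothetical helper: pick the sequence number to request the resync
 * for. If the update from the NIC is later than the current sw seq,
 * use the NIC's seq; otherwise keep the current one.
 */
static uint32_t resync_target_seq(uint32_t sw_seq, uint32_t hw_seq)
{
	return seq_after(hw_seq, sw_seq) ? hw_seq : sw_seq;
}
```

The signed difference keeps the comparison correct across the 2^32 wrap,
as long as the two values are less than 2^31 apart.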

> > Also if the sync failed because queue was full, I don't see how
> > forcing another sync attempt for the next record is going to match?
> 
> In this case I guess we need to abort and wait for the hw to issue
> a new resync request ..