Date:   Mon, 1 Jun 2020 15:12:06 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Boris Pismenny <borisp@...lanox.com>
Cc:     Saeed Mahameed <saeedm@...lanox.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Tariq Toukan <tariqt@...lanox.com>
Subject: Re: [net-next 10/11] net/mlx5e: kTLS, Add kTLS RX resync support

On Sun, 31 May 2020 15:06:28 +0300 Boris Pismenny wrote:
> On 30/05/2020 0:50, Jakub Kicinski wrote:
> > On Fri, 29 May 2020 20:44:29 +0000 Saeed Mahameed wrote:  
> >>> I thought you said that resync requests are guaranteed to never fail?  
> >>
> >> I didn't say that :), maybe Tariq said this before my review,
> > 
> > Boris ;)
> >   
> 
> I didn't say we are perfect, just that we can make a trade-off here,
> and currently this is the simplest version that our team came up with
> for this series. As a first step, I think it is reasonable, but I
> expect that we will improve it in the future.
> 
> >> but basically with the current mlx5 arch, it is impossible to
> >> guarantee this unless we open one service queue per kTLS offload,
> >> and that would be overkill!
> 
> I disagree; there are many ways to guarantee reliability here. For
> example, we can sleep/spin until there is space in the queue or rely
> on work stealing to let a later operation execute this one.
> 
> > IIUC every ooo packet causes a resync request in your
> > implementation - is that true?
> >   
> 
> No, only header loss. We never required a resync per OOO packet. I'm
> not sure why you would think that.

I mean that until the device is back in sync, every frame kicks off
resync_update_sn() and tries to queue the work, right?
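
To make the trade-off concrete, here is a self-contained userspace
sketch of the two enqueue policies under discussion: the best-effort
attempt that can fail when the service queue is full, and the
spin-until-space variant suggested above. All names are illustrative;
this is not the mlx5 code.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <sched.h>

    #define SQ_SIZE 128 /* fixed-size service queue, single producer */

    struct service_queue {
        atomic_uint head;              /* producer index */
        atomic_uint tail;              /* consumer index */
        unsigned int entries[SQ_SIZE];
    };

    /* Best-effort: mirrors "tries to queue the work", and may fail. */
    static bool sq_try_enqueue(struct service_queue *sq, unsigned int work)
    {
        unsigned int head = atomic_load(&sq->head);
        unsigned int tail = atomic_load(&sq->tail);

        if (head - tail >= SQ_SIZE)
            return false;  /* queue full: this resync request is lost */

        sq->entries[head % SQ_SIZE] = work;
        atomic_store(&sq->head, head + 1);
        return true;
    }

    /* Reliable: spin (yielding) until a slot frees up, trading latency
     * for a guarantee that no resync request is ever dropped. */
    static void sq_enqueue_spin(struct service_queue *sq, unsigned int work)
    {
        while (!sq_try_enqueue(sq, work))
            sched_yield();
    }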

> > It'd be great to have more information about the operation of the
> > device in the commit message...
> >   
> 
> I'll try to clarify the resync flow here.
> As always, the packet that requires resync is marked as such in the
> CQE. However, unlike previous devices, the TCP sequence (tcpsn) where
> the HW found a header is not provided in the CQE. Instead, tcpsn is
> queried from HW asynchronously by the driver. We employ the force
> resync approach so that we can log all resync locations (record-start
> sequence numbers) between the packet that triggered the resync and
> the arrival of the HW query response. The response is then checked
> against all logged values: if one of them matches, the resync can be
> completed immediately; otherwise, the driver keeps waiting for a
> matching response.

Thanks, makes sense.
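
For reference, the matching described above reduces to something like
the following (a minimal sketch with made-up names, not the actual
driver code): while the asynchronous HW query is outstanding, log the
record-start TCP sequence numbers the stack sees; when the response
arrives, complete the resync if it matches any logged value.

    #include <stdbool.h>
    #include <stdint.h>

    #define LOG_SLOTS 8 /* how many record boundaries we remember */

    struct resync_state {
        uint32_t log[LOG_SLOTS]; /* record-start TCP seqs since trigger */
        unsigned int n;          /* total entries logged so far */
        bool in_progress;
    };

    /* Called at each record boundary while a resync is pending; this is
     * the "log all resync locations" step. Oldest entries get
     * overwritten once the ring wraps. */
    static void resync_log_record(struct resync_state *st, uint32_t tcpsn)
    {
        if (st->in_progress)
            st->log[st->n++ % LOG_SLOTS] = tcpsn;
    }

    /* Called when the asynchronous HW query response delivers the seq
     * at which the device found a header. Returns true if the resync
     * can be completed immediately; false means keep waiting. */
    static bool resync_try_complete(struct resync_state *st, uint32_t hw_tcpsn)
    {
        unsigned int i, valid = st->n < LOG_SLOTS ? st->n : LOG_SLOTS;

        for (i = 0; i < valid; i++) {
            if (st->log[i] == hw_tcpsn) {
                st->in_progress = false; /* device and stack agree */
                return true;
            }
        }
        return false;
    }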

> >> This is a rare corner case anyway, where more than 1k TCP
> >> connections sharing the same RX ring request resync at the
> >> exact same moment.
> > 
> > IDK about that. Certain applications are architected for max
> > capacity, not efficiency under steady load. So it matters a lot how
> > the system behaves under stress. What if this is the chain of
> > events:
> > 
> > overload -> drops -> TLS streams go out of sync -> all try to resync
> >  
> 
> I agree that this is not that rare, and it can be improved both in
> future patches and in future hardware. Do you think it is critical
> to improve it now, rather than in a follow-up series?

It's not a blocker for me, although if this makes it into 5.8 there
will not be a chance to improve it before net-next closes, so it
depends on whether you want to risk it and support the code as is...

> > We don't want to add extra load on every record if HW offload is
> > enabled. That's why the next record hint backs off, checks socket
> > state, etc.
> > 
> > BTW I also don't understand why mlx5e_ktls_rx_resync() has a
> > tls_offload_rx_force_resync_request(sk) at the end. If the update 
> > from the NIC comes with a later seq than current, request the sync 
> > for _that_ seq. I don't understand the need to force a call back on
> > every record here. 
> >   
> 
> The extra load here is minimal, i.e. a single call from TLS to the
> driver, which usually just logs the information.

Oh yes, this one is not about extra load; I just don't know what that
code is trying to achieve.
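
The backoff mentioned earlier amounts to rate limiting the driver
callback: only ask the device again after a growing number of records
has passed. A generic sketch (the real tls_device logic also checks
socket state; these names are made up):

    #include <stdbool.h>
    #include <stdint.h>

    struct resync_backoff {
        uint64_t next_attempt; /* record count at which to try again */
        uint64_t interval;     /* current backoff interval, in records */
    };

    /* Called once per received record; returns true only when enough
     * records have passed since the last attempt, doubling the interval
     * each time so a stressed system is not flooded with requests. */
    static bool resync_should_request(struct resync_backoff *b, uint64_t rcd)
    {
        if (rcd < b->next_attempt)
            return false;
        b->interval = b->interval ? b->interval * 2 : 1;
        b->next_attempt = rcd + b->interval;
        return true;
    }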

> > Also if the sync failed because the queue was full, I don't see how
> > forcing another sync attempt for the next record is going to match?
> >   
> 
> It doesn't, and if the sync failed then we should stop trying to
> force a resync.
