Message-ID: <20200601151206.454168ad@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net>
Date: Mon, 1 Jun 2020 15:12:06 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Boris Pismenny <borisp@...lanox.com>
Cc: Saeed Mahameed <saeedm@...lanox.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Tariq Toukan <tariqt@...lanox.com>
Subject: Re: [net-next 10/11] net/mlx5e: kTLS, Add kTLS RX resync support
On Sun, 31 May 2020 15:06:28 +0300 Boris Pismenny wrote:
> On 30/05/2020 0:50, Jakub Kicinski wrote:
> > On Fri, 29 May 2020 20:44:29 +0000 Saeed Mahameed wrote:
> >>> I thought you said that resync requests are guaranteed to never fail?
> >>
> >> I didn't say that :), maybe Tariq said this before my review,
> >
> > Boris ;)
> >
>
> I didn't say we are perfect, just that we can make a trade-off here,
> and currently this is the simplest version that our team came up with
> for this series. As a first step, I think it is reasonable. But, I
> expect that we will improve it in the future.
>
> >> but basically, with the current mlx5 arch, it is impossible to
> >> guarantee this unless we open one service queue per kTLS offload,
> >> and that is going to be overkill!
>
> I disagree; there are many ways to guarantee reliability here. For
> example, we can sleep/spin until there is space in the queue, or rely
> on work stealing to let a later operation execute this one.
>
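For the sake of discussion, here is a rough standalone model of the
"sleep/spin until there is space" option -- not driver code, every name
in it (resync_fifo, resync_submit_blocking, the retry bound) is made up:

/*
 * Toy model of "block until there is room" resync submission -- not
 * mlx5 code; all names here are made up for illustration.
 */
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

#define FIFO_SLOTS 1024		/* power of two so wrap-around math works */

struct resync_fifo {
	uint32_t tcpsn[FIFO_SLOTS];
	unsigned int head;	/* next slot to fill, increments forever */
	unsigned int tail;	/* next slot to drain */
};

static bool resync_fifo_push(struct resync_fifo *f, uint32_t tcpsn)
{
	if (f->head - f->tail == FIFO_SLOTS)
		return false;	/* full */
	f->tcpsn[f->head++ % FIFO_SLOTS] = tcpsn;
	return true;
}

/*
 * Instead of failing outright when the queue is full, back off briefly
 * and retry; the bound keeps a stuck consumer from hanging the caller.
 */
static bool resync_submit_blocking(struct resync_fifo *f, uint32_t tcpsn)
{
	int retries = 100;

	while (!resync_fifo_push(f, tcpsn)) {
		if (--retries <= 0)
			return false;	/* caller falls back to no offload */
		usleep(10);		/* in-kernel this would be a wait/resched */
	}
	return true;
}

Whether a bounded wait like this is acceptable on the RX path is a
separate question, of course.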
> > IIUC every ooo packet causes a resync request in your
> > implementation - is that true?
> >
>
> No, only on header loss. We never required a resync per OOO packet;
> I'm not sure why you would think that.
I mean that until the device is back in sync, every frame kicks off
resync_update_sn() and tries to queue the work, right?
> > It'd be great to have more information about the operation of the
> > device in the commit message..
> >
>
> I'll try to clarify the resync flow here.
> As always, the packet that requires resync is marked as such in the
> CQE. However, unlike previous devices, the TCP sequence (tcpsn) where
> the HW found a header is not provided in the CQE. Instead, tcpsn is
> queried from HW asynchronously by the driver. We employ the force
> resync approach to guarantee that we can log all resync locations
> between the received packet and the HW query response. We check the
> asynchronous HW query response against all resync values between the
> packet that triggered the resync and now. If one of them matches,
> then resync can be completed immediately. Otherwise, the driver keeps
> waiting for the correct resync.
Thanks, makes sense.
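Just to check I read the matching step right, roughly this -- a
standalone sketch, not your driver code, all names below are mine:

/*
 * Standalone sketch of the matching step as I understand it -- not the
 * mlx5e code; resync_state, resync_log_record, resync_try_complete are
 * names I made up.
 */
#include <stdbool.h>
#include <stdint.h>

#define RESYNC_LOG_SLOTS 32

struct resync_state {
	uint32_t log[RESYNC_LOG_SLOTS];	/* record starts seen since the trigger */
	unsigned int n_logged;
	bool waiting_for_hw;		/* async tcpsn query still in flight */
};

/* Called for each record start while the device is out of sync. */
static void resync_log_record(struct resync_state *st, uint32_t rec_seq)
{
	if (!st->waiting_for_hw || st->n_logged >= RESYNC_LOG_SLOTS)
		return;
	st->log[st->n_logged++] = rec_seq;
}

/*
 * Called when the asynchronous HW query completes with the TCP seq at
 * which the device found a record header.  Returns true if one of the
 * logged record starts matches, i.e. the resync can complete now at
 * hw_tcpsn; false means keep logging and wait.
 */
static bool resync_try_complete(struct resync_state *st, uint32_t hw_tcpsn)
{
	unsigned int i;

	for (i = 0; i < st->n_logged; i++) {
		if (st->log[i] == hw_tcpsn) {
			st->waiting_for_hw = false;
			st->n_logged = 0;
			return true;	/* device and SW agree on this seq */
		}
	}
	return false;
}

If that's accurate, the per-record logging only has to run while the
asynchronous query is outstanding.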
> >> This is a rare corner case anyway, where more than 1k TCP
> >> connections sharing the same RX ring request a resync at the exact
> >> same moment.
> >
> > IDK about that. Certain applications are architected for max
> > capacity, not efficiency under steady load. So it matters a lot how
> > the system behaves under stress. What if this is the chain of
> > events:
> >
> > overload -> drops -> TLS streams go out of sync -> all try to resync
> >
>
> I agree that this is not that rare, and it may be improved in both
> future patches and future hardware. Do you think it is critical to
> improve it now, rather than in a follow-up series?
It's not a blocker for me, although if this makes it into 5.8 there
will not be a chance to improve it before net-next closes, so it
depends on whether you want to risk it and support the code as is...
> > We don't want to add extra load on every record if HW offload is
> > enabled. That's why the next record hint backs off, checks socket
> > state etc.
> >
> > BTW I also don't understand why mlx5e_ktls_rx_resync() has a
> > tls_offload_rx_force_resync_request(sk) at the end. If the update
> > from the NIC comes with a later seq than current, request the sync
> > for _that_ seq. I don't understand the need to force a call back on
> > every record here.
> >
>
> The extra load here is minimal, i.e. a single call from TLS to the
> driver, which usually just logs the information.
Oh yes, this one is not about extra load; I just don't know what that
code is trying to achieve.
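To make it concrete, by "request the sync for _that_ seq" in the quoted
paragraph above I meant roughly the following -- a standalone model
with made-up names, not a patch:

/*
 * Standalone model of a targeted request.  The TLS layer would only
 * call back into the driver when the record at the seq the device
 * reported is reached, instead of on every record until further
 * notice.
 */
#include <stdbool.h>
#include <stdint.h>

struct rx_resync_req {
	uint32_t seq;		/* record start the device reported */
	bool pending;
};

/* Driver side: NIC reported a header at hw_tcpsn later than current. */
static void request_resync_at(struct rx_resync_req *req, uint32_t hw_tcpsn)
{
	req->seq = hw_tcpsn;
	req->pending = true;
}

/* TLS side: check each record start against the one outstanding request. */
static bool resync_due(struct rx_resync_req *req, uint32_t rec_seq)
{
	if (!req->pending || req->seq != rec_seq)
		return false;
	req->pending = false;
	return true;
}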
> > Also if the sync failed because queue was full, I don't see how
> > forcing another sync attempt for the next record is going to match?
> >
>
> It doesn't, and if the sync failed then we should stop trying to
> force a resync.