[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200528102953.4bfc424f@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net>
Date: Thu, 28 May 2020 10:29:53 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Boris Pismenny <borisp@...lanox.com>
Cc: Tariq Toukan <tariqt@...lanox.com>,
"David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
Saeed Mahameed <saeedm@...lanox.com>
Subject: Re: [PATCH net] net/tls: Fix driver request resync
On Thu, 28 May 2020 09:03:07 +0300 Boris Pismenny wrote:
> On 20/05/2020 23:34, Jakub Kicinski wrote:
> > On Wed, 20 May 2020 18:14:08 +0300 Tariq Toukan wrote:
> >> From: Boris Pismenny <borisp@...lanox.com>
> >>
> >> In driver request resync, the hardware requests a resynchronization
> >> request at some TCP sequence number. If that TCP sequence number does
> >> not point to a TLS record header, then the resync attempt has failed.
> >>
> >> Failed resync should reset the resync request to avoid spurious resyncs
> >> after the TCP sequence number has wrapped around.
> >>
> >> Fix this by resetting the resync request when the TLS record header
> >> sequence number is not before the requested sequence number.
> >> As a result, drivers may be called with a sequence number that is not
> >> equal to the requested sequence number.
> >>
> >> Fixes: f953d33ba122 ("net/tls: add kernel-driven TLS RX resync")
> >> Signed-off-by: Boris Pismenny <borisp@...lanox.com>
> >> Signed-off-by: Tariq Toukan <tariqt@...lanox.com>
> >> Reviewed-by: Saeed Mahameed <saeedm@...lanox.com>
> >> ---
> >> net/tls/tls_device.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> >> index a562ebaaa33c..cbb13001b4a9 100644
> >> --- a/net/tls/tls_device.c
> >> +++ b/net/tls/tls_device.c
> >> @@ -714,7 +714,7 @@ void tls_device_rx_resync_new_rec(struct sock *sk, u32 rcd_len, u32 seq)
> >> seq += TLS_HEADER_SIZE - 1;
> >> is_req_pending = resync_req;
> >>
> >> - if (likely(!is_req_pending) || req_seq != seq ||
> >> + if (likely(!is_req_pending) || before(seq, req_seq) ||
> > So the kernel is going to send the sync message to the device with at
> > sequence number the device never asked about?
>
> Yes, although I would phrase it differently: the kernel would indicate to the driver,
> that the resync request is wrong, and that it can go back to searching for a header.
> If there are any drivers that need an extra check, then we can add it in the driver itself.
I'd rather make the API clear and use a different op to indicate this is
a reset rather than a valid sync response. sync callback already has
the enum for sync type.
> > Kernel usually can't guarantee that the notification will happen,
> > (memory allocation errors, etc.) so the device needs to do the
> > restarting itself. The notification should not be necessary.
> >
>
> Usually, all is best effort, but in principle, reliability should be guaranteed by higher layers to simplify the design.
Since we're talking high level design perspectives here - IMO when you
talk to FW on a device - it's a distributed system. The days when you
could say driver on the host is higher layer ended when people started
putting fat firmwares on the NICs. So no, restart has to be handled by
the system making the request. In this case the NIC.
> On the one hand, resync depends on packet arrival, which may take a while, and implementing different heuristics in each driver to timeout is complex.
> On the other hand, assuming the user reads the record data eventually, ktls will be able to deliver the resync request, so implementing this in the tls layer is simple.
We definitely not want any driver logic here - the resync restart logic
has to be implemented on the device.
> In this case, I see no reason for the tls layer to fail --- did you have a specific flow in mind?
> AFAICT, there are no memory allocation/error flows that will prevent the driver to receive a resync without an error on the socket (bad tls header).
> If tls received the header, then the driver will receive the resync call, and it will take responsibility for reliably delivering it to HW.
So you're saying the request path and the response path for resync are
both 100% lossless both on the NIC and the host? There is no scenario
in which queue overflows, PCIe gets congested, etc.?
Powered by blists - more mailing lists