netdev - Re: [PATCH net 3/3] Documentation: add TLS offload documentation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f9e09ac2-ea5e-5b76-8aaf-ae50b21d3162@mellanox.com>
Date:   Sun, 19 May 2019 06:24:36 +0000
From:   Boris Pismenny <borisp@...lanox.com>
To:     Jakub Kicinski <jakub.kicinski@...ronome.com>
CC:     "davem@...emloft.net" <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "oss-drivers@...ronome.com" <oss-drivers@...ronome.com>,
        "alexei.starovoitov@...il.com" <alexei.starovoitov@...il.com>,
        "davejwatson@...com" <davejwatson@...com>,
        "john.fastabend@...il.com" <john.fastabend@...il.com>,
        "vakul.garg@....com" <vakul.garg@....com>,
        Alexei Starovoitov <ast@...nel.org>,
        Tariq Toukan <tariqt@...lanox.com>
Subject: Re: [PATCH net 3/3] Documentation: add TLS offload documentation



On 5/16/2019 8:56 PM, Jakub Kicinski wrote:
> On Thu, 16 May 2019 09:08:52 +0000, Boris Pismenny wrote:
>>> diff --git a/Documentation/networking/tls-offload.rst b/Documentation/networking/tls-offload.rst
>>> new file mode 100644
>>> index 000000000000..32fecb3fbc4c
>>> --- /dev/null
>>> +++ b/Documentation/networking/tls-offload.rst
>>> @@ -0,0 +1,438 @@

>>> +RX
>>> +--
>>> +
>>> +Before a packet is DMAed to the host (but after NIC's embedded switching
>>> +and packet transformation functions) the device performs a 5-tuple lookup
>>> +to find any TLS connection the packet may belong to (technically a 4-tuple
>>> +lookup is sufficient - IP addresses and TCP port numbers, as the protocol
>>> +is always TCP). If connection is matched device confirms if the TCP sequence
>>> +number is the expected one and proceeds to TLS handling (record delineation,
>>> +decryption, authentication for each record in the packet).
>>> +
>>> +If decryption or authentication fails for any record in the packet, the packet
>>> +must be passed to the host as it was received on the wire. This means packets
>>
>> This is not normal mode of operation, but rather an error handling
>> description. Please try to describe only the good flow here, and leave
>> the errors for a separate section.
> 
> Normal as device is in sync with the stream vs the Resync handling
> section.  It is not clear from the name, you're right, I will try
> to split further and see how it turns out.
> 

But it is not normal, as decryption or authentication failure are not 
normal. Such a packet is bound to terminate the TLS connection and 
forcing hardware to re-encrypt it is too strict IMO. Instead, I think 
that as long as the driver can provide the stack with the original 
packet in this case, then it is good enough.

>>> +should not be modified "in place". Splitting segments to handle partial
>>> +decryption is not advised. In other words either all records in the packet
>>> +had been handled successfully and authenticated or the packet has to be passed
>>> +to the host as it was on the wire. The device communicates whether the packet
>>> +was successfully decrypted in the per-packet context (descriptor) passed
>>> +to the host.
>>> +
>>> +The device leaves the record framing unmodified, the stack takes care of
>>> +record decapsulation.
>>> +
>>> +Upon reception of a TLS offloaded packet, the driver sets
>>> +the :c:member:`decrypted` mark in :c:type:`struct sk_buff <sk_buff>`
>>> +corresponding to the segment. Networking stack makes sure decrypted
>>> +and non-decrypted segments do not get coalesced and takes care of partial
>>> +decryption.
>>
>> Please mention checksum handling as well. It would not make any sense to
>> use CHECKSUM_COMPLETE here. Instead, CHECKSUM_UNNECESSARY should be
>> expected.
> 
> I was on the fence about adding the checksum info.  I had the feeling
> that even for CHECKSUM_UNNECESSARY it's fairly strange to pass mangled
> packets.  Looking at skbuff.h the checksum doc states:
> 
>   * CHECKSUM_UNNECESSARY:
>   *
>   *   The hardware you're dealing with doesn't calculate the full checksum
>   *   (as in CHECKSUM_COMPLETE), but it does parse headers and verify checksums
>   *   for specific protocols. For such packets it will set CHECKSUM_UNNECESSARY
>   *   if their checksums are okay. skb->csum is still undefined in this case
>   *   though. A driver or device must never modify the checksum field in the
>   *   packet even if checksum is verified.
> 
> My reading of the last sentence is: the checksum in the packet must
> still be correct (based on the context in which this comment was
> written, which was in the days of UDP tunnel offload work).
> IOW UNNECESSARY doesn't mean "don't look at the checksum field",
> it means "I've looked at the checksum field and it's correct".
> 

This interpretation is far from the text - "A driver or device must 
never modify the checksum field". If it is correct, then it needs to be 
fixed. Moreover, one could consider the checksum field to be correct, 
because the encryption can be reverted by the socket/driver on demand.

AFAIU, CHECKSUM_UNNECESSARY is exactly for cases like this, where the 
hardware might have mangled the known payload/headers. But it verified 
the checksum before doing so. As a result, checksum fields might be 
wrong (but unmodified), and setting CHECKSUM_UNNECESSARY informs the 
network stack that it can trust this packet.

> IMHO CHECKSUM_UNNECESSARY without correcting the TCP header csum field
> is only slightly less broken than CHECKSUM_COMPLETE with pre-decrypt
> csum and without fixing the TCP header.
> 
> Not to mention the fact that users may disable RXCSUM offload.

It does not matter if the user disables RXCSUM, because the HW *must* do 
checksum validation for TLS Rx offload regardless of this setting.

> 
> Maybe the least broken option is to fix the TCP header csum and pass
> CHECKSUM_COMPLETE of the encrypted data?  But then again clearly the HW
> has parsed the packet (voiding the non-ossification gain), and we won't
> be doing tunnelling on clear text..

I don't see the point of doing this. TLS Rx offload is for L4 
*endpoints*, not for L3 routers. As such, the socket will receive the 
packet and process it as part of a TCP stream, thereafter the data can 
be forwarded.
The socket must perform the TCP checksum verification, and 
CHECKSUM_UNNECESSARY makes all of this work without complex changes.

> 
> So CHECKSUM_UNNECESSARY "would do".
> 
> This is a long winded way of saying - I didn't see the perfect solution
> here, so I thought it's better not to codify it in this doc.  But
> perhaps I can phrase it tentatively enough.  How about:
> 
> 
>    The preferred method of reporting the Layer 4 (TCP) checksum offload
>    for packets decrypted by the device is to update the checksum field
>    to the correct value for clear text and report CHECKSUM_UNNECESSARY
>    or CHECKSUM_COMPLETE computed over clear text. However, the exact
>    semantics of RX checksum offload when NIC performs data modification
>    are not clear and subject to change.
> 

I disagree with this. Modifying the original checksum field erases the 
original checksum, which might not be recoverable later.