Date:   Wed, 24 Aug 2022 12:52:12 -0700
From:   Matt Joras <matt.joras@...il.com>
To:     Xin Long <lucien.xin@...il.com>
Cc:     Adel Abouchaev <adel.abushaev@...il.com>,
        Jakub Kicinski <kuba@...nel.org>, davem <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Paolo Abeni <pabeni@...hat.com>,
        Jonathan Corbet <corbet@....net>,
        David Ahern <dsahern@...nel.org>, shuah@...nel.org,
        imagedong@...cent.com, network dev <netdev@...r.kernel.org>,
        linux-doc@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [net-next v2 0/6] net: support QUIC crypto


> On Aug 24, 2022, at 11:29 AM, Xin Long <lucien.xin@...il.com> wrote:
> 
> On Wed, Aug 17, 2022 at 4:11 PM Adel Abouchaev <adel.abushaev@...il.com> wrote:
>> 
>> QUIC requires end to end encryption of the data. The application usually
>> prepares the data in clear text, encrypts and calls send() which implies
>> multiple copies of the data before the packets hit the networking stack.
>> Similar to kTLS, QUIC kernel offload of cryptography reduces the memory
>> pressure by reducing the number of copies.
>> 
>> The scope of kernel support is limited to the symmetric cryptography,
>> leaving the handshake to the user space library. For QUIC in particular,
>> the application packets that require symmetric cryptography are the 1RTT
>> packets with short headers. Kernel will encrypt the application packets
>> on transmission and decrypt on receive. This series implements Tx only,
>> because in QUIC server applications Tx outweighs Rx by orders of
>> magnitude.
>> 
>> Supporting the combination of QUIC and GSO requires the application to
>> correctly place the data and the kernel to correctly slice it. The
>> encryption process appends an arbitrary number of bytes (tag) to the end
>> of the message to authenticate it. The GSO value should include this
>> overhead, the offload would then subtract the tag size to parse the
>> input on Tx before chunking and encrypting it.
>> 
>> With the kernel cryptography, the buffer copy operation is conjoined
>> with the encryption operation. The memory bandwidth is reduced by 5-8%.
>> When devices supporting QUIC encryption in hardware come to the market,
>> we will be able to free further 7% of CPU utilization which is used
>> today for crypto operations.
>> 
>> Adel Abouchaev (6):
>>  Documentation on QUIC kernel Tx crypto.
>>  Define QUIC specific constants, control and data plane structures
>>  Add UDP ULP operations, initialization and handling prototype
>>    functions.
>>  Implement QUIC offload functions
>>  Add flow counters and Tx processing error counter
>>  Add self tests for ULP operations, flow setup and crypto tests
>> 
>> Documentation/networking/index.rst     |    1 +
>> Documentation/networking/quic.rst      |  185 ++++
>> include/net/inet_sock.h                |    2 +
>> include/net/netns/mib.h                |    3 +
>> include/net/quic.h                     |   63 ++
>> include/net/snmp.h                     |    6 +
>> include/net/udp.h                      |   33 +
>> include/uapi/linux/quic.h              |   60 +
>> include/uapi/linux/snmp.h              |    9 +
>> include/uapi/linux/udp.h               |    4 +
>> net/Kconfig                            |    1 +
>> net/Makefile                           |    1 +
>> net/ipv4/Makefile                      |    3 +-
>> net/ipv4/udp.c                         |   15 +
>> net/ipv4/udp_ulp.c                     |  192 ++++
>> net/quic/Kconfig                       |   16 +
>> net/quic/Makefile                      |    8 +
>> net/quic/quic_main.c                   | 1417 ++++++++++++++++++++++++
>> net/quic/quic_proc.c                   |   45 +
>> tools/testing/selftests/net/.gitignore |    4 +-
>> tools/testing/selftests/net/Makefile   |    3 +-
>> tools/testing/selftests/net/quic.c     | 1153 +++++++++++++++++++
>> tools/testing/selftests/net/quic.sh    |   46 +
>> 23 files changed, 3267 insertions(+), 3 deletions(-)
>> create mode 100644 Documentation/networking/quic.rst
>> create mode 100644 include/net/quic.h
>> create mode 100644 include/uapi/linux/quic.h
>> create mode 100644 net/ipv4/udp_ulp.c
>> create mode 100644 net/quic/Kconfig
>> create mode 100644 net/quic/Makefile
>> create mode 100644 net/quic/quic_main.c
>> create mode 100644 net/quic/quic_proc.c
>> create mode 100644 tools/testing/selftests/net/quic.c
>> create mode 100755 tools/testing/selftests/net/quic.sh
>> 
>> 
>> base-commit: fd78d07c7c35de260eb89f1be4a1e7487b8092ad
>> --
>> 2.30.2
>> 
> Hi, Adel,
> 
> I don't see how the key update (rfc9001#section-6) is handled on the TX
> path, which is not using TLS Key update, and "Key Phase" indicates
> which key will be used after rekeying. Also, I think it is almost
> impossible to handle the peer rekeying on the RX path either based on
> your current model in the future.
Key updates do not need to be handled by the kernel in this model. That is,
a key update is processed as normal by the userspace QUIC code, and the
sockets then have to be re-associated with the new keying material.
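
To illustrate the userspace side of that split, here is a minimal sketch of
how a userspace QUIC stack might notice a differing Key Phase bit on a received
1-RTT packet. The bit positions follow the short header layout in RFC 9000;
the function names and the current_phase bookkeeping are hypothetical and
are not taken from the patch series. The step that re-associates the socket
with new keys is deliberately left out, since it depends on the interface
the series exposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First-byte masks for QUIC v1 packets (RFC 9000, section 17).
 * The Key Phase bit is only meaningful AFTER header protection
 * has been removed from the first byte. */
#define QUIC_HDR_FORM_LONG    0x80u  /* 1 = long header, 0 = short header */
#define QUIC_SHORT_KEY_PHASE  0x04u  /* Key Phase bit in a short header */

/* True if the first byte belongs to a short-header (1-RTT) packet. */
static bool quic_is_short_header(uint8_t first_byte)
{
    return (first_byte & QUIC_HDR_FORM_LONG) == 0;
}

/* Returns the Key Phase bit (0 or 1) of an unprotected short-header byte. */
static uint8_t quic_key_phase(uint8_t first_byte)
{
    return (first_byte & QUIC_SHORT_KEY_PHASE) ? 1 : 0;
}

/* Hypothetical helper: reports whether the packet's Key Phase differs from
 * the phase the connection currently uses. A difference MAY indicate a
 * peer-initiated key update; it can also be a delayed packet from the
 * previous phase, so the caller still has to try decryption to decide. */
static bool quic_key_phase_differs(uint8_t current_phase, uint8_t first_byte)
{
    assert(current_phase <= 1);
    return quic_is_short_header(first_byte) &&
           quic_key_phase(first_byte) != current_phase;
}
```

Once userspace has confirmed the update and derived the next generation of
keys, it would install them for the affected flow before sending further
packets in the new phase.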
> 
> The patch seems to get the crypto_ctx by doing a connection hash table
> lookup in the sendmsg(), which is not good from the performance side.
> One QUIC connection can go over multiple UDP sockets, but I don't
> think one socket can be used by multiple QUIC connections. So why not
> save the ctx in the socket instead?
There’s nothing preventing a single socket or UDP/IP tuple from being used
by multiple QUIC connections. This is achievable due to both endpoints having
CIDs. Note that it is not uncommon for QUIC deployments to use a single socket for
all connections, rather than the TCP listen/accept model. That being said, it
would be nice to be able to avoid the lookup cost when using a connected socket.
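
As a rough illustration of why a lookup is needed when one socket carries
many connections, the sketch below maps the destination CID of a
short-header packet to a per-connection entry. It assumes the CID length is
known out of band, as it must be for short headers, which do not encode it.
The table layout, the linear scan, and all names are hypothetical; the
series itself uses a hash table in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUIC_MAX_CID_LEN 20  /* maximum CID length in QUIC v1 (RFC 9000) */
#define MAX_CONNS        8   /* arbitrary size for this sketch */

/* One entry per QUIC connection sharing the socket (hypothetical). */
struct quic_conn {
    uint8_t cid[QUIC_MAX_CID_LEN];
    size_t  cid_len;
    int     id;       /* stands in for per-connection crypto state */
    bool    in_use;
};

struct conn_table {
    struct quic_conn slots[MAX_CONNS];
};

/* Registers a connection under its destination CID.
 * Returns 0 on success, -1 if the CID length is unusable or the table is
 * full. Zero-length CIDs are rejected because they cannot be demultiplexed. */
static int conn_table_add(struct conn_table *t, const uint8_t *cid,
                          size_t cid_len, int id)
{
    if (cid_len == 0 || cid_len > QUIC_MAX_CID_LEN)
        return -1;
    for (size_t i = 0; i < MAX_CONNS; i++) {
        struct quic_conn *c = &t->slots[i];
        if (!c->in_use) {
            memcpy(c->cid, cid, cid_len);
            c->cid_len = cid_len;
            c->id = id;
            c->in_use = true;
            return 0;
        }
    }
    return -1;
}

/* Finds the connection for a short-header packet: one flags byte followed
 * directly by the destination CID. Returns the connection id, or -1 if the
 * packet is too short, has a long header, or matches no entry. */
static int conn_table_lookup(const struct conn_table *t, const uint8_t *pkt,
                             size_t pkt_len, size_t cid_len)
{
    if (pkt_len < 1 + cid_len || (pkt[0] & 0x80u))
        return -1;
    for (size_t i = 0; i < MAX_CONNS; i++) {
        const struct quic_conn *c = &t->slots[i];
        if (c->in_use && c->cid_len == cid_len &&
            memcmp(c->cid, pkt + 1, cid_len) == 0)
            return c->id;
    }
    return -1;
}
```

With a connected socket that serves exactly one connection, the same state
could be cached on the socket and the lookup skipped.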

> 
> The patch is to reduce the copying operations between user space and
> the kernel. I might miss something in your user space code, but the
> msg to send is *already packed* into the Stream Frame in user space,
> what's the difference if you encrypt it in userspace and then
> sendmsg(udp_sk) with zero-copy to the kernel.
I would not say that reducing copy operations is the primary goal of this
work. There are already ways to achieve minimal copy operations for UDP from
userspace. 
> 
> Didn't really understand the "GSO" you mentioned, as I don't see any
> code about kernel GSO, I guess it's just "Fragment size", right?
> BTW, it's not common to use "//" for the kernel annotation.
> 
> I'm not sure if it's worth adding a ULP layer over UDP for this QUIC
> TX only. Honestly, I'm more in favor of doing a full QUIC stack in the
> kernel independently with socket APIs to use it:
> https://github.com/lxin/tls_hs.
A full QUIC stack in the kernel with associated socket APIs is solving a
different problem than this work. Having an API to offload crypto operations of QUIC
allows for the choice of many different QUIC implementations in userspace while
potentially taking advantage of offloading the main CPU cost of an encrypted protocol.
> 
> Thanks.
> 

Best,
Matt Joras