Message-ID: <cover.1751743914.git.lucien.xin@gmail.com>
Date: Sat, 5 Jul 2025 15:31:39 -0400
From: Xin Long <lucien.xin@...il.com>
To: network dev <netdev@...r.kernel.org>
Cc: davem@...emloft.net,
kuba@...nel.org,
Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>,
Stefan Metzmacher <metze@...ba.org>,
Moritz Buhl <mbuhl@...nbsd.org>,
Tyler Fanelli <tfanelli@...hat.com>,
Pengtao He <hepengtao@...omi.com>,
linux-cifs@...r.kernel.org,
Steve French <smfrench@...il.com>,
Namjae Jeon <linkinjeon@...nel.org>,
Paulo Alcantara <pc@...guebit.com>,
Tom Talpey <tom@...pey.com>,
kernel-tls-handshake@...ts.linux.dev,
Chuck Lever <chuck.lever@...cle.com>,
Jeff Layton <jlayton@...nel.org>,
Benjamin Coddington <bcodding@...hat.com>,
Steve Dickson <steved@...hat.com>,
Hannes Reinecke <hare@...e.de>,
Alexander Aring <aahringo@...hat.com>,
Cong Wang <xiyou.wangcong@...il.com>,
"D . Wythe" <alibuda@...ux.alibaba.com>,
Jason Baron <jbaron@...mai.com>,
illiliti <illiliti@...tonmail.com>,
Sabrina Dubroca <sd@...asysnail.net>,
Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
Daniel Stenberg <daniel@...x.se>,
Andy Gospodarek <andrew.gospodarek@...adcom.com>
Subject: [PATCH net-next 00/15] net: introduce QUIC infrastructure and core subcomponents
Introduction
============
The QUIC protocol, as defined in RFC9000, offers a UDP-based, secure
transport with flow-controlled streams for efficient communication,
low-latency connection setup, and network path migration, ensuring
confidentiality, integrity, and availability across various deployments.
This implementation introduces QUIC support in the Linux kernel, offering
several key advantages:
- Seamless Integration for Kernel Subsystems: Kernel subsystems such as
SMB and NFS can operate over QUIC seamlessly after the handshake,
leveraging the net/handshake APIs.
- Standardized Socket APIs for QUIC: This implementation standardizes the
socket APIs for QUIC, covering essential operations like listen, accept,
connect, sendmsg, recvmsg, close, get/setsockopt, and getsock/peername().
- Efficient ALPN Routing: It incorporates ALPN routing within the kernel,
efficiently directing incoming requests to the appropriate applications
across different processes based on ALPN.
- Performance Enhancements: By minimizing data duplication through
zero-copy techniques such as sendfile(), and paving the way for crypto
offloading in NICs, this implementation enhances performance and prepares
for future optimizations.
This implementation offers fundamental support for the following RFCs:
- RFC9000 - QUIC: A UDP-Based Multiplexed and Secure Transport
- RFC9001 - Using TLS to Secure QUIC
- RFC9002 - QUIC Loss Detection and Congestion Control
- RFC9221 - An Unreliable Datagram Extension to QUIC
- RFC9287 - Greasing the QUIC Bit
- RFC9368 - Compatible Version Negotiation for QUIC
- RFC9369 - QUIC Version 2
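As a small illustration of the wire format these RFCs define, the sketch
below encodes and decodes the variable-length integers of RFC 9000
Section 16, which QUIC uses for stream IDs, offsets, and lengths. It is an
independent userspace example, not code from this series; the names
quic_varint_encode(), quic_varint_decode(), and QUIC_VARINT_MAX are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest value a QUIC variable-length integer can carry (2^62 - 1). */
#define QUIC_VARINT_MAX ((UINT64_C(1) << 62) - 1)

/*
 * Encode v into buf in network byte order. The two most significant bits
 * of the first byte select the length: 00 -> 1, 01 -> 2, 10 -> 4, 11 -> 8.
 * Returns the number of bytes written, or 0 if v or cap is out of range.
 */
static size_t quic_varint_encode(uint64_t v, uint8_t *buf, size_t cap)
{
	size_t len, i;
	uint8_t prefix;

	assert(buf != NULL);
	if (v > QUIC_VARINT_MAX)
		return 0;
	if (v < (UINT64_C(1) << 6)) {
		len = 1; prefix = 0x00;
	} else if (v < (UINT64_C(1) << 14)) {
		len = 2; prefix = 0x40;
	} else if (v < (UINT64_C(1) << 30)) {
		len = 4; prefix = 0x80;
	} else {
		len = 8; prefix = 0xc0;
	}
	if (cap < len)
		return 0;
	for (i = 0; i < len; i++)
		buf[len - 1 - i] = (uint8_t)(v >> (8 * i));
	buf[0] |= prefix;
	return len;
}

/*
 * Decode one variable-length integer from buf.
 * Returns the number of bytes consumed, or 0 if the input is truncated.
 */
static size_t quic_varint_decode(const uint8_t *buf, size_t avail,
				 uint64_t *out)
{
	size_t len, i;
	uint64_t v;

	assert(buf != NULL && out != NULL);
	if (avail == 0)
		return 0;
	len = (size_t)1 << (buf[0] >> 6);
	if (avail < len)
		return 0;
	v = buf[0] & 0x3f;
	for (i = 1; i < len; i++)
		v = (v << 8) | buf[i];
	*out = v;
	return len;
}
```

For instance, the value 15293 is encoded in two bytes as 0x7b 0xbd, which
matches the sample decoding given in RFC 9000 Appendix A.1.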
The socket APIs for QUIC follow the RFC draft [1]:
- The Sockets API Extensions for In-kernel QUIC Implementations
Implementation
==============
The core idea is to implement QUIC within the kernel, using a userspace
handshake approach.
Only the processing and creation of raw TLS Handshake Messages are handled
in userspace, facilitated by a TLS library like GnuTLS. These messages are
exchanged between kernel and userspace via sendmsg() and recvmsg(), with
cryptographic details conveyed through control messages (cmsg).
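To make the cmsg mechanism concrete, here is a minimal userspace sketch of
how a raw TLS handshake message could be paired with a control message
before being passed to sendmsg(). It only demonstrates the standard
CMSG_*() macros. EX_SOL_QUIC, EX_HANDSHAKE_INFO, struct ex_handshake_info,
and its crypto_level field are assumptions made for illustration; the
authoritative definitions are in include/uapi/linux/quic.h of this series
and may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define EX_SOL_QUIC		288	/* assumed socket level, see linux/quic.h */
#define EX_HANDSHAKE_INFO	1	/* assumed cmsg type, see linux/quic.h */

/* Assumed payload: the encryption level the handshake bytes belong to. */
struct ex_handshake_info {
	uint8_t crypto_level;
};

/*
 * Message header plus the storage it points to. The msghdr holds pointers
 * into this struct, so it must not be copied after it has been prepared.
 */
struct ex_handshake_msg {
	struct msghdr msg;
	struct iovec iov;
	union {
		char buf[CMSG_SPACE(sizeof(struct ex_handshake_info))];
		struct cmsghdr align;	/* forces cmsghdr alignment */
	} ctrl;
};

/* Fill m so that sendmsg(fd, &m->msg, 0) carries tls_data plus one cmsg. */
static void ex_prepare_handshake_msg(struct ex_handshake_msg *m,
				     void *tls_data, size_t tls_len,
				     uint8_t level)
{
	struct ex_handshake_info info = { .crypto_level = level };
	struct cmsghdr *cmsg;

	assert(m != NULL);
	memset(m, 0, sizeof(*m));
	m->iov.iov_base = tls_data;
	m->iov.iov_len = tls_len;
	m->msg.msg_iov = &m->iov;
	m->msg.msg_iovlen = 1;
	m->msg.msg_control = m->ctrl.buf;
	m->msg.msg_controllen = sizeof(m->ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&m->msg);
	cmsg->cmsg_level = EX_SOL_QUIC;
	cmsg->cmsg_type = EX_HANDSHAKE_INFO;
	cmsg->cmsg_len = CMSG_LEN(sizeof(info));
	memcpy(CMSG_DATA(cmsg), &info, sizeof(info));
}

/* Receive side: return the crypto level found in msg, or -1 if absent. */
static int ex_handshake_level(struct msghdr *msg)
{
	struct ex_handshake_info info;
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level != EX_SOL_QUIC ||
		    cmsg->cmsg_type != EX_HANDSHAKE_INFO)
			continue;
		memcpy(&info, CMSG_DATA(cmsg), sizeof(info));
		return info.crypto_level;
	}
	return -1;
}
```

The same walk over CMSG_FIRSTHDR()/CMSG_NXTHDR() applies to a msghdr filled
in by recvmsg(), which is how the userspace side would learn which
encryption level an incoming handshake message belongs to.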
The entire QUIC protocol, aside from the TLS Handshake Messages processing
and creation, is managed within the kernel. Rather than using an Upper Layer
Protocol (ULP) layer, this implementation establishes a socket of type
IPPROTO_QUIC (similar to IPPROTO_MPTCP), operating over UDP tunnels.
Kernel consumers can initiate a handshake request from the kernel to
userspace using the existing net/handshake netlink. The userspace
component, such as the tlshd service [2], then processes the QUIC
handshake request.
- Handshake Architecture:
┌──────┐ ┌──────┐
│ APP1 │ │ APP2 │ ...
└──────┘ └──────┘
┌──────────────────────────────────────────┐
│ {quic_client/server_handshake()} │<─────────────┐
└──────────────────────────────────────────┘ ┌─────────────┐
{send/recvmsg()} {set/getsockopt()} │ tlshd │
[CMSG handshake_info] [SOCKOPT_CRYPTO_SECRET] └─────────────┘
[SOCKOPT_TRANSPORT_PARAM_EXT] │ ^
│ ^ │ ^ │ │
Userspace │ │ │ │ │ │
──────────────│─│──────────────────│─│──────────────────│───│───────
Kernel │ │ │ │ │ │
v │ v │ v │
┌──────────────────┬───────────────────────┐ ┌─────────────┐
│ protocol, timer, │ socket (IPPROTO_QUIC) │<──┐ │ handshake │
│ ├───────────────────────┤ │ │netlink APIs │
│ common, family, │ outqueue | inqueue │ │ └─────────────┘
│ ├───────────────────────┤ │ │ │
│ stream, connid, │ frame │ │ ┌─────┐ ┌─────┐
│ ├───────────────────────┤ │ │ │ │ │
│ path, pnspace, │ packet │ │───│ SMB │ │ NFS │...
│ ├───────────────────────┤ │ │ │ │ │
│ cong, crypto │ UDP tunnels │ │ └─────┘ └─────┘
└──────────────────┴───────────────────────┘ └──────┴───────┘
- User Data Architecture:
┌──────┐ ┌──────┐
│ APP1 │ │ APP2 │ ...
└──────┘ └──────┘
{send/recvmsg()} {set/getsockopt()} {recvmsg()}
[CMSG stream_info] [SOCKOPT_KEY_UPDATE] [EVENT conn update]
[SOCKOPT_CONNECTION_MIGRATION] [EVENT stream update]
[SOCKOPT_STREAM_OPEN/RESET/STOP]
│ ^ │ ^ ^
Userspace │ │ │ │ │
──────────────│─│───────────────│─│─────────────────────│───────────
Kernel │ │ │ │ │
v │ v │ ┌──────────────────┘
┌──────────────────┬───────────────────────┐
│ protocol, timer, │ socket (IPPROTO_QUIC) │<──┐{kernel_send/recvmsg()}
│ ├───────────────────────┤ │{kernel_set/getsockopt()}
│ common, family, │ outqueue | inqueue │ │{kernel_recvmsg()}
│ ├───────────────────────┤ │
│ stream, connid, │ frame │ │ ┌─────┐ ┌─────┐
│ ├───────────────────────┤ │ │ │ │ │
│ path, pnspace, │ packet │ │───│ SMB │ │ NFS │...
│ ├───────────────────────┤ │ │ │ │ │
│ cong, crypto │ UDP tunnels │ │ └─────┘ └─────┘
└──────────────────┴───────────────────────┘ └──────┴───────┘
Interface
=========
This implementation maps QUIC onto the socket APIs. As with TCP and SCTP,
a typical server and client use the following system call sequence to
communicate:
Client                              Server
──────────────────────────────────────────────────────────────────────
sockfd = socket(IPPROTO_QUIC)       listenfd = socket(IPPROTO_QUIC)
bind(sockfd)                        bind(listenfd)
                                    listen(listenfd)
connect(sockfd)
quic_client_handshake(sockfd)
                                    sockfd = accept(listenfd)
                                    quic_server_handshake(sockfd, cert)
sendmsg(sockfd)                     recvmsg(sockfd)
close(sockfd)                       close(sockfd)
                                    close(listenfd)
Please note that quic_client_handshake() and quic_server_handshake()
functions are currently sourced from libquic [3]. These functions are
responsible for receiving and processing the raw TLS handshake messages
until the completion of the handshake process.
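The client column above can be sketched in userspace C as follows. This is
a hedged outline, not a reference client: the fallback value 261 for
IPPROTO_QUIC and the SOCK_DGRAM socket type are assumptions, the optional
bind() step is omitted, and the handshake is passed in as a hypothetical
callback (ex_handshake_fn) rather than calling libquic, whose exact
function signatures are not reproduced here. ex_stub_handshake() is a
placeholder that performs no TLS work.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef IPPROTO_QUIC
#define IPPROTO_QUIC 261	/* assumed value; use the patched linux/in.h */
#endif

/* Hypothetical hook standing in for libquic's quic_client_handshake(). */
typedef int (*ex_handshake_fn)(int sockfd);

static int ex_handshake_calls;

/* Placeholder handshake: counts invocations and reports success. */
static int ex_stub_handshake(int sockfd)
{
	(void)sockfd;
	ex_handshake_calls++;
	return 0;
}

/*
 * Walk the client column: socket -> connect -> handshake -> sendmsg ->
 * close. The protocol is a parameter so the flow can be exercised on a
 * kernel without QUIC support. Returns 0 on success, or a negative errno
 * value identifying the step that failed.
 */
static int ex_client_flow(int protocol, const struct sockaddr_in *peer,
			  ex_handshake_fn handshake,
			  const void *data, size_t len)
{
	struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
	int fd, err = 0;

	assert(peer != NULL && handshake != NULL);

	fd = socket(AF_INET, SOCK_DGRAM, protocol);
	if (fd < 0)
		return -errno;	/* e.g. EPROTONOSUPPORT without the module */

	if (connect(fd, (const struct sockaddr *)peer, sizeof(*peer)) < 0)
		err = -errno;
	else if (handshake(fd) != 0)
		err = -EPROTO;
	else if (sendmsg(fd, &msg, 0) < 0)
		err = -errno;

	close(fd);
	return err;
}
```

Calling ex_client_flow(IPPROTO_QUIC, &peer, handshake, buf, len) on a
kernel without the QUIC module is expected to fail at the socket() step,
which is why the function reports the failing step through its return
value instead of aborting.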
Kernel consumers require the tlshd service [2] to be installed and running
in userspace. This service receives and manages kernel handshake requests
for kernel sockets. In the kernel, the APIs closely resemble those used in
userspace:
Client                                Server
────────────────────────────────────────────────────────────────────────
__sock_create(IPPROTO_QUIC, &sock)    __sock_create(IPPROTO_QUIC, &sock)
kernel_bind(sock)                     kernel_bind(sock)
                                      kernel_listen(sock)
kernel_connect(sock)
tls_client_hello_x509(args:{sock})
                                      kernel_accept(sock, &newsock)
                                      tls_server_hello_x509(args:{newsock})
kernel_sendmsg(sock)                  kernel_recvmsg(newsock)
sock_release(sock)                    sock_release(newsock)
                                      sock_release(sock)
Please be aware that tls_client_hello_x509() and tls_server_hello_x509()
are APIs from net/handshake/. They are used to dispatch the handshake
request to the userspace tlshd service and subsequently block until the
handshake process is completed.
Use Cases
=========
- Samba
Stefan Metzmacher has submitted a merge request to integrate Linux QUIC
into Samba for both client and server roles [4].
- tlshd
The tlshd daemon [2] facilitates Linux QUIC handshake requests from
kernel sockets. This is essential for enabling protocols like SMB
and NFS over QUIC.
- curl
Linux QUIC is being integrated into curl [5] for HTTP/3. Example usage:
# curl --http3-only https://nghttp2.org:4433/
# curl --http3-only https://www.google.com/
# curl --http3-only https://facebook.com/
# curl --http3-only https://outlook.office.com/
# curl --http3-only https://cloudflare-quic.com/
- httpd-portable
Moritz Buhl has deployed an HTTP/3 server over Linux QUIC [6] that is
accessible via Firefox and curl:
https://d.moritzbuhl.de/pub
Test Coverage
=============
The Coverage (gcov) of Functional and Interop Tests:
https://d.moritzbuhl.de/lcov
- Functional Tests
The libquic self-tests (make check) ran on all major architectures:
x86_64, i386, s390x, aarch64, ppc64le.
- Interop tests
Interoperability was validated using the QUIC Interop Runner [7] against
all major userland QUIC stacks. Results are available at:
https://d.moritzbuhl.de/
- Fuzzing via Syzkaller
Syzkaller has been running kernel fuzzing with QUIC for weeks using
tests/syzkaller/ in libquic [3].
- Performance Testing
Performance was benchmarked using iperf [8] over a 100G NIC with
various MTUs and packet sizes:
- QUIC vs. kTLS:
UNIT        size:1024      size:4096      size:16384     size:65536
Gbits/sec   QUIC | kTLS    QUIC | kTLS    QUIC | kTLS    QUIC | kTLS
--------------------------------------------------------------------
mtu:1500    2.27 | 3.26    3.02 | 6.97    3.36 | 9.74    3.48 | 10.8
--------------------------------------------------------------------
mtu:9000    3.66 | 3.72    5.87 | 8.92    7.03 | 11.2    8.04 | 11.4
- QUIC(disable_1rtt_encryption) vs. TCP:
UNIT        size:1024      size:4096      size:16384     size:65536
Gbits/sec   QUIC | TCP     QUIC | TCP     QUIC | TCP     QUIC | TCP
────────────────────────────────────────────────────────────────────
mtu:1500    3.09 | 4.59    4.46 | 14.2    5.07 | 21.3    5.18 | 23.9
────────────────────────────────────────────────────────────────────
mtu:9000    4.60 | 4.65    8.41 | 14.0    11.3 | 28.9    13.5 | 39.2
The performance gap between QUIC and kTLS may be attributed to:
- The absence of Generic Segmentation Offload (GSO) for QUIC.
- An additional data copy on the transmission (TX) path.
- Extra encryption required for header protection in QUIC.
- A longer header length for the stream data in QUIC.
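To quantify the gap, the QUIC vs. kTLS figures can be reduced to a
QUIC/kTLS throughput ratio per cell. The sketch below does that arithmetic
on the numbers from the first table; the struct and function names are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* One cell of the QUIC vs. kTLS table, throughput in Gbits/sec. */
struct ex_sample {
	int mtu;
	int size;
	double quic;
	double ktls;
};

/* Values copied from the QUIC vs. kTLS table. */
static const struct ex_sample ex_quic_vs_ktls[] = {
	{ 1500,  1024, 2.27,  3.26 },
	{ 1500,  4096, 3.02,  6.97 },
	{ 1500, 16384, 3.36,  9.74 },
	{ 1500, 65536, 3.48, 10.8  },
	{ 9000,  1024, 3.66,  3.72 },
	{ 9000,  4096, 5.87,  8.92 },
	{ 9000, 16384, 7.03, 11.2  },
	{ 9000, 65536, 8.04, 11.4  },
};

/* QUIC throughput as a fraction of kTLS throughput for one cell. */
static double ex_ratio(const struct ex_sample *s)
{
	assert(s != NULL && s->ktls > 0.0);
	return s->quic / s->ktls;
}

/* Index of the cell where QUIC falls furthest behind kTLS. */
static size_t ex_lowest_ratio(const struct ex_sample *t, size_t n)
{
	size_t i, worst = 0;

	assert(t != NULL && n > 0);
	for (i = 1; i < n; i++)
		if (ex_ratio(&t[i]) < ex_ratio(&t[worst]))
			worst = i;
	return worst;
}
```

On these numbers the ratio ranges from roughly 0.32 (mtu:1500, size:65536)
to roughly 0.98 (mtu:9000, size:1024), so the gap is widest for large
messages on the small MTU.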
Patches
=======
Note: This implementation is organized into five parts and submitted across
two patchsets for review. This patchset includes Parts 1–2, while Parts 3–5
will be submitted in a subsequent patchset. For the complete series, see [9].
1. Infrastructure (2):
net: define IPPROTO_QUIC and SOL_QUIC constants
net: build socket infrastructure for QUIC protocol
2. Subcomponents (13):
quic: provide common utilities and data structures
quic: provide family ops for address and protocol
quic: provide quic.h header files for kernel and userspace
quic: add stream management
quic: add connection id management
quic: add path management
quic: add congestion control
quic: add packet number space
quic: add crypto key derivation and installation
quic: add crypto packet encryption and decryption
quic: add timer management
quic: add frame encoder and decoder base
quic: add packet builder and parser base
3. Data Processing (7):
quic: implement outqueue transmission and flow control
quic: implement outqueue sack and retransmission
quic: implement inqueue receiving and flow control
quic: implement frame creation functions
quic: implement frame processing functions
quic: implement packet creation functions
quic: implement packet processing functions
4. Socket APIs (6):
quic: support bind/listen/connect/accept/close()
quic: support sendmsg() and recvmsg()
quic: support socket options related to interaction after handshake
quic: support socket options related to settings prior to handshake
quic: support socket options related to setup during handshake
quic: support socket ioctls and socket dump via procfs
5. Example and Documentation (2):
quic: create sample test using handshake APIs for kernel consumers
Documentation: describe QUIC protocol interface in quic.rst
Notice: The QUIC module is currently labeled as "EXPERIMENTAL".
All contributors are recognized in the respective patches with the
'Signed-off-by:' tag. Special thanks to Moritz Buhl and Stefan Metzmacher,
whose practical use cases and insightful feedback have been instrumental
in shaping the design and advancing the development.
References
==========
[1] https://datatracker.ietf.org/doc/html/draft-lxin-quic-socket-apis
[2] https://github.com/oracle/ktls-utils
[3] https://github.com/lxin/quic
[4] https://gitlab.com/samba-team/samba/-/merge_requests/4019
[5] https://github.com/moritzbuhl/curl/tree/linux_curl
[6] https://github.com/moritzbuhl/httpd-portable
[7] https://github.com/quic-interop/quic-interop-runner
[8] https://github.com/lxin/iperf
[9] https://github.com/lxin/net-next/commits/quic/
include/linux/quic.h | 19 +
include/linux/socket.h | 1 +
include/uapi/linux/in.h | 2 +
include/uapi/linux/quic.h | 238 ++++++++
net/Kconfig | 1 +
net/Makefile | 1 +
net/quic/Kconfig | 35 ++
net/quic/Makefile | 9 +
net/quic/common.c | 482 +++++++++++++++
net/quic/common.h | 219 +++++++
net/quic/cong.c | 700 +++++++++++++++++++++
net/quic/cong.h | 120 ++++
net/quic/connid.c | 218 +++++++
net/quic/connid.h | 162 +++++
net/quic/crypto.c | 1201 +++++++++++++++++++++++++++++++++++++
net/quic/crypto.h | 83 +++
net/quic/family.c | 666 ++++++++++++++++++++
net/quic/family.h | 40 ++
net/quic/frame.c | 558 +++++++++++++++++
net/quic/frame.h | 192 ++++++
net/quic/packet.c | 889 +++++++++++++++++++++++++++
net/quic/packet.h | 129 ++++
net/quic/path.c | 507 ++++++++++++++++
net/quic/path.h | 162 +++++
net/quic/pnspace.c | 224 +++++++
net/quic/pnspace.h | 150 +++++
net/quic/protocol.c | 404 +++++++++++++
net/quic/protocol.h | 58 ++
net/quic/socket.c | 424 +++++++++++++
net/quic/socket.h | 221 +++++++
net/quic/stream.c | 549 +++++++++++++++++
net/quic/stream.h | 135 +++++
net/quic/timer.c | 196 ++++++
net/quic/timer.h | 47 ++
34 files changed, 9042 insertions(+)
create mode 100644 include/linux/quic.h
create mode 100644 include/uapi/linux/quic.h
create mode 100644 net/quic/Kconfig
create mode 100644 net/quic/Makefile
create mode 100644 net/quic/common.c
create mode 100644 net/quic/common.h
create mode 100644 net/quic/cong.c
create mode 100644 net/quic/cong.h
create mode 100644 net/quic/connid.c
create mode 100644 net/quic/connid.h
create mode 100644 net/quic/crypto.c
create mode 100644 net/quic/crypto.h
create mode 100644 net/quic/family.c
create mode 100644 net/quic/family.h
create mode 100644 net/quic/frame.c
create mode 100644 net/quic/frame.h
create mode 100644 net/quic/packet.c
create mode 100644 net/quic/packet.h
create mode 100644 net/quic/path.c
create mode 100644 net/quic/path.h
create mode 100644 net/quic/pnspace.c
create mode 100644 net/quic/pnspace.h
create mode 100644 net/quic/protocol.c
create mode 100644 net/quic/protocol.h
create mode 100644 net/quic/socket.c
create mode 100644 net/quic/socket.h
create mode 100644 net/quic/stream.c
create mode 100644 net/quic/stream.h
create mode 100644 net/quic/timer.c
create mode 100644 net/quic/timer.h
--
2.47.1