Message-ID: <CADvbK_fmU=0GeB-32uv=+MOxHA8ZXjQeysqF1=z1f0eBt7pgTg@mail.gmail.com>
Date: Fri, 16 Jan 2026 14:55:53 -0500
From: Xin Long <lucien.xin@...il.com>
To: Stefan Metzmacher <metze@...ba.org>
Cc: network dev <netdev@...r.kernel.org>, quic@...ts.linux.dev, davem@...emloft.net,
kuba@...nel.org, Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>, Moritz Buhl <mbuhl@...nbsd.org>,
Tyler Fanelli <tfanelli@...hat.com>, Pengtao He <hepengtao@...omi.com>,
Thomas Dreibholz <dreibh@...ula.no>, linux-cifs@...r.kernel.org,
Steve French <smfrench@...il.com>, Namjae Jeon <linkinjeon@...nel.org>,
Paulo Alcantara <pc@...guebit.com>, Tom Talpey <tom@...pey.com>, kernel-tls-handshake@...ts.linux.dev,
Chuck Lever <chuck.lever@...cle.com>, Jeff Layton <jlayton@...nel.org>,
Steve Dickson <steved@...hat.com>, Hannes Reinecke <hare@...e.de>, Alexander Aring <aahringo@...hat.com>,
David Howells <dhowells@...hat.com>, Matthieu Baerts <matttbe@...nel.org>,
John Ericson <mail@...nericson.me>, Cong Wang <xiyou.wangcong@...il.com>,
"D . Wythe" <alibuda@...ux.alibaba.com>, Jason Baron <jbaron@...mai.com>,
illiliti <illiliti@...tonmail.com>, Sabrina Dubroca <sd@...asysnail.net>,
Marcelo Ricardo Leitner <marcelo.leitner@...il.com>, Daniel Stenberg <daniel@...x.se>,
Andy Gospodarek <andrew.gospodarek@...adcom.com>
Subject: Re: [PATCH net-next v7 16/16] quic: add packet parser base
On Fri, Jan 16, 2026 at 11:20 AM Stefan Metzmacher <metze@...ba.org> wrote:
>
> Am 15.01.26 um 16:11 schrieb Xin Long:
> > This patch uses 'quic_packet' to handle parsing of QUIC packets on the
> > receive (RX) path.
> >
> > It introduces mechanisms to parse the ALPN from client Initial packets
> > to determine the correct listener socket. Received packets are then
> > routed and processed accordingly. Similar to the TX path, handling for
> > application and handshake packets is not yet implemented.
> >
> > - quic_packet_parse_alpn(): Parse the ALPN from a client Initial packet,
> > then locate the appropriate listener using the ALPN.
> >
> > - quic_packet_rcv(): Locate the appropriate socket to handle the packet
> > via quic_packet_process().
> >
> > - quic_packet_process(): Process the received packet.
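For readers less familiar with the ClientHello layout, the walk that
quic_packet_get_alpn() performs can be sketched in userspace C. This is a
minimal, hypothetical sketch, not the patch's code: get_be(), skip_vec() and
find_alpn() are invented for illustration, the input is assumed to be an
already-decrypted CRYPTO frame payload starting at the handshake header, and,
as in the patch, a missing ALPN extension is reported as success with an
empty result.

```c
#include <stddef.h>
#include <stdint.h>

/* Read an n-byte (n <= 4) big-endian integer, advancing *p and shrinking *len. */
static int get_be(const uint8_t **p, size_t *len, size_t n, uint32_t *out)
{
	uint32_t v = 0;
	size_t i;

	if (*len < n)
		return 0;
	for (i = 0; i < n; i++)
		v = (v << 8) | (*p)[i];
	*p += n;
	*len -= n;
	*out = v;
	return 1;
}

/* Skip a vector prefixed by an n-byte length. */
static int skip_vec(const uint8_t **p, size_t *len, size_t n)
{
	uint32_t l;

	if (!get_be(p, len, n, &l) || l > *len)
		return 0;
	*p += l;
	*len -= l;
	return 1;
}

/* Return 0 with *alpn/*alpn_len set (length 0 if absent), -1 if malformed. */
static int find_alpn(const uint8_t *p, size_t len,
		     const uint8_t **alpn, size_t *alpn_len)
{
	uint32_t type, l;

	*alpn = NULL;
	*alpn_len = 0;
	if (!get_be(&p, &len, 1, &type) || type != 1)	/* ClientHello */
		return -1;
	if (!get_be(&p, &len, 3, &l) || l < 34)		/* version + random */
		return -1;
	if (len > l)
		len = l;
	if (len < 34)
		return -1;
	p += 34;					/* skip legacy_version, random */
	len -= 34;
	if (!get_be(&p, &len, 1, &l) || l != 0)		/* empty session id in QUIC */
		return -1;
	if (!skip_vec(&p, &len, 2) || !skip_vec(&p, &len, 1))
		return -1;				/* cipher_suites, compression */
	if (!get_be(&p, &len, 2, &l))			/* extensions length */
		return -1;
	if (len > l)
		len = l;
	while (len >= 4) {
		uint32_t ext_type, ext_len, list_len, name_len;
		const uint8_t *q;
		size_t qlen;

		if (!get_be(&p, &len, 2, &ext_type) || !get_be(&p, &len, 2, &ext_len))
			break;
		if (ext_len > len)			/* truncated: treat as absent */
			return 0;
		if (ext_type != 16) {			/* not ALPN: skip it */
			p += ext_len;
			len -= ext_len;
			continue;
		}
		q = p;
		qlen = ext_len;
		if (!get_be(&q, &qlen, 2, &list_len) || list_len > qlen)
			return -1;
		*alpn = q;
		*alpn_len = list_len;
		qlen = list_len;
		while (qlen) {				/* validate 1-byte-prefixed names */
			if (!get_be(&q, &qlen, 1, &name_len) || name_len > qlen) {
				*alpn = NULL;
				*alpn_len = 0;
				return -1;
			}
			q += name_len;
			qlen -= name_len;
		}
		return 0;
	}
	return 0;
}
```

On success *alpn points at the protocol_name_list body (for "h3" that is
the bytes 0x02 'h' '3'), which is the form the patch stores in alpn->data.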
> >
> > In addition to packet flow, this patch adds support for ICMP-based MTU
> > updates by locating the relevant socket and updating the stored PMTU
> > accordingly.
> >
> > - quic_packet_rcv_err_pmtu(): Find the socket and update the PMTU via
> > quic_packet_mss_update().
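The MTU arithmetic in quic_packet_rcv_err_pmtu() can be illustrated with a
small worked sketch. The bounds below are hypothetical stand-ins (the real
values of QUIC_PATH_MIN_PMTU and QUIC_PATH_MAX_PMTU are not part of this
patch), and the usage assumes hlen = 28 (IPv4 + UDP headers) and
taglen = 16 (an AES-GCM tag).

```c
#include <stdint.h>

/* Hypothetical stand-ins for QUIC_PATH_MIN_PMTU / QUIC_PATH_MAX_PMTU. */
#define SKETCH_MIN_PMTU 1200u
#define SKETCH_MAX_PMTU 65535u

static uint32_t clamp_u32(uint32_t v, uint32_t lo, uint32_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* PLPMTUD disabled: the value passed to the MSS update is the clamped
 * ICMP MTU minus the IP/UDP header length. */
static uint32_t pmtu_mss_no_plpmtud(uint32_t icmp_mtu, uint32_t hlen)
{
	return clamp_u32(icmp_mtu, SKETCH_MIN_PMTU, SKETCH_MAX_PMTU) - hlen;
}

/* PLPMTUD enabled: the size reported to the path layer also excludes
 * the AEAD tag (the tag is added back when the MSS is updated). */
static uint32_t pmtu_toobig_size(uint32_t icmp_mtu, uint32_t hlen, uint32_t taglen)
{
	return clamp_u32(icmp_mtu, SKETCH_MIN_PMTU, SKETCH_MAX_PMTU) - hlen - taglen;
}
```

With a 1400-byte ICMP MTU the first function yields 1372 and the second
1356; an ICMP MTU of 500 is first raised to the assumed 1200-byte floor,
giving 1172.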
> >
> > Signed-off-by: Xin Long <lucien.xin@...il.com>
> > ---
> > v5:
> > - In quic_packet_rcv_err(), remove the unnecessary quic_is_listen()
> > check and move quic_get_mtu_info() out of sock lock (suggested
> > by Paolo).
> > - Replace cancel_work_sync() with disable_work_sync() (suggested by
> > Paolo).
> > v6:
> > - Fix the loop using skb_dequeue() in quic_packet_backlog_work(), and
> > kfree_skb() when sk is not found (reported by AI Reviews).
> > - Remove skb_pull() from quic_packet_rcv(), since it is now handled
> > in quic_path_rcv().
> > - Note for AI reviews: add if (dst) check in quic_packet_rcv_err_pmtu(),
> > although quic_packet_route() >= 0 already guarantees it is not NULL.
> > - Note for AI reviews: it is safe to do *plen -= QUIC_HLEN in
> > quic_packet_get_version_and_connid(), since quic_packet_get_sock()
> > already checks if (skb->len < QUIC_HLEN).
> > - Note for AI reviews: cb->length - cb->number_len - QUIC_TAG_LEN
> > cannot underflow, because quic_crypto_header_decrypt() already checks
> > if (cb->length < QUIC_PN_MAX_LEN + QUIC_SAMPLE_LEN).
> > - Note for AI reviews: the cast (u16)length in quic_packet_parse_alpn()
> > is safe, as there is a prior check if (length > (u16)len); len is
> > skb->len, which cannot exceed U16_MAX.
> > - Note for AI reviews: it's correct to do if (flags &
> > QUIC_F_MTU_REDUCED_DEFERRED) in quic_release_cb(), since
> > QUIC_MTU_REDUCED_DEFERRED is the bit used with test_and_set_bit().
> > - Note for AI reviews: move skb_cb->backlog = 1 before adding skb to
> > backlog, although it's safe to write skb_cb after adding to backlog
> > with sk_lock.slock, as skb dequeue from backlog requires sk_lock.slock.
> > v7:
> > - Pass udp sk to quic_packet_rcv(), quic_packet_rcv_err() and
> > quic_sock_lookup().
> > - Move the call to skb_linearize() and skb_set_owner_sk_safe() to
> > .quic_path_rcv()/quic_packet_rcv().
> > ---
> > net/quic/packet.c | 644 ++++++++++++++++++++++++++++++++++++++++++++
> > net/quic/packet.h | 9 +
> > net/quic/protocol.c | 6 +
> > net/quic/protocol.h | 4 +
> > net/quic/socket.c | 134 +++++++++
> > net/quic/socket.h | 5 +
> > 6 files changed, 802 insertions(+)
> >
> > diff --git a/net/quic/packet.c b/net/quic/packet.c
> > index 348e760aa197..415eda603355 100644
> > --- a/net/quic/packet.c
> > +++ b/net/quic/packet.c
> > @@ -14,6 +14,650 @@
> >
> > #define QUIC_HLEN 1
> >
> > +#define QUIC_LONG_HLEN(dcid, scid) \
> > + (QUIC_HLEN + QUIC_VERSION_LEN + 1 + (dcid)->len + 1 + (scid)->len)
> > +
> > +#define QUIC_VERSION_NUM 2
> > +
> > +/* Supported QUIC versions and their compatible versions. Used for Compatible Version
> > + * Negotiation in rfc9368#section-2.3.
> > + */
> > +static u32 quic_versions[QUIC_VERSION_NUM][4] = {
> > + /* Version, Compatible Versions */
> > + { QUIC_VERSION_V1, QUIC_VERSION_V2, QUIC_VERSION_V1, 0 },
> > + { QUIC_VERSION_V2, QUIC_VERSION_V2, QUIC_VERSION_V1, 0 },
> > +};
> > +
> > +/* Get the compatible version list for a given QUIC version. */
> > +u32 *quic_packet_compatible_versions(u32 version)
> > +{
> > + u8 i;
> > +
> > + for (i = 0; i < QUIC_VERSION_NUM; i++)
> > + if (version == quic_versions[i][0])
> > + return quic_versions[i];
> > + return NULL;
> > +}
> > +
> > +/* Convert version-specific type to internal standard packet type. */
> > +static u8 quic_packet_version_get_type(u32 version, u8 type)
> > +{
> > + if (version == QUIC_VERSION_V1)
> > + return type;
> > +
> > + switch (type) {
> > + case QUIC_PACKET_INITIAL_V2:
> > + return QUIC_PACKET_INITIAL;
> > + case QUIC_PACKET_0RTT_V2:
> > + return QUIC_PACKET_0RTT;
> > + case QUIC_PACKET_HANDSHAKE_V2:
> > + return QUIC_PACKET_HANDSHAKE;
> > + case QUIC_PACKET_RETRY_V2:
> > + return QUIC_PACKET_RETRY;
> > + default:
> > + return -1;
> > + }
> > + return -1;
> > +}
> > +
> > +/* Parse QUIC version and connection IDs (DCID and SCID) from a Long header packet buffer. */
> > +static int quic_packet_get_version_and_connid(struct quic_conn_id *dcid, struct quic_conn_id *scid,
> > + u32 *version, u8 **pp, u32 *plen)
> > +{
> > + u64 len, v;
> > +
> > + *pp += QUIC_HLEN;
> > + *plen -= QUIC_HLEN;
> > +
> > + if (!quic_get_int(pp, plen, &v, QUIC_VERSION_LEN))
> > + return -EINVAL;
> > + *version = v;
> > +
> > + if (!quic_get_int(pp, plen, &len, 1) ||
> > + len > *plen || len > QUIC_CONN_ID_MAX_LEN)
> > + return -EINVAL;
> > + quic_conn_id_update(dcid, *pp, len);
> > + *plen -= len;
> > + *pp += len;
> > +
> > + if (!quic_get_int(pp, plen, &len, 1) ||
> > + len > *plen || len > QUIC_CONN_ID_MAX_LEN)
> > + return -EINVAL;
> > + quic_conn_id_update(scid, *pp, len);
> > + *plen -= len;
> > + *pp += len;
> > + return 0;
> > +}
> > +
> > +/* Change the QUIC version for the connection.
> > + *
> > + * Frees existing initial crypto keys and installs new initial keys compatible with the new
> > + * version.
> > + */
> > +static int quic_packet_version_change(struct sock *sk, struct quic_conn_id *dcid, u32 version)
> > +{
> > + struct quic_crypto *crypto = quic_crypto(sk, QUIC_CRYPTO_INITIAL);
> > +
> > + if (quic_crypto_initial_keys_install(crypto, dcid, version, quic_is_serv(sk)))
> > + return -1;
> > +
> > + quic_packet(sk)->version = version;
> > + return 0;
> > +}
> > +
> > +/* Select the best compatible QUIC version from offered list.
> > + *
> > + * Considers the local preferred version, currently chosen version, and versions offered by
> > + * the peer. Selects the best compatible version based on client/server role and updates the
> > + * connection version accordingly.
> > + */
> > +int quic_packet_select_version(struct sock *sk, u32 *versions, u8 count)
> > +{
> > + struct quic_packet *packet = quic_packet(sk);
> > + struct quic_config *c = quic_config(sk);
> > + u8 i, pref_found = 0, ch_found = 0;
> > + u32 preferred, chosen, best = 0;
> > +
> > + preferred = c->version ?: QUIC_VERSION_V1;
> > + chosen = packet->version;
> > +
> > + for (i = 0; i < count; i++) {
> > + if (!quic_packet_compatible_versions(versions[i]))
> > + continue;
> > + if (preferred == versions[i])
> > + pref_found = 1;
> > + if (chosen == versions[i])
> > + ch_found = 1;
> > + if (best < versions[i]) /* Track highest offered version. */
> > + best = versions[i];
> > + }
> > +
> > + if (!pref_found && !ch_found && !best)
> > + return -1;
> > +
> > + if (quic_is_serv(sk)) { /* Server prefers preferred version if offered, else chosen. */
> > + if (pref_found)
> > + best = preferred;
> > + else if (ch_found)
> > + best = chosen;
> > + } else { /* Client prefers chosen version, else preferred. */
> > + if (ch_found)
> > + best = chosen;
> > + else if (pref_found)
> > + best = preferred;
> > + }
> > +
> > + if (packet->version == best)
> > + return 0;
> > +
> > + /* Change to selected best version. */
> > + return quic_packet_version_change(sk, &quic_paths(sk)->orig_dcid, best);
> > +}
> > +
> > +/* Extracts a QUIC token from a buffer in the Client Initial packet. */
> > +static int quic_packet_get_token(struct quic_data *token, u8 **pp, u32 *plen)
> > +{
> > + u64 len;
> > +
> > + if (!quic_get_var(pp, plen, &len) || len > *plen)
> > + return -EINVAL;
> > + quic_data(token, *pp, len);
> > + *plen -= len;
> > + *pp += len;
> > + return 0;
> > +}
> > +
> > +/* Process PMTU reduction event on a QUIC socket. */
> > +void quic_packet_rcv_err_pmtu(struct sock *sk)
> > +{
> > + struct quic_path_group *paths = quic_paths(sk);
> > + struct quic_packet *packet = quic_packet(sk);
> > + struct quic_config *c = quic_config(sk);
> > + u32 pathmtu, info, taglen;
> > + struct dst_entry *dst;
> > + bool reset_timer;
> > +
> > + if (!ip_sk_accept_pmtu(sk))
> > + return;
> > +
> > + info = clamp(paths->mtu_info, QUIC_PATH_MIN_PMTU, QUIC_PATH_MAX_PMTU);
> > + /* If PLPMTUD is not enabled, update MSS using the route and ICMP info. */
> > + if (!c->plpmtud_probe_interval) {
> > + if (quic_packet_route(sk) < 0)
> > + return;
> > +
> > + dst = __sk_dst_get(sk);
> > + if (dst)
> > + dst->ops->update_pmtu(dst, sk, NULL, info, true);
> > + quic_packet_mss_update(sk, info - packet->hlen);
> > + return;
> > + }
> > + /* PLPMTUD is enabled: adjust to smaller PMTU, subtract headers and AEAD tag. Also
> > + * notify the QUIC path layer for possible state changes and probing.
> > + */
> > + taglen = quic_packet_taglen(packet);
> > + info = info - packet->hlen - taglen;
> > + pathmtu = quic_path_pl_toobig(paths, info, &reset_timer);
> > + if (reset_timer)
> > + quic_timer_reset(sk, QUIC_TIMER_PMTU, c->plpmtud_probe_interval);
> > + if (pathmtu)
> > + quic_packet_mss_update(sk, pathmtu + taglen);
> > +}
> > +
> > +/* Handle ICMP Toobig packet and update QUIC socket path MTU. */
> > +static int quic_packet_rcv_err(struct sock *sk, struct sk_buff *skb)
> > +{
> > + union quic_addr daddr, saddr;
> > + u32 info;
> > +
> > + /* All we can do is lookup the matching QUIC socket by addresses. */
> > + quic_get_msg_addrs(skb, &saddr, &daddr);
> > + sk = quic_sock_lookup(skb, &daddr, &saddr, sk, NULL);
> > + if (!sk)
> > + return -ENOENT;
> > +
> > + if (quic_get_mtu_info(skb, &info)) {
> > + sock_put(sk);
> > + return 0;
> > + }
> > +
> > + /* Success: update socket path MTU info. */
> > + bh_lock_sock(sk);
> > + quic_paths(sk)->mtu_info = info;
> > + if (sock_owned_by_user(sk)) {
> > + /* Socket is in use by userspace context. Defer MTU processing to later via
> > + * tasklet. Ensure the socket is not dropped before deferral.
> > + */
> > + if (!test_and_set_bit(QUIC_MTU_REDUCED_DEFERRED, &sk->sk_tsq_flags))
> > + sock_hold(sk);
> > + goto out;
> > + }
> > + /* Otherwise, process the MTU reduction now. */
> > + quic_packet_rcv_err_pmtu(sk);
> > +out:
> > + bh_unlock_sock(sk);
> > + sock_put(sk);
> > + return 1;
> > +}
> > +
> > +#define QUIC_PACKET_BACKLOG_MAX 4096
> > +
> > +/* Queue a packet for later processing when sleeping is allowed. */
> > +static int quic_packet_backlog_schedule(struct net *net, struct sk_buff *skb)
> > +{
> > + struct quic_skb_cb *cb = QUIC_SKB_CB(skb);
> > + struct quic_net *qn = quic_net(net);
> > +
> > + if (cb->backlog)
> > + return 0;
> > +
> > + if (skb_queue_len_lockless(&qn->backlog_list) >= QUIC_PACKET_BACKLOG_MAX) {
> > + QUIC_INC_STATS(net, QUIC_MIB_PKT_RCVDROP);
> > + kfree_skb(skb);
> > + return -1;
> > + }
> > +
> > + cb->backlog = 1;
> > + skb_queue_tail(&qn->backlog_list, skb);
> > + queue_work(quic_wq, &qn->work);
> > + return 1;
> > +}
> > +
> > +#define TLS_MT_CLIENT_HELLO 1
> > +#define TLS_EXT_alpn 16
> > +
> > +/* TLS Client Hello Msg:
> > + *
> > + * uint16 ProtocolVersion;
> > + * opaque Random[32];
> > + * uint8 CipherSuite[2];
> > + *
> > + * struct {
> > + * ExtensionType extension_type;
> > + * opaque extension_data<0..2^16-1>;
> > + * } Extension;
> > + *
> > + * struct {
> > + * ProtocolVersion legacy_version = 0x0303;
> > + * Random rand;
> > + * opaque legacy_session_id<0..32>;
> > + * CipherSuite cipher_suites<2..2^16-2>;
> > + * opaque legacy_compression_methods<1..2^8-1>;
> > + * Extension extensions<8..2^16-1>;
> > + * } ClientHello;
> > + */
> > +
> > +#define TLS_CH_RANDOM_LEN 32
> > +#define TLS_CH_VERSION_LEN 2
> > +
> > +/* Extract ALPN data from a TLS ClientHello message.
> > + *
> > + * Parses the TLS ClientHello handshake message to find the ALPN (Application Layer Protocol
> > + * Negotiation) TLS extension. It validates the TLS ClientHello structure, including version,
> > + * random, session ID, cipher suites, compression methods, and extensions. Once the ALPN
> > + * extension is found, the ALPN protocols list is extracted and stored in @alpn.
> > + *
> > + * Return: 0 on success or no ALPN found, a negative error code on failed parsing.
> > + */
> > +static int quic_packet_get_alpn(struct quic_data *alpn, u8 *p, u32 len)
> > +{
> > + int err = -EINVAL, found = 0;
> > + u64 length, type;
> > +
> > + /* Verify handshake message type (ClientHello) and its length. */
> > + if (!quic_get_int(&p, &len, &type, 1) || type != TLS_MT_CLIENT_HELLO)
> > + return err;
> > + if (!quic_get_int(&p, &len, &length, 3) ||
> > + length < TLS_CH_RANDOM_LEN + TLS_CH_VERSION_LEN)
> > + return err;
> > + if (len > (u32)length) /* Limit len to handshake message length if larger. */
> > + len = length;
> > + /* Skip legacy_version (2 bytes) + random (32 bytes). */
> > + p += TLS_CH_RANDOM_LEN + TLS_CH_VERSION_LEN;
> > + len -= TLS_CH_RANDOM_LEN + TLS_CH_VERSION_LEN;
> > + /* legacy_session_id_len must be zero (QUIC requirement). */
> > + if (!quic_get_int(&p, &len, &length, 1) || length)
> > + return err;
> > +
> > + /* Skip cipher_suites (2 bytes length + variable data). */
> > + if (!quic_get_int(&p, &len, &length, 2) || length > (u64)len)
> > + return err;
> > + len -= length;
> > + p += length;
> > +
> > + /* Skip legacy_compression_methods (1 byte length + variable data). */
> > + if (!quic_get_int(&p, &len, &length, 1) || length > (u64)len)
> > + return err;
> > + len -= length;
> > + p += length;
> > +
> > + if (!quic_get_int(&p, &len, &length, 2)) /* Read TLS extensions length (2 bytes). */
> > + return err;
> > + if (len > (u32)length) /* Limit len to extensions length if larger. */
> > + len = length;
> > + while (len > 4) { /* Iterate over extensions to find ALPN (type TLS_EXT_alpn). */
> > + if (!quic_get_int(&p, &len, &type, 2))
> > + break;
> > + if (!quic_get_int(&p, &len, &length, 2))
> > + break;
> > + if (len < (u32)length) /* Incomplete TLS extensions. */
> > + return 0;
> > + if (type == TLS_EXT_alpn) { /* Found ALPN extension. */
> > + len = length;
> > + found = 1;
> > + break;
> > + }
> > + /* Skip non-ALPN extensions. */
> > + p += length;
> > + len -= length;
> > + }
> > + if (!found) { /* no ALPN extension found: set alpn->len = 0 and alpn->data = p. */
> > + quic_data(alpn, p, 0);
> > + return 0;
> > + }
> > +
> > + /* Parse ALPN protocols list length (2 bytes). */
> > + if (!quic_get_int(&p, &len, &length, 2) || length > (u64)len)
> > + return err;
> > + quic_data(alpn, p, length); /* Store ALPN protocols list in alpn->data. */
> > + len = length;
> > + while (len) { /* Validate ALPN protocols list format. */
> > + if (!quic_get_int(&p, &len, &length, 1) || length > (u64)len) {
> > + /* Malformed ALPN entry: set alpn->len = 0 and alpn->data = NULL. */
> > + quic_data(alpn, NULL, 0);
> > + return err;
> > + }
> > + len -= length;
> > + p += length;
> > + }
> > + pr_debug("%s: alpn_len: %d\n", __func__, alpn->len);
> > + return 0;
> > +}
> > +
> > +/* Parse ALPN from a QUIC Initial packet.
> > + *
> > + * This function processes a QUIC Initial packet to extract the ALPN from the TLS ClientHello
> > + * message inside the QUIC CRYPTO frame. It verifies packet type, version compatibility,
> > + * decrypts the packet payload, and locates the CRYPTO frame to parse the TLS ClientHello.
> > + * Finally, it calls quic_packet_get_alpn() to extract the ALPN extension data.
> > + *
> > + * Return: 0 on success or no ALPN found, a negative error code on failed parsing.
> > + */
> > +static int quic_packet_parse_alpn(struct sk_buff *skb, struct quic_data *alpn)
> > +{
> > + struct quic_skb_cb *cb = QUIC_SKB_CB(skb);
> > + struct net *net = sock_net(skb->sk);
> > + u8 *p = skb->data, *data, type;
> > + struct quic_conn_id dcid, scid;
> > + u32 len = skb->len, version;
> > + struct quic_crypto *crypto;
> > + struct quic_data token;
> > + u64 offset, length;
> > + int err = -EINVAL;
> > +
> > + if (!sysctl_quic_alpn_demux)
> > + return 0;
>
> Can this be made dynamic, turning it on if someone
> listens on a socket with QUIC_SOCKOPT_ALPN set?
>
> Otherwise I guess it silently doesn't work
> and needs administrator interaction.
>
Makes sense to me.
I will replace this with a static key when adding the QUIC_SOCKOPT_ALPN
socket option in patchset-2:

	if (!static_branch_unlikely(&quic_alpn_demux_key))
		return 0;

and call static_branch_inc()/static_branch_dec() on the key in quic_hash()
and quic_unhash() when ALPN is set on a listening socket.
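The static-key plan boils down to reference counting: the branch stays
enabled while at least one listening socket with ALPN set is hashed. The
sketch below models only that counting in userspace with C11 atomics; it
does not use the kernel jump-label API, and every name in it
(alpn_demux_users, struct sketch_sock, sketch_hash(), sketch_unhash()) is
hypothetical. The holds_key flag is an assumption of this sketch, so that
unhash drops only a reference that hash actually took and the count stays
balanced even if the ALPN setting changes in between.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Models the enable count behind a key such as quic_alpn_demux_key. */
static atomic_int alpn_demux_users;

struct sketch_sock {
	bool listening;
	bool has_alpn;
	bool holds_key;	/* did hashing this socket take a reference? */
};

static void sketch_hash(struct sketch_sock *sk)
{
	if (sk->listening && sk->has_alpn && !sk->holds_key) {
		atomic_fetch_add(&alpn_demux_users, 1);	/* static_branch_inc() */
		sk->holds_key = true;
	}
}

static void sketch_unhash(struct sketch_sock *sk)
{
	if (sk->holds_key) {
		atomic_fetch_sub(&alpn_demux_users, 1);	/* static_branch_dec() */
		sk->holds_key = false;
	}
}

static bool alpn_demux_enabled(void)
{
	return atomic_load(&alpn_demux_users) > 0;	/* static_branch_unlikely() */
}

/* Stand-in for the early return at the top of quic_packet_parse_alpn(). */
static int sketch_parse_alpn_gate(void)
{
	if (!alpn_demux_enabled())
		return 0;	/* demux off: skip ALPN parsing */
	return 1;		/* demux on: would go on to parse the Initial */
}
```

In the kernel the key itself keeps the count, so the per-socket flag is
the only extra state the hash/unhash hooks would need to track.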
Thanks.