Message-ID: <a7b64f16-5ca9-4344-b7e8-c0d4508e43cc@redhat.com>
Date: Thu, 29 Jan 2026 17:20:31 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Xin Long <lucien.xin@...il.com>, network dev <netdev@...r.kernel.org>,
quic@...ts.linux.dev
Cc: davem@...emloft.net, kuba@...nel.org, Eric Dumazet <edumazet@...gle.com>,
Simon Horman <horms@...nel.org>, Stefan Metzmacher <metze@...ba.org>,
Moritz Buhl <mbuhl@...nbsd.org>, Tyler Fanelli <tfanelli@...hat.com>,
Pengtao He <hepengtao@...omi.com>, Thomas Dreibholz <dreibh@...ula.no>,
linux-cifs@...r.kernel.org, Steve French <smfrench@...il.com>,
Namjae Jeon <linkinjeon@...nel.org>, Paulo Alcantara <pc@...guebit.com>,
Tom Talpey <tom@...pey.com>, kernel-tls-handshake@...ts.linux.dev,
Chuck Lever <chuck.lever@...cle.com>, Jeff Layton <jlayton@...nel.org>,
Steve Dickson <steved@...hat.com>, Hannes Reinecke <hare@...e.de>,
Alexander Aring <aahringo@...hat.com>, David Howells <dhowells@...hat.com>,
Matthieu Baerts <matttbe@...nel.org>, John Ericson <mail@...nericson.me>,
Cong Wang <xiyou.wangcong@...il.com>, "D . Wythe"
<alibuda@...ux.alibaba.com>, Jason Baron <jbaron@...mai.com>,
illiliti <illiliti@...tonmail.com>, Sabrina Dubroca <sd@...asysnail.net>,
Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
Daniel Stenberg <daniel@...x.se>,
Andy Gospodarek <andrew.gospodarek@...adcom.com>
Subject: Re: [PATCH net-next v8 08/15] quic: add path management

On 1/26/26 3:51 PM, Xin Long wrote:
> This patch introduces 'quic_path_group' for managing paths, represented
> by 'struct quic_path'. A connection may use two paths simultaneously
> for connection migration.
>
> Each path is associated with a UDP tunnel socket (sk), and a single
> UDP tunnel socket can be related to multiple paths from different sockets.
> These UDP tunnel sockets are wrapped in 'quic_udp_sock' structures and
> stored in a hash table.
>
> It includes mechanisms to bind and unbind paths, detect alternative paths
> for migration, and swap paths to support seamless transition between
> networks.
>
> - quic_path_bind(): Bind a path to a port and associate it with a UDP sk.
>
> - quic_path_unbind(): Unbind a path from a port and disassociate it from a
> UDP sk.
>
> - quic_path_swap(): Swap two paths to facilitate connection migration.
>
> - quic_path_detect_alt(): Determine if a packet is using an alternative
> path, used for connection migration.
>
> It also integrates basic support for Packetization Layer Path MTU
> Discovery (PLPMTUD), using PING frames and ICMP feedback to adjust path
> MTU and handle probe confirmation or resets during routing changes.
>
> - quic_path_pl_recv(): state transition and pmtu update after the probe
> packet is acked.
>
> - quic_path_pl_toobig(): state transition and pmtu update after
> receiving a toobig or needfrag icmp packet.
>
> - quic_path_pl_send(): state transition and pmtu update after sending a
> probe packet.
>
> - quic_path_pl_reset(): restart the probing when path routing changes.
>
> - quic_path_pl_confirm(): check if probe packet gets acked.
>
> Signed-off-by: Tyler Fanelli <tfanelli@...hat.com>
> Signed-off-by: Xin Long <lucien.xin@...il.com>
> ---
> v3:
> - Fix annotation in quic_udp_sock_lookup() (noted by Paolo).
> - Use inet_sk_get_local_port_range() instead of
> inet_get_local_port_range() (suggested by Paolo).
> - Adjust global UDP tunnel socket hashtable operations for the new
> hashtable type.
> - Delete quic_workqueue; use system_wq for UDP tunnel socket destroy.
> v4:
> - Cache UDP tunnel socket pointer and its source address in struct
> quic_path for RCU-protected lookup/access.
> - Return -EAGAIN instead of -EINVAL in quic_path_bind() when UDP
> socket is being released in workqueue.
> - Move udp_tunnel_sock_release() out of the mutex_lock to avoid a
> warning of lockdep in quic_udp_sock_put_work().
> - Introduce quic_wq for UDP socket release work, so all pending works
> can be flushed before destroying the hashtable in quic_exit().
> v5:
> - Rename quic_path_free() to quic_path_unbind() (suggested by Paolo).
> - Remove the 'serv' member from struct quic_path_group, since
> quic_is_serv() defined in a previous patch now uses
> sk->sk_max_ack_backlog for server-side detection.
> - Use quic_ktime_get_us() to set skb_cb->time, as RTT is measured
> in microseconds and jiffies_to_usecs() is not accurate enough.
> v6:
> - Do not reset transport_header for QUIC in quic_udp_rcv(), allowing
> removal of udph_offset and enabling access to the UDP header via
> udp_hdr(); Pull skb->data in quic_udp_rcv() to allow access to the
> QUIC header via skb->data.
> v7:
> - Pass udp sk to quic_path_rcv() and move the call to skb_linearize()
> and skb_set_owner_sk_safe() to .quic_path_rcv().
> - Delete the call to skb_linearize() and skb_set_owner_sk_safe() from
> quic_udp_err(), as it should not change skb in .encap_err_lookup()
> (noted by AI review).
> v8:
> - Remove indirect quic_path_rcv and late call quic_packet_rcv()
> directly via extern (noted by Paolo).
> - Add a comment in quic_udp_rcv() clarifying it must return 0.
> - Add a comment in quic_udp_sock_put() clarifying the UDP socket
> may be freed in atomic RX context during connection migration.
> - Reorder some quic_path_group members to reduce struct size.
> ---
> net/quic/Makefile | 2 +-
> net/quic/path.c | 520 ++++++++++++++++++++++++++++++++++++++++++++
> net/quic/path.h | 170 +++++++++++++++
> net/quic/protocol.c | 11 +
> net/quic/socket.c | 3 +
> net/quic/socket.h | 7 +
> 6 files changed, 712 insertions(+), 1 deletion(-)
> create mode 100644 net/quic/path.c
> create mode 100644 net/quic/path.h
>
> diff --git a/net/quic/Makefile b/net/quic/Makefile
> index eee7501588d3..1565fb5cef9d 100644
> --- a/net/quic/Makefile
> +++ b/net/quic/Makefile
> @@ -5,4 +5,4 @@
>
> obj-$(CONFIG_IP_QUIC) += quic.o
>
> -quic-y := common.o family.o protocol.o socket.o stream.o connid.o
> +quic-y := common.o family.o protocol.o socket.o stream.o connid.o path.o
> diff --git a/net/quic/path.c b/net/quic/path.c
> new file mode 100644
> index 000000000000..9556607a009e
> --- /dev/null
> +++ b/net/quic/path.c
> @@ -0,0 +1,520 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/* QUIC kernel implementation
> + * (C) Copyright Red Hat Corp. 2023
> + *
> + * This file is part of the QUIC kernel implementation
> + *
> + * Initialization/cleanup for QUIC protocol support.
> + *
> + * Written or modified by:
> + * Xin Long <lucien.xin@...il.com>
> + */
> +
> +#include <net/udp_tunnel.h>
> +#include <linux/quic.h>
> +
> +#include "common.h"
> +#include "family.h"
> +#include "path.h"
> +
> +static int quic_udp_rcv(struct sock *sk, struct sk_buff *skb)
> +{
> + memset(skb->cb, 0, sizeof(skb->cb));
> + QUIC_SKB_CB(skb)->seqno = -1;
> + QUIC_SKB_CB(skb)->time = quic_ktime_get_us();
> +
> + skb_pull(skb, sizeof(struct udphdr));
> + skb_dst_force(skb);
> + kfree_skb(skb);
> + return 0; /* .encap_rcv must return 0 if skb was either consumed or dropped. */
> +}
> +
> +static int quic_udp_err(struct sock *sk, struct sk_buff *skb)
> +{
> + return 0;
> +}
> +
> +static void quic_udp_sock_put_work(struct work_struct *work)
> +{
> + struct quic_udp_sock *us = container_of(work, struct quic_udp_sock, work);
> + struct quic_uhash_head *head;
> + struct sock *sk = us->sk;
> +
> + /* Hold the sock to safely access it in quic_udp_sock_lookup() even after
> + * udp_tunnel_sock_release(). The release must occur before __hlist_del()
> + * so a new UDP tunnel socket can be created for the same address and port
> + * if quic_udp_sock_lookup() fails to find one.
> + *
> + * Note: udp_tunnel_sock_release() cannot be called under the mutex due to
> + * some lockdep warnings.
> + */
> + sock_hold(sk);
> + udp_tunnel_sock_release(sk->sk_socket);
> +
> + head = quic_udp_sock_head(sock_net(sk), ntohs(us->addr.v4.sin_port));
> + mutex_lock(&head->lock);
> + __hlist_del(&us->node);
> + mutex_unlock(&head->lock);
> +
> + sock_put(sk);
> + kfree(us);
> +}
> +
> +static struct quic_udp_sock *quic_udp_sock_create(struct sock *sk, union quic_addr *a)
> +{
> + struct udp_tunnel_sock_cfg tuncfg = {};
> + struct udp_port_cfg udp_conf = {};
> + struct net *net = sock_net(sk);
> + struct quic_uhash_head *head;
> + struct quic_udp_sock *us;
> + struct socket *sock;
> +
> + us = kzalloc(sizeof(*us), GFP_KERNEL);
> + if (!us)
> + return NULL;
> +
> + quic_udp_conf_init(sk, &udp_conf, a);
> + if (udp_sock_create(net, &udp_conf, &sock)) {
> + pr_debug("%s: failed to create udp sock\n", __func__);
> + kfree(us);
> + return NULL;
> + }
> +
> + tuncfg.encap_type = 1;
> + tuncfg.encap_rcv = quic_udp_rcv;
> + tuncfg.encap_err_lookup = quic_udp_err;
> + setup_udp_tunnel_sock(net, sock, &tuncfg);
You may need to adjust UDP_MAX_TUNNEL_TYPES in udp_offload.c. You can
check by running a kernel with QUIC enabled and geneve, vxlan, FOU and
xfrm disabled.
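
To illustrate the concern (a userspace sketch, not kernel code): as far
as I recall, UDP_MAX_TUNNEL_TYPES is a sum of IS_ENABLED() terms for
geneve, vxlan, FOU and xfrm, so it would evaluate to 0 when all four are
disabled. The struct and function names below are hypothetical, and the
weights are an assumption based on my recollection of the macro; please
check them against udp_offload.c.

```c
#include <stdio.h>

/* Hypothetical stand-ins for IS_ENABLED(CONFIG_*): 1 = built, 0 = disabled. */
struct tunnel_cfg {
	int geneve;
	int vxlan;
	int fou;
	int xfrm;
};

/* Assumed shape of UDP_MAX_TUNNEL_TYPES: one slot for geneve, two each
 * for vxlan, FOU and xfrm. This is a model of the macro, not a copy.
 */
static int max_tunnel_types(const struct tunnel_cfg *c)
{
	return c->geneve + c->vxlan * 2 + c->fou * 2 + c->xfrm * 2;
}

/* Print the modelled capacity for a given configuration. */
static void report(const char *name, const struct tunnel_cfg *c)
{
	printf("%s: max tunnel types = %d\n", name, max_tunnel_types(c));
}
```

Calling report() with every field set to 0 models the QUIC-only
configuration suggested above. If the real macro has this shape, it
probably needs an extra term for CONFIG_IP_QUIC, assuming QUIC sockets
are counted there at all.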
> +
> + refcount_set(&us->refcnt, 1);
> + us->sk = sock->sk;
> + memcpy(&us->addr, a, sizeof(*a));
> + us->bind_ifindex = sk->sk_bound_dev_if;
> +
> + head = quic_udp_sock_head(net, ntohs(a->v4.sin_port));
> + hlist_add_head(&us->node, &head->head);
> + INIT_WORK(&us->work, quic_udp_sock_put_work);
It is unclear to me: will the QUIC UDP socket lookup be done locklessly
in a future series?
/P