Message-ID: <CADvbK_cQRpyzHG4UUOzfgmqLndvpx5Cd+d59rrqGRp0ic3PyxA@mail.gmail.com>
Date: Fri, 19 Apr 2024 14:09:06 -0400
From: Xin Long <lucien.xin@...il.com>
To: Stefan Metzmacher <metze@...ba.org>
Cc: network dev <netdev@...r.kernel.org>, davem@...emloft.net, kuba@...nel.org,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
Steve French <smfrench@...il.com>, Namjae Jeon <linkinjeon@...nel.org>,
Chuck Lever III <chuck.lever@...cle.com>, Jeff Layton <jlayton@...nel.org>,
Sabrina Dubroca <sd@...asysnail.net>, Tyler Fanelli <tfanelli@...hat.com>,
Pengtao He <hepengtao@...omi.com>,
"linux-cifs@...r.kernel.org" <linux-cifs@...r.kernel.org>,
Samba Technical <samba-technical@...ts.samba.org>
Subject: Re: [RFC PATCH net-next 0/5] net: In-kernel QUIC implementation with
Userspace handshake
On Fri, Apr 19, 2024 at 10:07 AM Stefan Metzmacher <metze@...ba.org> wrote:
>
> Hi Xin Long,
>
> >>>>>> first many thanks for working on this topic!
> >>>>>>
> >>>>> Hi, Stefan
> >>>>>
> >>>>> Thanks for the comment!
> >>>>>
> >>>>>>> Usage
> >>>>>>> =====
> >>>>>>>
> >>>>>>> This implementation supports a mapping of QUIC into sockets APIs. Similar
> >>>>>>> to TCP and SCTP, a typical Server and Client use the following system call
> >>>>>>> sequence to communicate:
> >>>>>>>
> >>>>>>> Client                             Server
> >>>>>>> ------------------------------------------------------------------
> >>>>>>> sockfd = socket(IPPROTO_QUIC)      listenfd = socket(IPPROTO_QUIC)
> >>>>>>> bind(sockfd)                       bind(listenfd)
> >>>>>>>                                    listen(listenfd)
> >>>>>>> connect(sockfd)
> >>>>>>> quic_client_handshake(sockfd)
> >>>>>>>                                    sockfd = accept(listenfd)
> >>>>>>>                                    quic_server_handshake(sockfd, cert)
> >>>>>>>
> >>>>>>> sendmsg(sockfd)                    recvmsg(sockfd)
> >>>>>>> close(sockfd)                      close(sockfd)
> >>>>>>>                                    close(listenfd)
> >>>>>>>
> >>>>>>> Please note that quic_client_handshake() and quic_server_handshake() functions
> >>>>>>> are currently sourced from libquic in the github lxin/quic repository, and might
> >>>>>>> be integrated into ktls-utils in the future. These functions are responsible for
> >>>>>>> receiving and processing the raw TLS handshake messages until the completion of
> >>>>>>> the handshake process.
> >>>>>>
> >>>>>> I see a problem with this design for the server, as one reason to
> >>>>>> have SMB over QUIC is to use udp port 443 in order to get through
> >>>>>> firewalls. As QUIC has the concept of ALPN it should be possible to
> >>>>>> let a consumer listen only on a specific ALPN, so that the smb server
> >>>>>> and web server on "h3" could both accept connections.
> >>>>> We do provide a sockopt to set ALPN before bind or handshaking:
> >>>>>
> >>>>> https://github.com/lxin/quic/wiki/man#quic_sockopt_alpn
> >>>>>
> >>>>> But it's used more to verify whether the ALPN set on the server
> >>>>> matches the one received from the client than to find
> >>>>> the correct server.
> >>>>
> >>>> Ah, ok.
> >>> Just note that, with a small change in the current libquic, it still
> >>> allows users to use ALPN to find the correct function or thread in
> >>> the *same* process, with usage like:
> >>>
> >>> listenfd = socket(IPPROTO_QUIC);
> >>> /* match all during handshake with wildcard ALPN */
> >>> setsockopt(listenfd, QUIC_SOCKOPT_ALPN, "*");
> >>> bind(listenfd)
> >>> listen(listenfd)
> >>>
> >>> while (1) {
> >>> sockfd = accept(listenfd);
> >>> /* the alpn from client will be set to sockfd during handshake */
> >>> quic_server_handshake(sockfd, cert);
> >>>
> >>> getsockopt(sockfd, QUIC_SOCKOPT_ALPN, alpn);
> >>
> >> Would quic_server_handshake() call setsockopt()?
> > Yes, I just made a small change in the userspace libquic:
> >
> > https://github.com/lxin/quic/commit/9c75bd42769a8cbc1652e2f4c8d77780f23afde6
> >
> > So you can set up multiple ALPNs on the listen sock:
> >
> > setsockopt(listenfd, QUIC_SOCKOPT_ALPN, "smbd, h3, ksmbd");
> >
> > Then during handshake, the matched ALPN from client will be set into
> > the accept socket, then users can get it later after handshake.
> >
> > Note that userspace libquic is a very light lib (a couple of hundred lines
> > of code), you can add more TLS related support without touching Kernel code,
> > including the SNI support you mentioned.
> >
> >>
> >>> switch (alpn) {
> >>> case "smbd": smbd_thread(sockfd);
> >>> case "h3": h3_thread(sockfd);
> >>> case "ksmbd": ksmbd_thread(sockfd);
> >>> }
> >>> }
> >>
> >> Ok, but that would mean all applications need to be aware of each other,
> >> but it would be possible and socket fds could be passed to other
> >> processes.
> > It doesn't sound common to me, but yes, I think Unix Domain Sockets
> > can pass it to another process.
>
> I think it will be extremely common to have multiple services
> based on udp port 443.
>
> People will expect to find web services, smb and maybe more
> behind the same DNS hostname. And multiple DNS hostnames pointing
> to the same ip address is also very likely.
>
> With plain tcp/udp it's also possible to have independent sockets
> per port. There's no single userspace daemon that listens on
> 'tcp' and dispatches to different processes based on the port.
>
> And with QUIC the port space is the ALPN and/or SNI
> combination.
>
> And I think this should be addressed before this becomes an
> unchangeable kernel ABI, written in stone.
>
> >>>>> So you expect (k)smbd server and web server both to listen on UDP
> >>>>> port 443 on the same host, and which APP server accepts the request
> >>>>> from a client depends on ALPN, right?
> >>>>
> >>>> yes.
> >>> Got you. This can be done by also moving TLS 1.3 message exchange to
> >>> kernel where we can get the ALPN before looking up the listening socket.
> >>> However, In-kernel TLS 1.3 Handshake had been NACKed by both kernel
> >>> netdev maintainers and userland ssl lib developers with good reasons.
> >>>
> >>>>
> >>>>> Currently, in Kernel, this implementation doesn't process any raw TLS
> >>>>> MSG/EXTs but delivers them to userspace after decryption, and the accept
> >>>>> socket is created before processing handshake.
> >>>>>
> >>>>> I'm actually curious how userland QUIC handles this, considering
> >>>>> that the UDP sockets('listening' on the same IP:PORT) are used in
> >>>>> two different servers' processes. I think socket lookup with ALPN
> >>>>> has to be done in Kernel Space. Do you know any userland QUIC
> >>>>> implementation for this?
> >>>>
> >>>> I don't know, but I guess QUIC is only used for http so
> >>>> far and maybe dns, but that seems to use port 853.
> >>>>
> >>>> So there's no strict need for it and the web server
> >>>> would handle all relevant ALPNs.
> >>> Honestly, I don't think any userland QUIC can use ALPN to look up
> >>> different sockets used by different servers/processes, as such a thing
> >>> can only be done in Kernel Space.
> >>>
> >>>>
> >>>>>>
> >>>>>> So the server application should have a way to specify the desired
> >>>>>> ALPN before or during the bind() call. I'm not sure if the
> >>>>>> ALPN is available in cleartext before any crypto is needed,
> >>>>>> so if the ALPN is encrypted it might be needed to also register
> >>>>>> a server certificate and key together with the ALPN.
> >>>>>> Because multiple applications may not want to share the same key.
> >>>>> On send side, ALPN extension is in raw TLS messages created in userspace
> >>>>> and passed into the kernel and encoded into QUIC crypto frame and then
> >>>>> *encrypted* before sending out.
> >>>>
> >>>> Ok.
> >>>>
> >>>>> On recv side, after decryption, the raw TLS messages are decoded from
> >>>>> the QUIC crypto frame and then delivered to userspace, so userspace
> >>>>> handles certificate validation and also sees the cleartext ALPN.
> >>>>>
> >>>>> Let me know if that isn't clear.
> >>>>
> >>>> But the first "new" QUIC pdu from a client will trigger accept() to
> >>>> return, and userspace (or the kernel helper function) will do
> >>>> all crypto? Or does the first decryption happen in kernel (before accept returns)?
> >>> Good question!
> >>>
> >>> The first "new" QUIC pdu causes a 'request sock' (containing the
> >>> 4-tuple and connection IDs only) to be created and queued on the reqsk
> >>> list of the listen sock (if the validate_peer_address param is not set),
> >>> and this pdu is enqueued in the inq->backlog_list of the listen sock.
> >>>
> >>> When accept() is called, in Kernel, it dequeues the "request sock" from the
> >>> reqsk list of the listen sock, and creates the accept socket based on this
> >>> reqsk. Then it processes the pdu for this new accept socket from the
> >>> inq->backlog_list of the listen sock, including *decrypting* the QUIC packet
> >>> and decoding the CRYPTO frame, and then delivers the raw/cleartext TLS
> >>> message to the Userspace libquic.
> >>
> >> Ok, when the kernel already decrypts, it could already
> >> find the ALPN. It doesn't mean it should do the full
> >> handshake, but parse enough to find the ALPN.
> > Correct, in-kernel QUIC should only do the QUIC related things,
> > and all TLS handshake msgs must be handled in Userspace.
> > This won't cause "layering violation", as Nick Banks said.
>
> But I think it's unavoidable for the ALPN and SNI fields on
> the server side. As every service tries to use udp port 443
> and somehow that needs to be shared if multiple services want to
> use it.
>
> I guess on the acceptor side we would need to somehow detach the low-level
> udp struct sock from the logical listen struct sock.
>
> And quic_do_listen_rcv() would need to find the correct logical listening
> socket and call quic_request_sock_enqueue() on the logical socket,
> not the low-level udp socket. The same goes for all the stuff happening after
> quic_request_sock_enqueue() at the end of quic_do_listen_rcv().
>
The implementation allows one low-level UDP sock to serve multiple
QUIC socks.
Currently, if your 3 QUIC applications listen on the same address:port
with the SO_REUSEPORT socket option set, an incoming connection is given
to one of your applications randomly, based on hash(client_addr+port), via
reuseport_select_sock() in quic_sock_lookup().
It should be easy to do a further match on ALPN among these 3 QUIC
socks that listen on the same address:port to get the right QUIC sock,
instead of choosing one randomly.
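
As a rough userspace model of such a further match (not the kernel code),
the sketch below picks a listener from a group by ALPN. struct listener,
alpn_in_list() and select_by_alpn() are hypothetical names, and the
comma-separated ALPN list format is assumed from the setsockopt() example
quoted earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one listening QUIC sock in a reuseport group:
 * an id plus the comma-separated ALPN list it registered. */
struct listener {
	int id;
	const char *alpns;	/* e.g. "smbd, ksmbd" */
};

/* Return 1 if 'alpn' equals one of the comma-separated tokens in 'list'.
 * Spaces around tokens are ignored; an empty 'alpn' never matches. */
static int alpn_in_list(const char *list, const char *alpn)
{
	size_t want;

	assert(list != NULL && alpn != NULL);
	want = strlen(alpn);

	while (*list) {
		const char *end;
		size_t len;

		/* Skip separators and leading spaces. */
		while (*list == ' ' || *list == ',')
			list++;
		end = list;
		while (*end && *end != ',')
			end++;
		len = (size_t)(end - list);
		/* Trim trailing spaces of this token. */
		while (len > 0 && list[len - 1] == ' ')
			len--;
		if (want > 0 && len == want && !memcmp(list, alpn, len))
			return 1;
		list = end;
	}
	return 0;
}

/* Return the first listener in the group whose ALPN list contains
 * 'alpn', or NULL if none does (a caller could then fall back to the
 * existing hash-based choice). */
static const struct listener *select_by_alpn(const struct listener *grp,
					     size_t n, const char *alpn)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (alpn_in_list(grp[i].alpns, alpn))
			return &grp[i];
	return NULL;
}
```

The match itself is cheap; the hard part is obtaining the client's ALPN at
lookup time.
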
The problem is that this requires parsing the TLS Client_Hello message to
get the ALPN in quic_sock_lookup(), which is not a proper thing to do in the
kernel and might be rejected by the networking maintainers. I need to check
with them.
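
To give an idea of how much TLS parsing this would mean, here is a
userspace sketch that walks a ClientHello just far enough to find the first
ALPN name. It assumes a complete, already decrypted and reassembled
handshake message (in QUIC it may be split across CRYPTO frames), it only
returns the first name a client offers, and all function names are
hypothetical. The field layout follows the TLS 1.3 ClientHello and the ALPN
extension (type 16).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLS_HS_CLIENT_HELLO	1
#define TLS_EXT_ALPN		16

/* Bounds-checked cursor over a byte buffer. */
struct cur {
	const uint8_t *p;
	size_t len;
};

/* Read an n-byte big-endian integer (n <= 3); -1 if the buffer is short. */
static long get_uint(struct cur *c, size_t n)
{
	long v = 0;

	if (c->len < n)
		return -1;
	c->len -= n;
	while (n--)
		v = (v << 8) | *c->p++;
	return v;
}

/* Skip a vector that has an n-byte length prefix; -1 if malformed. */
static int skip_vec(struct cur *c, size_t n)
{
	long l = get_uint(c, n);

	if (l < 0 || (size_t)l > c->len)
		return -1;
	c->p += l;
	c->len -= (size_t)l;
	return 0;
}

/* On success returns the length of the first ALPN name and points *name
 * into 'msg' (not NUL-terminated); returns -1 if absent or malformed. */
static int client_hello_first_alpn(const uint8_t *msg, size_t len,
				   const uint8_t **name)
{
	struct cur c = { msg, len };
	long type, hlen, exts_len;

	assert(msg != NULL && name != NULL);

	type = get_uint(&c, 1);
	hlen = get_uint(&c, 3);
	if (type != TLS_HS_CLIENT_HELLO || hlen < 0 || (size_t)hlen > c.len)
		return -1;
	c.len = (size_t)hlen;

	if (c.len < 34)		/* legacy_version (2) + random (32) */
		return -1;
	c.p += 34;
	c.len -= 34;

	if (skip_vec(&c, 1) ||	/* legacy_session_id */
	    skip_vec(&c, 2) ||	/* cipher_suites */
	    skip_vec(&c, 1))	/* legacy_compression_methods */
		return -1;

	exts_len = get_uint(&c, 2);
	if (exts_len < 0 || (size_t)exts_len > c.len)
		return -1;
	c.len = (size_t)exts_len;

	while (c.len >= 4) {
		long ext_type = get_uint(&c, 2);
		long ext_len = get_uint(&c, 2);
		long list_len, name_len;
		struct cur e;

		if ((size_t)ext_len > c.len)
			return -1;
		e.p = c.p;
		e.len = (size_t)ext_len;
		c.p += ext_len;
		c.len -= (size_t)ext_len;
		if (ext_type != TLS_EXT_ALPN)
			continue;

		/* 2-byte list length, then names with 1-byte lengths. */
		list_len = get_uint(&e, 2);
		name_len = get_uint(&e, 1);
		if (list_len < 1 || name_len < 1 || (size_t)name_len > e.len)
			return -1;
		*name = e.p;
		return (int)name_len;
	}
	return -1;
}
```

Even this reduced version has to understand several TLS structures, which
is the layering concern.
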
Would you be able to work around this by using Unix Domain Sockets to pass
the sockfd to another process?
(Note that we're assuming all 3 of your applications are using in-kernel QUIC.)
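
For reference, passing a socket fd over a Unix Domain Socket is done with an
SCM_RIGHTS control message. Below is a minimal sketch with two hypothetical
helpers, send_fd() and recv_fd(); it works on any file descriptor and
contains nothing QUIC-specific. Whether an accepted QUIC sock behaves well
after being handed over this way is an assumption that would need testing.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send file descriptor 'fd' over the Unix domain socket 'chan'.
 * Returns 0 on success, -1 on error. */
static int send_fd(int chan, int fd)
{
	char byte = 0;	/* one dummy data byte carries the control message */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment */
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(fd >= 0);

	memset(&msg, 0, sizeof(msg));
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor from 'chan'.
 * Returns the new fd in this process, or -1 on error. */
static int recv_fd(int chan)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(chan, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

A dispatcher process would call send_fd() with the accepted sockfd after
reading the ALPN via getsockopt(), and the target service would call
recv_fd() on its end of the Unix Domain Socket.
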
> >> But I don't yet understand how the kernel gets the key to
> >> do the initial decryption, I'd assume some call before listen()
> >> needs to tell the kernel about the keys.
> > For initial decryption, the keys can be derived from the initial packet.
> > Basically, it only needs the dst_connection_id from the client initial
> > packet. See:
> >
> > https://datatracker.ietf.org/doc/html/rfc9001#name-initial-secrets
> >
> > So we don't need to set up anything in the kernel for the initial keys.
>
> I got it, thanks!
>
> metze
>