Date: Fri, 19 Apr 2024 16:07:50 +0200
From: Stefan Metzmacher <metze@...ba.org>
To: Xin Long <lucien.xin@...il.com>
Cc: network dev <netdev@...r.kernel.org>, davem@...emloft.net,
 kuba@...nel.org, Eric Dumazet <edumazet@...gle.com>,
 Paolo Abeni <pabeni@...hat.com>, Steve French <smfrench@...il.com>,
 Namjae Jeon <linkinjeon@...nel.org>, Chuck Lever III
 <chuck.lever@...cle.com>, Jeff Layton <jlayton@...nel.org>,
 Sabrina Dubroca <sd@...asysnail.net>, Tyler Fanelli <tfanelli@...hat.com>,
 Pengtao He <hepengtao@...omi.com>,
 "linux-cifs@...r.kernel.org" <linux-cifs@...r.kernel.org>,
 Samba Technical <samba-technical@...ts.samba.org>
Subject: Re: [RFC PATCH net-next 0/5] net: In-kernel QUIC implementation with
 Userspace handshake

Hi Xin Long,

>>>>>> first many thanks for working on this topic!
>>>>>>
>>>>> Hi, Stefan
>>>>>
>>>>> Thanks for the comment!
>>>>>
>>>>>>> Usage
>>>>>>> =====
>>>>>>>
>>>>>>> This implementation supports a mapping of QUIC into sockets APIs. Similar
>>>>>>> to TCP and SCTP, a typical Server and Client use the following system call
>>>>>>> sequence to communicate:
>>>>>>>
>>>>>>>            Client                    Server
>>>>>>>         ------------------------------------------------------------------
>>>>>>>         sockfd = socket(IPPROTO_QUIC)      listenfd = socket(IPPROTO_QUIC)
>>>>>>>         bind(sockfd)                       bind(listenfd)
>>>>>>>                                            listen(listenfd)
>>>>>>>         connect(sockfd)
>>>>>>>         quic_client_handshake(sockfd)
>>>>>>>                                            sockfd = accept(listenfd)
>>>>>>>                                            quic_server_handshake(sockfd, cert)
>>>>>>>
>>>>>>>         sendmsg(sockfd)                    recvmsg(sockfd)
>>>>>>>         close(sockfd)                      close(sockfd)
>>>>>>>                                            close(listenfd)
>>>>>>>
>>>>>>> Please note that quic_client_handshake() and quic_server_handshake() functions
>>>>>>> are currently sourced from libquic in the github lxin/quic repository, and might
>>>>>>> be integrated into ktls-utils in the future. These functions are responsible for
>>>>>>> receiving and processing the raw TLS handshake messages until the completion of
>>>>>>> the handshake process.
>>>>>>
>>>>>> I see a problem with this design for the server, as one reason to
>>>>>> have SMB over QUIC is to use udp port 443 in order to get through
>>>>>> firewalls. As QUIC has the concept of ALPN it should be possible to
>>>>>> let a consumer listen only on a specific ALPN, so that the smb server
>>>>>> and a web server on "h3" could both accept connections.
>>>>> We do provide a sockopt to set ALPN before bind or handshaking:
>>>>>
>>>>>      https://github.com/lxin/quic/wiki/man#quic_sockopt_alpn
>>>>>
>>>>> But it's used more to verify that the ALPN set on the server
>>>>> matches the one received from the client than to find
>>>>> the correct server.
>>>>
>>>> Ah, ok.
>>> Just note that, with a small change to the current libquic, users
>>> can still use ALPN to find the correct function or thread in
>>> the *same* process. Usage would look like:
>>>
>>> listenfd = socket(IPPROTO_QUIC);
>>> /* match all during handshake with wildcard ALPN */
>>> setsockopt(listenfd, QUIC_SOCKOPT_ALPN, "*");
>>> bind(listenfd)
>>> listen(listenfd)
>>>
>>> while (1) {
>>>     sockfd = accept(listenfd);
>>>     /* the alpn from client will be set to sockfd during handshake */
>>>     quic_server_handshake(sockfd, cert);
>>>
>>>     getsockopt(sockfd, QUIC_SOCKOPT_ALPN, alpn);
>>
>> Would quic_server_handshake() call setsockopt()?
> Yes, I just made a small change in the userspace libquic:
> 
>    https://github.com/lxin/quic/commit/9c75bd42769a8cbc1652e2f4c8d77780f23afde6
> 
> So you can set up multiple ALPNs on the listen sock:
> 
>    setsockopt(listenfd, QUIC_SOCKOPT_ALPN, "smbd, h3, ksmbd");
> 
> Then during handshake, the matched ALPN from client will be set into
> the accept socket, then users can get it later after handshake.
> 
> Note that userspace libquic is a very light lib (a couple of hundred lines
> of code), you can add more TLS related support without touching Kernel code,
> including the SNI support you mentioned.
> 
>>
>>>     /* C cannot switch on strings, so compare explicitly */
>>>     if (!strcmp(alpn, "smbd"))
>>>         smbd_thread(sockfd);
>>>     else if (!strcmp(alpn, "h3"))
>>>         h3_thread(sockfd);
>>>     else if (!strcmp(alpn, "ksmbd"))
>>>         ksmbd_thread(sockfd);
>>> }
>>
>> Ok, but that would mean all applications need to be aware of each other.
>> It would be possible, though, and socket fds could be passed to other
>> processes.
> It doesn't sound common to me, but yes, I think Unix Domain Sockets
> can pass it to another process.
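The single-process dispatch discussed above can be sketched in plain C.
This is only an illustration of the string handling, not libquic code:
alpn_in_list(), service_from_alpn() and the enum are hypothetical names,
and the comma-separated list format is taken from the "smbd, h3, ksmbd"
example in this thread. The ALPN is mapped to an enum first so that the
caller can use an ordinary switch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `alpn` is one of the entries in the
 * comma-separated `list` (e.g. "smbd, h3, ksmbd"), ignoring the spaces
 * around entries, and 0 otherwise. */
int alpn_in_list(const char *list, const char *alpn)
{
    assert(list != NULL && alpn != NULL);
    size_t want = strlen(alpn);
    const char *p = list;

    while (*p) {
        while (*p == ' ' || *p == ',')      /* skip separators */
            p++;
        const char *start = p;
        while (*p && *p != ',')             /* scan one entry */
            p++;
        const char *end = p;
        while (end > start && end[-1] == ' ')   /* trim trailing spaces */
            end--;
        if (want > 0 && (size_t)(end - start) == want &&
            memcmp(start, alpn, want) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical service identifiers, one per ALPN used in this thread. */
enum service { SVC_NONE = -1, SVC_SMBD, SVC_H3, SVC_KSMBD };

static const char *const service_alpn[] = { "smbd", "h3", "ksmbd" };

/* Map the negotiated ALPN to an enum the caller can switch on. */
enum service service_from_alpn(const char *alpn)
{
    assert(alpn != NULL);
    for (size_t i = 0; i < sizeof(service_alpn) / sizeof(service_alpn[0]); i++)
        if (strcmp(service_alpn[i], alpn) == 0)
            return (enum service)i;
    return SVC_NONE;
}
```

After the handshake, the accept loop would read the ALPN from the accepted
socket, call service_from_alpn() and switch on the result to pick the
worker thread, or pass the fd to another process.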

I think it will be extremely common to have multiple services
based on udp port 443.

People will expect to find web services, smb and maybe more
behind the same DNS host name. And multiple DNS host names pointing
to the same ip address is also very likely.

With plain tcp/udp it's also possible to have independent sockets
per port. There's no single userspace daemon that listens on
'tcp' and dispatches to different processes based on the port.

And with QUIC the port space is the ALPN and/or SNI
combination.

And I think this should be addressed before this becomes an
unchangeable kernel ABI, written in stone.

>>>>> So you expect (k)smbd server and web server both to listen on UDP
>>>>> port 443 on the same host, and which APP server accepts the request
>>>>> from a client depends on ALPN, right?
>>>>
>>>> yes.
>>> Got you. This can be done by also moving TLS 1.3 message exchange to
>>> kernel where we can get the ALPN before looking up the listening socket.
>>> However, In-kernel TLS 1.3 Handshake had been NACKed by both kernel
>>> netdev maintainers and userland ssl lib developers with good reasons.
>>>
>>>>
>>>>> Currently, in the kernel, this implementation doesn't process any raw TLS
>>>>> MSG/EXTs but delivers them to userspace after decryption, and the accept
>>>>> socket is created before processing the handshake.
>>>>>
>>>>> I'm actually curious how userland QUIC handles this, considering
>>>>> that the UDP sockets('listening' on the same IP:PORT) are used in
>>>>> two different servers' processes. I think socket lookup with ALPN
>>>>> has to be done in Kernel Space. Do you know any userland QUIC
>>>>> implementation for this?
>>>>
>>>> I don't know, but I guess QUIC is only used for http so
>>>> far and maybe dns, but that seems to use port 853.
>>>>
>>>> So there's no strict need for it and the web server
>>>> would handle all relevant ALPNs.
>>> Honestly, I don't think any userland QUIC can use ALPN to look up
>>> different sockets used by different servers/processes, as such a thing
>>> can only be done in Kernel Space.
>>>
>>>>
>>>>>>
>>>>>> So the server application should have a way to specify the desired
>>>>>> ALPN before or during the bind() call. I'm not sure if the
>>>>>> ALPN is available in cleartext before any crypto is needed,
>>>>>> so if the ALPN is encrypted it might also be necessary to register
>>>>>> a server certificate and key together with the ALPN,
>>>>>> because multiple applications may not want to share the same key.
>>>>> On send side, ALPN extension is in raw TLS messages created in userspace
>>>>> and passed into the kernel and encoded into QUIC crypto frame and then
>>>>> *encrypted* before sending out.
>>>>
>>>> Ok.
>>>>
>>>>> On recv side, after decryption, the raw TLS messages are decoded from
>>>>> the QUIC crypto frame and then delivered to userspace, so userspace
>>>>> processes certificate validation and also sees the cleartext ALPN.
>>>>>
>>>>> Let me know if I don't make it clear.
>>>>
>>>> But will the first "new" QUIC pdu trigger accept() to
>>>> return, with userspace (or the kernel helper function) doing
>>>> all crypto? Or does the first decryption happen in kernel (before accept returns)?
>>> Good question!
>>>
>>> The first "new" QUIC pdu causes a 'request sock' (containing the
>>> 4-tuple and connection IDs only) to be created and queued on the reqsk list
>>> of the listen sock (if validate_peer_address param is not set), and this pdu
>>> is enqueued in the inq->backlog_list of the listen sock.
>>>
>>> When accept() is called, in Kernel, it dequeues the "request sock" from the
>>> reqsk list of the listen sock, and creates the accept socket based on this
>>> reqsk. Then it processes the pdu for this new accept socket from the
>>> inq->backlog_list of the listen sock, including *decrypting* the QUIC packet
>>> and decoding the CRYPTO frame, then delivers the raw/cleartext TLS message to
>>> the Userspace libquic.
>>
>> Ok, when the kernel already decrypts it could already
>> find the ALPN. It doesn't mean it should do the full
>> handshake, but parse enough to find the ALPN.
> Correct, in-kernel QUIC should only do the QUIC related things,
> and all TLS handshake msgs must be handled in Userspace.
> This won't cause "layering violation", as Nick Banks said.

But I think it's unavoidable for the ALPN and SNI fields on
the server side, as every service tries to use udp port 443
and somehow that needs to be shared if multiple services want to
use it.
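
To illustrate how little parsing "finding the ALPN" needs, here is a
minimal userspace sketch that walks a decrypted TLS ClientHello and copies
out the first ALPN name the client offers. It is not code from the patch
set: client_hello_first_alpn() and the cursor helpers are hypothetical
names, and it assumes the whole ClientHello is available in one contiguous
buffer (in QUIC it may be split across several CRYPTO frames). The field
layout follows the TLS 1.3 ClientHello and the ALPN extension (type 16).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLS_HS_CLIENT_HELLO 1
#define TLS_EXT_ALPN        16

/* Bounds-checked read cursor over the handshake message. */
struct cursor { const uint8_t *p; size_t left; };

static int skip(struct cursor *c, size_t n)
{
    if (c->left < n)
        return -1;
    c->p += n;
    c->left -= n;
    return 0;
}

/* Read an nbytes-wide big-endian unsigned integer. */
static int get_uint(struct cursor *c, size_t nbytes, size_t *out)
{
    size_t v = 0;

    if (c->left < nbytes)
        return -1;
    for (size_t i = 0; i < nbytes; i++)
        v = (v << 8) | c->p[i];
    *out = v;
    return skip(c, nbytes);
}

/* Skip a vector that carries an nbytes-wide length prefix. */
static int skip_vec(struct cursor *c, size_t nbytes)
{
    size_t len;

    return (get_uint(c, nbytes, &len) || skip(c, len)) ? -1 : 0;
}

/* Copy the first offered ALPN name into `out` (NUL-terminated).
 * Returns 0 on success, -1 if the message is malformed or has no ALPN. */
int client_hello_first_alpn(const uint8_t *msg, size_t len,
                            char *out, size_t outlen)
{
    struct cursor c = { msg, len };
    size_t type, body_len, exts_len;

    assert(msg != NULL && out != NULL && outlen > 0);
    if (get_uint(&c, 1, &type) || type != TLS_HS_CLIENT_HELLO)
        return -1;
    if (get_uint(&c, 3, &body_len) || body_len > c.left)
        return -1;
    c.left = body_len;
    if (skip(&c, 2 + 32) ||     /* legacy_version + random */
        skip_vec(&c, 1) ||      /* legacy_session_id */
        skip_vec(&c, 2) ||      /* cipher_suites */
        skip_vec(&c, 1))        /* legacy_compression_methods */
        return -1;
    if (get_uint(&c, 2, &exts_len) || exts_len > c.left)
        return -1;
    c.left = exts_len;

    while (c.left > 0) {
        size_t ext_type, ext_len, list_len, name_len;

        if (get_uint(&c, 2, &ext_type) || get_uint(&c, 2, &ext_len) ||
            ext_len > c.left)
            return -1;
        if (ext_type != TLS_EXT_ALPN) {
            skip(&c, ext_len);
            continue;
        }
        struct cursor e = { c.p, ext_len };
        if (get_uint(&e, 2, &list_len) || list_len > e.left)
            return -1;
        e.left = list_len;
        if (get_uint(&e, 1, &name_len) || name_len == 0 ||
            name_len > e.left || name_len >= outlen)
            return -1;
        memcpy(out, e.p, name_len);
        out[name_len] = '\0';
        return 0;
    }
    return -1;
}
```

A real lookup would have to consider every name in the client's list, not
just the first one, and the same walk could pick up the SNI from the
server_name extension.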

I guess on the acceptor side we would need to somehow detach the low-level
udp struct sock from the logical listen struct sock.

And quic_do_listen_rcv() would need to find the correct logical listening
socket and call quic_request_sock_enqueue() on the logical socket,
not the low-level udp socket. The same goes for everything happening after
quic_request_sock_enqueue() at the end of quic_do_listen_rcv().
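
A rough sketch of what such a lookup could look like, written as plain
userspace C rather than kernel code. Everything here is hypothetical:
struct logical_listener, find_logical_listener() and the "most specific
match wins" rule are assumptions made for illustration, and the int id
stands in for whatever struct sock reference a real implementation would
keep. A NULL alpn or sni in a registration means "any".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registration: one logical listener per (ALPN, SNI) pair
 * sharing the same low-level UDP port. NULL means wildcard. */
struct logical_listener {
    const char *alpn;
    const char *sni;
    int id;     /* placeholder for the logical listen socket */
};

static int field_matches(const char *want, const char *got)
{
    return want == NULL || (got != NULL && strcmp(want, got) == 0);
}

/* Return the most specific listener matching the client's ALPN and SNI,
 * or NULL if no listener wants this connection. On a tie the earlier
 * registration wins. */
const struct logical_listener *
find_logical_listener(const struct logical_listener *tab, size_t n,
                      const char *alpn, const char *sni)
{
    const struct logical_listener *best = NULL;
    int best_score = -1;

    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!field_matches(tab[i].alpn, alpn) ||
            !field_matches(tab[i].sni, sni))
            continue;
        /* Exact fields score higher than wildcards. */
        int score = (tab[i].alpn != NULL) + (tab[i].sni != NULL);
        if (score > best_score) {
            best = &tab[i];
            best_score = score;
        }
    }
    return best;
}
```

With something like this, the request sock would be queued on the listener
the lookup returns, and a connection that matches nothing could be
rejected before any application sees it.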

>> But I don't yet understand how the kernel gets the key to
>> do the initial decryption, I'd assume some call before listen()
>> needs to tell the kernel about the keys.
> For initial decryption, the keys can be derived from the initial packet.
> Basically, it only needs the dst_connection_id from the client initial
> packet. See:
> 
>    https://datatracker.ietf.org/doc/html/rfc9001#name-initial-secrets
> 
> so we don't need to set up anything in the kernel for the initial keys.

I got it, thanks!

metze

