[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADvbK_e7i08GAiOenJNTP_m+-MeYjSf7J-vkF+hgRfYGNCjkwQ@mail.gmail.com>
Date: Mon, 22 Apr 2024 16:58:13 -0400
From: Xin Long <lucien.xin@...il.com>
To: Stefan Metzmacher <metze@...ba.org>, Martin KaFai Lau <kafai@...com>
Cc: network dev <netdev@...r.kernel.org>, davem@...emloft.net, kuba@...nel.org,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
Steve French <smfrench@...il.com>, Namjae Jeon <linkinjeon@...nel.org>,
Chuck Lever III <chuck.lever@...cle.com>, Jeff Layton <jlayton@...nel.org>,
Sabrina Dubroca <sd@...asysnail.net>, Tyler Fanelli <tfanelli@...hat.com>,
Pengtao He <hepengtao@...omi.com>,
"linux-cifs@...r.kernel.org" <linux-cifs@...r.kernel.org>,
Samba Technical <samba-technical@...ts.samba.org>
Subject: Re: [RFC PATCH net-next 0/5] net: In-kernel QUIC implementation with
Userspace handshake
On Sun, Apr 21, 2024 at 3:27 PM Stefan Metzmacher <metze@...ba.org> wrote:
>
> Am 20.04.24 um 21:32 schrieb Xin Long:
> > On Fri, Apr 19, 2024 at 3:19 PM Xin Long <lucien.xin@...il.com> wrote:
> >>
> >> On Fri, Apr 19, 2024 at 2:51 PM Stefan Metzmacher <metze@...ba.org> wrote:
> >>>
> >>> Hi Xin Long,
> >>>
> >>>>> But I think its unavoidable for the ALPN and SNI fields on
> >>>>> the server side. As every service tries to use udp port 443
> >>>>> and somehow that needs to be shared if multiple services want to
> >>>>> use it.
> >>>>>
> >>>>> I guess on the acceptor side we would need to somehow detach low level
> >>>>> udp struct sock from the logical listen struct sock.
> >>>>>
> >>>>> And quic_do_listen_rcv() would need to find the correct logical listening
> >>>>> socket and call quic_request_sock_enqueue() on the logical socket
> >>>>> not the lowlevel udo socket. The same for all stuff happening after
> >>>>> quic_request_sock_enqueue() at the end of quic_do_listen_rcv.
> >>>>>
> >>>> The implementation allows one low level UDP sock to serve for multiple
> >>>> QUIC socks.
> >>>>
> >>>> Currently, if your 3 quic applications listen to the same address:port
> >>>> with SO_REUSEPORT socket option set, the incoming connection will choose
> >>>> one of your applications randomly with hash(client_addr+port) vi
> >>>> reuseport_select_sock() in quic_sock_lookup().
> >>>>
> >>>> It should be easy to do a further match with ALPN between these 3 quic
> >>>> socks that listens to the same address:port to get the right quic sock,
> >>>> instead of that randomly choosing.
> >>>
> >>> Ah, that sounds good.
> >>>
> >>>> The problem is to parse the TLS Client_Hello message to get the ALPN in
> >>>> quic_sock_lookup(), which is not a proper thing to do in kernel, and
> >>>> might be rejected by networking maintainers, I need to check with them.
> >>>
> >>> Is the reassembling of CRYPTO frames done in the kernel or
> >>> userspace? Can you point me to the place in the code?
> >> In quic_inq_handshake_tail() in kernel, for Client Initial packet
> >> is processed when calling accept(), this is the path:
> >>
> >> quic_accept()-> quic_accept_sock_init() -> quic_packet_process() ->
> >> quic_packet_handshake_process() -> quic_frame_process() ->
> >> quic_frame_crypto_process() -> quic_inq_handshake_tail().
> >>
> >> Note that it's with the accept sock, not the listen sock.
> >>
> >>>
> >>> If it's really impossible to do in C code maybe
> >>> registering a bpf function in order to allow a listener
> >>> to check the intial quic packet and decide if it wants to serve
> >>> that connection would be possible as last resort?
> >> That's a smart idea! man.
> >> I think the bpf hook in reuseport_select_sock() is meant to do such
> >> selection.
> >>
> >> For the Client initial packet (the only packet you need to handle),
> >> I double you will need to do the reassembling, as Client Hello TLS message
> >> is always less than 400 byte in my env.
> >>
> >> But I think you need to do the decryption for the Client initial packet
> >> before decoding it then parsing the TLS message from its crypto frame.
> > I created this patch:
> >
> > https://github.com/lxin/quic/commit/aee0b7c77df3f39941f98bb901c73fdc560befb8
> >
> > to do this decryption in quic_sock_look() before calling
> > reuseport_select_sock(), so that it provides the bpf selector with
> > a plain-text QUIC initial packet:
> >
> > https://datatracker.ietf.org/doc/html/rfc9000#section-17.2.2
> >
> > If it's complex for you to do the decryption for the initial packet in
> > the bpf selector, I will apply this patch. Please let me know.
>
> I guess in addition to quic_server_handshake(), which is called
> after accept(), there should be quic_server_prepare_listen()
> (and something similar for in kernel servers) that setup the reuseport
> magic for the socket, so that it's not needed in every application.
It's done when calling listen(), see quic_inet_listen()->quic_hash()
where only listening sockets with its sk_reuseport set will be
added into the reuseport group.
It means SO_REUSEPORT sockopt must be set for every socket
before calling listen().
>
> It seems there is only a single ebpf program possible per
> reuseport group, so there has to be just a single one.
Yes, a single ebpf program per reuseport group should work.
see prepare_sk_fds() in kernel selftests for select_reuseport bfp.
>
> But is it possible for in kernel servers to also register an epbf program?
Good question. TBH, I don't really know much about epbf programming.
I guess the real problem is how you pass the .o file to kernel space?
Another question is, in the selftests:
tools/testing/selftests/bpf/prog_tests/select_reuseport.c
tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
it created a global reuseport_array, and then added these sockets
into this array for the later lookup, but these sockets are all created
in the same process.
But your case is that the sockets are created in different processes.
I'm not sure if it's possible to add sockets from different processes
into the same reuseport_array?
Added Martin who introduced BPF_PROG_TYPE_SK_REUSEPORT,
I guess he may know the answers.
Thanks.
Powered by blists - more mailing lists