Message-ID: <001001d55f10$7a6c96f0$6f45c4d0$@net>
Date: Fri, 30 Aug 2019 04:53:41 -0400
From: "Steve Zabele" <zabele@...cast.net>
To: "'Steve Zabele'" <zabele@...cast.net>,
"'Willem de Bruijn'" <willemdebruijn.kernel@...il.com>
Cc: "'Network Development'" <netdev@...r.kernel.org>,
<shum@...ndrew.org>, <vladimir116@...il.com>,
<saifi.khan@...ikr.in>, "'Daniel Borkmann'" <daniel@...earbox.net>,
<on2k16nm@...il.com>,
"'Stephen Hemminger'" <stephen@...workplumber.org>,
<mark.keaton@...theon.com>
Subject: RE: Is bug 200755 in anyone's queue??
Resending, since the last send bounced with this error:
The following recipient(s) cannot be reached:
'saifi.khan@...asynergy.org' on 8/30/2019 4:49 AM
550 5.1.1 <saifi.khan@...asynergy.org> recipient invalid domain
Sorry for the spam.
Steve
-----Original Message-----
From: Steve Zabele [mailto:zabele@...cast.net]
Sent: Friday, August 30, 2019 4:49 AM
To: 'Willem de Bruijn'
Cc: 'Network Development'; 'shum@...ndrew.org'; 'vladimir116@...il.com'; 'saifi.khan@...asynergy.org'; 'saifi.khan@...ikr.in'; 'Daniel Borkmann'; 'on2k16nm@...il.com'; 'Stephen Hemminger'; 'mark.keaton@...theon.com'
Subject: RE: Is bug 200755 in anyone's queue??
Hi Willem!
**Thank you** for the reply and the code segment, very much appreciated.
Can we expect this to make its way into a near-term official kernel release? Our customers are really not up to patching and rebuilding kernels. Patching also "taints" the kernel from a security perspective, and whenever a new kernel is released (you come in one morning and find your kernel magically upgraded because you forgot to disable auto-updates), you need to rebuild and hope the previous patch still applies to the new code, etc.
Getting this onto the main branch as part of the official release cycle will be greatly appreciated!
Note that an eBPF approach can't solve this problem (we know because we tried for quite a while to make it work, with no luck). The key issue is that when the eBPF filter gets the packet buffer reference, it points to the start of the UDP portion of the packet, so it cannot access the IP source address, which sits earlier in the buffer. Plus, every time a socket is opened or closed, a new eBPF program has to be created and inserted, and there is really no good way to figure out which index is (now) associated with which file descriptor.
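The index problem can be illustrated with a small user-space model. This is a hypothetical sketch, assuming the kernel's reuseport group behaves like a plain array that appends a socket on join and moves the last socket into the freed slot on close (our reading of `reuseport_detach_sock()` in 4.x-era kernels). The names `reuse_group`, `group_add`, `group_remove`, `group_index_of`, and `demo_reshuffle` are made up for illustration.

```c
#include <assert.h>

/* Hypothetical model of a reuseport group: index -> fd, in join order.
 * Assumes append-on-join and swap-with-last-on-leave semantics. */
#define MAX_SOCKS 16

struct reuse_group {
    int socks[MAX_SOCKS]; /* the index a BPF program would return */
    int num_socks;
};

/* Joining appends the socket at the next free index. */
static int group_add(struct reuse_group *g, int fd)
{
    if (g->num_socks >= MAX_SOCKS)
        return -1;
    g->socks[g->num_socks++] = fd;
    return 0;
}

/* Leaving moves the last socket into the freed slot, so the index of an
 * unrelated socket can change when some other socket is closed. */
static void group_remove(struct reuse_group *g, int fd)
{
    for (int i = 0; i < g->num_socks; i++) {
        if (g->socks[i] == fd) {
            g->socks[i] = g->socks[g->num_socks - 1];
            g->num_socks--;
            return;
        }
    }
}

/* Current index of fd in the group, or -1 if it is not a member. */
static int group_index_of(const struct reuse_group *g, int fd)
{
    for (int i = 0; i < g->num_socks; i++)
        if (g->socks[i] == fd)
            return i;
    return -1;
}

/* Usage: four sockets join, then fd 4 is closed. */
static void demo_reshuffle(void)
{
    struct reuse_group g = { .num_socks = 0 };

    for (int fd = 3; fd <= 6; fd++)
        group_add(&g, fd);               /* indices 0..3 hold fds 3..6 */
    assert(group_index_of(&g, 6) == 3);
    group_remove(&g, 4);                 /* fd 4 closed */
    assert(group_index_of(&g, 6) == 1);  /* fd 6 moved into slot 1 */
    assert(g.num_socks == 3);
}
```

In the model, closing fd 4 silently moves fd 6 from index 3 to index 1, so a program that was steering a session to index 3 would now miss it entirely.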
So thank you and the group for your attention to this.
With respect to your comment:
>SO_REUSEPORT was not intended to be used in this way: opening
>multiple connected sockets with the same local port.
I'd like to point out that there are a number of reliable transport protocols (alternatives to TCP) that run over UDP. NORM (IETF RFC 5740) and Google's new QUIC protocol (https://www.ietf.org/blog/whats-happening-quic) are good examples.
Now consider that users of these protocols will want to build servers on them; a web server is a good example. In fact, Google runs one on QUIC, and many Chrome users don't even know they are using QUIC when they access Google's web servers.
With a client-server model, clients contact the server at a well-known server address and port. Upon first contact from a new client, the server opens another socket with the same local address and port and "connects" it to the client's address and ephemeral port, so that only traffic for that 5-tuple arrives on the new file descriptor. This lets the server application keep concurrent sessions with different clients cleanly separated, even though all sessions use the same local server port. In fact, reusing the same port for different sessions is really important from a firewalling perspective.
This is pretty much what our application does, i.e., it uses different sockets/file descriptors to keep sessions straight.
And if it's worth anything, we have been using this mechanism with UDP for a *very* long time; the change in behavior appears to have happened with the 4.5 kernel.
So **thank you**!!
Steve
-----Original Message-----
From: Willem de Bruijn [mailto:willemdebruijn.kernel@...il.com]
Sent: Thursday, August 29, 2019 3:27 PM
To: Steve Zabele
Cc: Network Development; shum@...ndrew.org; vladimir116@...il.com; saifi.khan@...asynergy.org; saifi.khan@...ikr.in; Daniel Borkmann; on2k16nm@...il.com; Stephen Hemminger
Subject: Re: Is bug 200755 in anyone's queue??
On Fri, Aug 23, 2019 at 3:11 PM Steve Zabele <zabele@...cast.net> wrote:
>
> Hi folks,
>
> Is there a way to find out where things stand on the SO_REUSEPORT bug
> reported a year ago in August (which apparently affects kernels later
> than 4.4)?
>
> The bug characteristics, simple standalone test code demonstrating the bug,
> and an assessment of the likely location/cause of the bug within the kernel
> are all described here
>
> https://bugzilla.kernel.org/show_bug.cgi?id=200755
>
> I'm really hoping this gets fixed so we can move forward with updating our
> kernels/Ubuntu release from our aging 4.4/16.04 release.
>
> Thanks!
>
> Steve
>
>
>
> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@...workplumber.org]
> Sent: Tuesday, July 16, 2019 10:03 AM
> To: Steve Zabele
> Cc: shum@...ndrew.org; vladimir116@...il.com; saifi.khan@...aSynergy.org;
> saifi.khan@...ikr.in; daniel@...earbox.net; on2k16nm@...il.com
> Subject: Re: Is bug 200755 in anyone's queue??
>
> On Tue, 16 Jul 2019 09:43:24 -0400
> "Steve Zabele" <zabele@...cast.net> wrote:
>
>
> > I came across bug report 200755 trying to figure out why some code I had
> > provided to customers a while ago no longer works with the current Linux
> > kernel. See
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=200755
> >
> > I've verified that, as reported, 'connect' no longer works for UDP.
> > Moreover, it appears to have been broken since the 4.5 kernel was
> > released.
> >
> > It also appears that the intended new feature of round-robin
> > assignment to different UDP sockets opened with SO_REUSEPORT does not
> > work as described.
> >
> > Since the original bug report was made nearly a year ago for the 4.14
> > kernel (and the bug is still present in the 4.15 kernel), I'm curious
> > whether anyone is on the hook to get this fixed any time soon.
> >
> > I'd rather not do my own demultiplexing on a single socket in user
> > space to work around what is clearly a (maybe not so recently
> > introduced) kernel bug, if at all possible. My code worked just fine on
> > 3.X kernels, and appears to work okay up through 4.4.
> >
>
> Kernel developers do not use bugzilla; I forward bug reports
> to netdev@...r.kernel.org (after filtering).
SO_REUSEPORT was not intended to be used in this way: opening
multiple connected sockets with the same local port.
But since the interface allowed connect after joining a group, and
that is being used, I guess that point is moot. Still, I'm a bit
surprised that it ever worked as described.
Also note that the default distribution algorithm is not round-robin
assignment but hash-based, so multiple consecutive datagrams arriving
at the same socket is not unexpected.
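To illustrate hash-based (rather than round-robin) selection, here is a sketch that maps a flow's 4-tuple onto a group index. The FNV-1a `flow_hash()` is a hypothetical stand-in for the kernel's seeded, jhash-based `udp_ehashfn()`; `pick_index()` follows the multiply-and-shift used by the kernel's `reciprocal_scale()`. The `tuple4` struct and `demo_same_flow()` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-tuple of an incoming datagram (host byte order). */
struct tuple4 {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Stand-in flow hash: FNV-1a over the tuple's bytes. The kernel uses a
 * seeded jhash-based function instead; this is only for illustration. */
static uint32_t flow_hash(const struct tuple4 *t)
{
    uint32_t words[3] = { t->saddr, t->daddr,
                          ((uint32_t)t->sport << 16) | t->dport };
    uint32_t h = 2166136261u;              /* FNV-1a offset basis */

    for (int i = 0; i < 3; i++) {
        for (int b = 0; b < 4; b++) {
            h ^= (words[i] >> (8 * b)) & 0xffu;
            h *= 16777619u;                /* FNV-1a prime */
        }
    }
    return h;
}

/* Map a 32-bit hash onto [0, n) by multiplying and keeping the upper
 * 32 bits, as reciprocal_scale() does. */
static uint32_t pick_index(uint32_t hash, uint32_t n)
{
    return (uint32_t)(((uint64_t)hash * n) >> 32);
}

/* Usage: every datagram of one flow maps to the same index. */
static void demo_same_flow(void)
{
    struct tuple4 t = { 0x7f000001u, 0x7f000001u, 40000, 5000 };
    uint32_t first = pick_index(flow_hash(&t), 4);

    assert(first < 4);
    for (int i = 0; i < 8; i++)
        assert(pick_index(flow_hash(&t), 4) == first);
}
```

Because the hash depends only on the tuple, all datagrams from one client land on the same index; spreading happens across different flows, not across consecutive datagrams.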
I suspect that this quick hack might "work"; it seemed to with the
supplied .c file:
         score = compute_score(sk, net, saddr, sport,
                               daddr, hnum, dif, sdif);
         if (score > badness) {
-                if (sk->sk_reuseport) {
+                /* connected sockets skip reuseport group selection */
+                if (sk->sk_reuseport &&
+                    sk->sk_state != TCP_ESTABLISHED) {
But a more robust approach, which also works on existing kernels, is to
replace the default distribution algorithm with a custom BPF-based one
(SO_ATTACH_REUSEPORT_EBPF).