netdev - Re: Sockmap's parser/verdict programs and epoll notifications

Message-ID: <651cfadbe3308_314bc2083f@john.notmuch>
Date: Tue, 03 Oct 2023 22:40:43 -0700
From: John Fastabend <john.fastabend@...il.com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>, 
 Jakub Kicinski <kuba@...nel.org>
Cc: John Fastabend <john.fastabend@...il.com>, 
 bpf <bpf@...r.kernel.org>, 
 Networking <netdev@...r.kernel.org>, 
 "davidhwei@...a.com" <davidhwei@...a.com>
Subject: Re: Sockmap's parser/verdict programs and epoll notifications

Andrii Nakryiko wrote:
> On Tue, Oct 3, 2023 at 5:42 AM Jakub Kicinski <kuba@...nel.org> wrote:
> >
> > On Mon, 02 Oct 2023 22:16:13 -0700 John Fastabend wrote:
> > > > This, together with the other piece we want on our side (allowing
> > > > verdict and sk_msg programs to run on sockets without putting them
> > > > in a sockmap/sockhash), would seem like a better system to me. The
> > > > reason to drop the sockmap/sockhash is that we never remove progs
> > > > once they are added, and we add them from the sockops side. The
> > > > filter for sockets is almost always the port plus metadata related
> > > > to the process or environment. This removes the need to manage the
> > > > sockmap/sockhash and guess what size it should be. Sometimes we
> > > > overrun these maps and have to kill connections until we can
> > > > get more space.
> >
> > That's a step in the right direction for sure, but I still think that
> > Google's auto-lowat is the best approach. We just need a hook that
> > looks at incoming data and sets rcvlowat appropriately. That's it.
> > TCP looks at rcvlowat in a number of places to make protocol decisions,
> > not just the wake-up. Plus, Google will no longer have to carry their
> > out-of-tree (OOT) patch.
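
To make the proposed hook concrete: it would look at the start of the
next message, work out the full message size, and ask for a wake-up
only once that many bytes are queued. Below is a minimal userspace
sketch of that computation, assuming a hypothetical framing of a 4-byte
big-endian length prefix followed by the payload. The function name
`rcvlowat_for_next_msg()` and the `cap` parameter are illustrative
only, not an existing kernel or BPF interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical framing (assumption): 4-byte big-endian length, then payload. */
#define HDR_LEN 4u

/*
 * Return how many bytes should be queued before waking the reader:
 * - fewer than HDR_LEN bytes available: wait for the complete header;
 * - header available: wait for header + payload, capped at `cap` so a
 *   huge (or bogus) length cannot stall the reader indefinitely.
 * `buf` holds the `avail` bytes currently queued (e.g. obtained by peeking).
 */
size_t rcvlowat_for_next_msg(const uint8_t *buf, size_t avail, size_t cap)
{
    assert(buf != NULL || avail == 0);

    if (avail < HDR_LEN)
        return HDR_LEN < cap ? HDR_LEN : cap;

    uint32_t payload = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                       ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];

    /* Compute in 64 bits so HDR_LEN + payload cannot wrap on 32-bit size_t. */
    uint64_t want = (uint64_t)HDR_LEN + payload;
    return want < cap ? (size_t)want : cap;
}
```

Capping is a design choice here: a target larger than the receive
buffer can hold cannot be met, so the threshold alone would never
trigger the wake-up.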
> 
> David can correct me, but when he tried the SO_RCVLOWAT approach to
> solving this problem, he saw no improvements (and it might have
> actually been a regression in terms of behavior). I'd say that this
> sounds a bit suspicious, and we plan to get back to SO_RCVLOWAT
> and try to understand the behavior better.

Not sure how large your packets are, but you might need to bump your
sk_rcvbuf size as well. Otherwise, even if you set SO_RCVLOWAT, you
can hit memory pressure, which will wake up the application
regardless, iirc.
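
From userspace, both knobs are plain socket options. Below is a minimal
sketch, assuming Linux and a TCP socket. The helper name
`set_rcv_thresholds()` and the sizes used are illustrative. It sets
SO_RCVBUF before SO_RCVLOWAT because recent Linux kernels clamp the
effective SO_RCVLOWAT relative to the receive buffer, and it reads the
threshold back so the caller sees what was actually applied.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>

/*
 * Size the receive buffer first, then set the wake-up threshold.
 * Returns the SO_RCVLOWAT value the kernel reports back, or -1 on error.
 * Note: Linux doubles the stored SO_RCVBUF value (bookkeeping overhead)
 * and limits it to net.core.rmem_max; SO_RCVLOWAT may also be clamped.
 */
int set_rcv_thresholds(int fd, int rcvbuf, int lowat)
{
    int val = 0;
    socklen_t len = sizeof(val);

    assert(fd >= 0 && rcvbuf > 0 && lowat > 0);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0) {
        perror("setsockopt(SO_RCVBUF)");
        return -1;
    }
    if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof(lowat)) < 0) {
        perror("setsockopt(SO_RCVLOWAT)");
        return -1;
    }
    if (getsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &val, &len) < 0) {
        perror("getsockopt(SO_RCVLOWAT)");
        return -1;
    }
    return val;
}
```

For example, `set_rcv_thresholds(fd, 256 * 1024, 16 * 1024)` on a
fresh TCP socket should report 16384 on a default Linux configuration,
since that is well under the clamp.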

> 
> I'll just say that the simpler the solution, the better. And if this
> rcvlowat hook gets us the ability to delay network notification to
> user-space until a full logical packet (where packet size is provided
> by BPF program without user space involvement) is assembled (up to
> some reasonable limits, of course), that would be great.

When we created the sockmap/sockhash maps and verdict progs, etc., one
of the goals was to avoid touching the TCP code paths as much as
possible. We also wanted to work on top of KTLS. Maybe you wouldn't
need it, but if you need to read a header that spans multiple skbs,
that is hard without something to reassemble them. Perhaps here you
could get away without needing this, though.
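
As an illustration of why a header that straddles multiple skbs needs
reassembly, here is a minimal userspace sketch of an incremental
framer, assuming a hypothetical framing of a 4-byte big-endian length
prefix. `struct framer`, `framer_feed()` and the `FRAME_MAX` limit are
illustrative names, and each chunk stands in for one skb. It copies
bytes until the header is complete, decodes the length, then copies
until the whole message is present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HDR 4u    /* hypothetical 4-byte big-endian length prefix */
#define FRAME_MAX 4096u /* hypothetical limit on payload size */

struct framer {
    uint8_t buf[FRAME_HDR + FRAME_MAX]; /* header + payload being assembled */
    size_t have;                        /* bytes collected so far */
    size_t need;                        /* bytes required for current stage */
    int hdr_done;                       /* header decoded? */
};

void framer_init(struct framer *f)
{
    f->have = 0;
    f->need = FRAME_HDR; /* initially we only know a header is coming */
    f->hdr_done = 0;
}

/*
 * Consume bytes from `chunk`; returns how many were used. Sets *done = 1
 * when a full message is in f->buf (payload at f->buf + FRAME_HDR, length
 * f->need - FRAME_HDR). Returns (size_t)-1 if the length exceeds FRAME_MAX.
 */
size_t framer_feed(struct framer *f, const uint8_t *chunk, size_t len, int *done)
{
    size_t used = 0;

    assert(f != NULL && done != NULL);
    *done = 0;

    for (;;) {
        /* Copy as much of the chunk as the current stage still needs. */
        size_t take = f->need - f->have;
        if (take > len - used)
            take = len - used;
        if (take > 0)
            memcpy(f->buf + f->have, chunk + used, take);
        f->have += take;
        used += take;

        if (f->have < f->need)
            return used; /* chunk exhausted mid-header or mid-payload */

        if (!f->hdr_done) {
            /* Header complete, even if its bytes arrived in several chunks. */
            uint32_t payload = ((uint32_t)f->buf[0] << 24) |
                               ((uint32_t)f->buf[1] << 16) |
                               ((uint32_t)f->buf[2] << 8) | f->buf[3];
            if (payload > FRAME_MAX)
                return (size_t)-1;
            f->hdr_done = 1;
            f->need = FRAME_HDR + payload;
            continue; /* a zero-length payload completes immediately */
        }

        *done = 1; /* full message assembled */
        return used;
    }
}
```

After a message completes, the caller resets with `framer_init()` and
feeds the unconsumed tail of the chunk (`len` minus the return value)
back in. In the kernel, sockmap's stream parser program relies on the
strparser framework for this kind of reassembly across skbs.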

I'll still fix the parser program and start working on simplifying
the verdict programs so they can run without maps and so on because
it helps other use cases. Maybe it will end up working for this
case or you find a simpler mechanism.