netdev - Re: RFC: Fixing SK_REUSEPORT from sk_lookup

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190516203325.uhg7c5sr45od7lzm@ast-mbp>
Date:   Thu, 16 May 2019 13:33:27 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Lorenz Bauer <lmb@...udflare.com>
Cc:     Joe Stringer <joe@...valent.com>,
        Networking <netdev@...r.kernel.org>, bpf@...r.kernel.org,
        kafai@...com, daniel@...earbox.net, edumazet@...gle.com
Subject: Re: RFC: Fixing SK_REUSEPORT from sk_lookup_* helpers

On Thu, May 16, 2019 at 09:41:34AM +0100, Lorenz Bauer wrote:
> On Wed, 15 May 2019 at 18:16, Joe Stringer <joe@...valent.com> wrote:
> >
> > On Wed, May 15, 2019 at 8:11 AM Lorenz Bauer <lmb@...udflare.com> wrote:
> > >
> > > In the BPF-based TPROXY session with Joe Stringer [1], I mentioned
> > > that the sk_lookup_* helpers currently return inconsistent results if
> > > SK_REUSEPORT programs are in play.
> > >
> > > SK_REUSEPORT programs are a hook point in inet_lookup. They get access
> > > to the full packet
> > > that triggered the look up. To support this, inet_lookup gained a new
> > > skb argument to provide such context. If skb is NULL, the SK_REUSEPORT
> > > program is skipped and instead the socket is selected by its hash.
> > >
> > > The first problem is that not all callers to inet_lookup from BPF have
> > > an skb, e.g. XDP. This means that a look up from XDP gives an
> > > incorrect result. For now that is not a huge problem. However, once we
> > > get sk_assign as proposed by Joe, we can end up circumventing
> > > SK_REUSEPORT.
> >
> > To clarify a bit, the reason this is a problem is that a
> > straightforward implementation may just consider passing the skb
> > context into the sk_lookup_*() and through to the inet_lookup() so
> > that it would run the SK_REUSEPORT BPF program for socket selection on
> > the skb when the packet-path BPF program performs the socket lookup.
> > However, as this paragraph describes, the skb context is not always
> > available.
> >
> > > At the conference, someone suggested using a similar approach to the
> > > work done on the flow dissector by Stanislav: create a dedicated
> > > context sk_reuseport which can either take an skb or a plain pointer.
> > > Patch up load_bytes to deal with both. Pass the context to
> > > inet_lookup.
> > >
> > > This is when we hit the second problem: using the skb or XDP context
> > > directly is incorrect, because it assumes that the relevant protocol
> > > headers are at the start of the buffer. In our use case, the correct
> > > headers are at an offset since we're inspecting encapsulated packets.
> > >
> > > The best solution I've come up with is to steal 17 bits from the flags
> > > argument to sk_lookup_*, 1 bit for BPF_F_HEADERS_AT_OFFSET, 16bit for
> > > the offset itself.
> >
> > FYI there's also the upper 32 bits of the netns_id parameter, another
> > option would be to steal 16 bits from there.
> 
> Or len, which is only 16 bits realistically. The offset doesn't really fit into
> either of them very well, using flags seemed the cleanest to me.
> Is there some best practice around this?
> 
> >
> > > Thoughts?
> >
> > Internally with skbs, we use `skb_pull()` to manage header offsets,
> > could we do something similar with `bpf_xdp_adjust_head()` prior to
> > the call to `bpf_sk_lookup_*()`?
> 
> That would only work if it retained the contents of the skipped
> buffer, and if there
> was a way to undo the adjustment later. We're doing the sk_lookup to
> decide whether to
> accept or forward the packet, so at the point of the call we might still need
> that data. Is that feasible with skb / XDP ctx?

While discussing the solution for reuseport I propose to use
progs/test_select_reuseport_kern.c as an example of realistic program.
It reads tcp/udp header directly via ctx->data or via bpf_skb_load_bytes()
including payload after the header.
It also uses bpf_skb_load_bytes_relative() to fetch IP.
I think if we're fixing the sk_lookup from XDP the above program
would need to work.

And I think we can make it work by adding new requirement that
'struct bpf_sock_tuple *' argument to bpf_sk_lookup_* must be
a pointer to the packet and not a pointer to bpf program stack.
Then helper can construct a fake skb and assign
fake_skb->data = &bpf_sock_tuple_arg.sport
It can check that struct bpf_sock_tuple * pointer is within 100-ish bytes
from xdp->data and within xdp->data_end
This way the reuseport program's assumption that ctx->data points to tcp/udp
will be preserved and it can access it all including payload.

This approach doesn't need to mess with xdp_adjust_head and adjust uapi to pass length.
Existing progs/test_sk_lookup_kern.c will magically start working with XDP
even when reuseport prog is attached.
Thoughts?