netdev - Re: XDP bpf_tail_call_redirect(): yea or nay?

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87eervidr3.fsf@toke.dk>
Date:   Fri, 08 May 2020 00:25:36 +0200
From:   Toke Høiland-Jørgensen <toke@...hat.com>
To:     John Fastabend <john.fastabend@...il.com>,
        Björn Töpel 
        <bjorn.topel@...il.com>
Cc:     bpf <bpf@...r.kernel.org>, Netdev <netdev@...r.kernel.org>,
        Jesper Dangaard Brouer <brouer@...hat.com>
Subject: Re: XDP bpf_tail_call_redirect(): yea or nay?

John Fastabend <john.fastabend@...il.com> writes:

> Toke Høiland-Jørgensen wrote:
>> Björn Töpel <bjorn.topel@...il.com> writes:
>> 
>> > On Thu, 7 May 2020 at 15:44, Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>> >>
>> >> Björn Töpel <bjorn.topel@...il.com> writes:
>> >>
>> >> > Before I start hacking on this, I might as well check with the XDP
>> >> > folks if this considered a crappy idea or not. :-)
>> >> >
>> >> > The XDP redirect flow for a packet is typical a dance of
>> >> > bpf_redirect_map() that updates the bpf_redirect_info structure with
>> >> > maps type/items, which is then followed by an xdp_do_redirect(). That
>> >> > function takes an action based on the bpf_redirect_info content.
>> >> >
>> >> > I'd like to get rid of the xdp_do_redirect() call, and the
>> >> > bpf_redirect_info (per-cpu) lookup. The idea is to introduce a new
>> >> > (oh-no!) XDP action, say, XDP_CONSUMED and a built-in helper with
>> >> > tail-call semantics.
>> >> >
>> >> > Something across the lines of:
>> >> >
>> >> > --8<--
>> >> >
>> >> > struct {
>> >> >         __uint(type, BPF_MAP_TYPE_XSKMAP);
>> >> >         __uint(max_entries, MAX_SOCKS);
>> >> >         __uint(key_size, sizeof(int));
>> >> >         __uint(value_size, sizeof(int));
>> >> > } xsks_map SEC(".maps");
>> >> >
>> >> > SEC("xdp1")
>> >> > int xdp_prog1(struct xdp_md *ctx)
>> >> > {
>> >> >         bpf_tail_call_redirect(ctx, &xsks_map, 0);
>> >> >         // Redirect the packet to an AF_XDP socket at entry 0 of the
>> >> >         // map.
>> >> >         //
>> >> >         // After a successful call, ctx is said to be
>> >> >         // consumed. XDP_CONSUMED will be returned by the program.
>> >> >         // Note that if the call is not successful, the buffer is
>> >> >         // still valid.
>> >> >         //
>> >> >         // XDP_CONSUMED in the driver means that the driver should not
>> >> >         // issue an xdp_do_direct() call, but only xdp_flush().
>> >> >         //
>> >> >         // The verifier need to be taught that XDP_CONSUMED can only
>> >> >         // be returned "indirectly", meaning a bpf_tail_call_XXX()
>> >> >         // call. An explicit "return XDP_CONSUMED" should be
>> >> >         // rejected. Can that be implemented?
>> >> >         return XDP_PASS; // or any other valid action.
>> >> > }
>
> I'm wondering if we can teach the verifier to recognize tail calls,
>
> int xdp_prog1(struct xdp_md *ctx)
> {
> 	return xdp_do_redirect(ctx, &xsks_map, 0);
> }
>
> This would be useful for normal calls as well. I guess the question here
> is would a tail call be sufficient for above case or do you need the
> 'return XDP_PASS' at the end? If so maybe we could fold it into the
> helper somehow.
>
> I think it would also address Toke's concerns, no new action so
> bpf developers can just develope like normal but "smart" developers
> will try do calls as tail calls.

This is certainly an interesting idea! Functional languages tend to
auto-optimise tail calls, so detecting them is certainly possible, at
least for the compiler. I suppose this could be a follow-on
optimisation, though, couldn't it? From the PoV of the surrounding code
(in the kernel), it doesn't really matter if the behaviour was signalled
by an explicit flag added in the code, or if this flag was automatically
added by the compiler.

> Not sure it can be done without driver changes though.

Well yeah, that's hard to know in the abstract, obviously. My point is
just that we should look very hard indeed before we decide it can't :)

-Toke