linux-kernel - Re: [PATCH v4] netdev attribute to control xdpgeneric skb linearization

Message-ID: <CAMOZA0+T3k25ndRKpSwDZ9vHkMaJUz4XhtfGFGNn=sPrGoSQ4Q@mail.gmail.com>
Date:   Wed, 4 Mar 2020 02:06:15 -0800
From:   Luigi Rizzo <lrizzo@...gle.com>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Network Development <netdev@...r.kernel.org>,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        David Miller <davem@...emloft.net>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        "Jubran, Samih" <sameehj@...zon.com>,
        linux-kernel <linux-kernel@...r.kernel.org>, ast@...nel.org,
        bpf@...r.kernel.org
Subject: Re: [PATCH v4] netdev attribute to control xdpgeneric skb linearization

[taking one message in the thread to answer multiple issues]

On Tue, Mar 3, 2020 at 11:47 AM Daniel Borkmann <daniel@...earbox.net> wrote:
>
> On 2/29/20 12:53 AM, Willem de Bruijn wrote:
> > On Fri, Feb 28, 2020 at 2:01 PM Jakub Kicinski <kuba@...nel.org> wrote:
> >> On Fri, 28 Feb 2020 02:54:35 -0800 Luigi Rizzo wrote:
> >>> Add a netdevice flag to control skb linearization in generic xdp mode.
> >>>
> >>> The attribute can be modified through
> >>>        /sys/class/net/<DEVICE>/xdpgeneric_linearize
> >>> The default is 1 (on)
...
> >>> ns/pkt                     RECEIVER                  SENDER
> >>>
> >>>                        p50     p90     p99       p50    p90    p99
> >>>
> >>> LINEARIZATION:       600ns  1090ns  4900ns     149ns  249ns  460ns
> >>> NO LINEARIZATION:     40ns    59ns    90ns      40ns   50ns  100ns
...
> >> Just load your program in cls_bpf. No extensions or knobs needed.

Yes, this is indeed an option; perhaps the only downside is that
it runs after packet taps, so if, say, the program is there to filter unwanted
traffic, we would miss that protection.

...
> >> Making xdpgeneric-only extensions without touching native XDP makes
> >> no sense to me. Is this part of some greater vision?
> >
> > Yes, native xdp has the same issue when handling packets that exceed a
> > page (4K+ MTU) or otherwise consist of multiple segments. The issue is
> > just more acute in generic xdp. But agreed that both need to be solved
> > together.
> >
> > Many programs need only access to the header. There currently is not a
> > way to express this, or for xdp to convey that the buffer covers only
> > part of the packet.
>
> Right, my only question I had earlier was that when users ship their
> application with /sys/class/net/<DEVICE>/xdpgeneric_linearize turned off,
> how would they know how much of the data is actually pulled in? Afaik,

The short answer is that, before turning linearization off, the sysadmin should
make sure that the linear section contains enough data for the program
to operate.
When in doubt, leave linearization on and live with the cost.
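For illustration only, a minimal userspace sketch of flipping the attribute proposed by this patch. It assumes the patch is applied (the file does not exist in mainline kernels) and that the caller may write to sysfs; the helper names xdpgeneric_linearize_path() and set_xdpgeneric_linearize() are hypothetical.

```c
#include <stdio.h>

/* Build the sysfs path of the attribute proposed by this patch.
 * Hypothetical helper: the file only exists with the patch applied. */
static int xdpgeneric_linearize_path(char *buf, size_t len, const char *ifname)
{
	int n = snprintf(buf, len, "/sys/class/net/%s/xdpgeneric_linearize",
			 ifname);

	/* Report encoding errors and truncation as failure. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write 1 (linearize, the default) or 0 (skip linearization). */
static int set_xdpgeneric_linearize(const char *path, int on)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%d\n", on ? 1 : 0) < 0) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes, so a failed sysfs store shows up here. */
	return fclose(f) == 0 ? 0 : -1;
}
```

On a real system one would pass the path built for the target device with on = 0, and only after checking that the driver's linear section covers every header the program reads.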

The long answer (which probably repeats things I already discussed
with some of you):
clearly this patch is not perfect, as it lacks ways for the kernel and the
bpf program to communicate
a) whether there is a non-linear section, and
b) whether the bpf program understands non-linear/partial packets, and how
much linear data (and headroom) it expects.
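To make a) and b) concrete, here is a hypothetical sketch of the decision generic XDP could take if both pieces of information were available. struct xdp_prog_hints, struct pkt_layout, prep_for() and the PREP_* values are invented for illustration and do not correspond to any existing kernel interface; treating a headroom shortfall as full linearization is a simplification of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* b) Hypothetical hints a program would declare at load time. */
struct xdp_prog_hints {
	bool     handles_partial; /* program copes with non-linear/partial packets */
	uint32_t min_linear;      /* bytes it needs in the linear section */
	uint32_t min_headroom;    /* headroom it expects in front of data */
};

/* a) Hypothetical per-packet facts the kernel would expose. */
struct pkt_layout {
	uint32_t linear_len;      /* bytes in the linear section */
	uint32_t nonlinear_len;   /* bytes in fragments, 0 if fully linear */
	uint32_t headroom;        /* bytes available in front of data */
};

enum prep { PREP_NONE, PREP_PULL, PREP_LINEARIZE };

/* What generic XDP would have to do before running the program. */
static enum prep prep_for(const struct xdp_prog_hints *h,
			  const struct pkt_layout *p)
{
	/* Too little headroom: the head must be reallocated anyway;
	 * this sketch simply treats that as full linearization. */
	if (p->headroom < h->min_headroom)
		return PREP_LINEARIZE;
	/* Fully linear packet: nothing to do. */
	if (p->nonlinear_len == 0)
		return PREP_NONE;
	/* Program cannot handle missing data: keep today's behaviour. */
	if (!h->handles_partial)
		return PREP_LINEARIZE;
	/* Program handles partial packets but needs its headers linear. */
	if (p->linear_len < h->min_linear)
		return PREP_PULL;
	return PREP_NONE;
}
```

The PREP_PULL case is the middle ground the thread is after: copying only the headers the program reads instead of the whole packet.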

Adding these two features needs some agreement on the details.
We had a thread a few weeks ago about multi-segment xdp support. I am not sure
we reached a conclusion, and I am concerned that we may end up reimplementing
sg lists or simplified skbs for use in bpf programs, where perhaps we could
just live with a pull_up/accessor helper for occasional access to the
non-linear part, plus some hints (for #b) that the program can pass to the
driver/xdpgeneric to specify its requirements.

Specifically:
#a is trivial: add a field to the xdp_buff, and a helper to read it from the
bpf program;
#b is a bit less clear: it involves a helper to either pull_up or access the
non-linear data (which one is preferable probably depends on the use case, and
we may want both), and some attribute that the program passes to the kernel at
load time to control when linearization should be applied. I have hacked the
'license' section to pass this information on a per-program basis, but we need
a cleaner way.
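A toy userspace model of #a and of the accessor flavour of #b, assuming a single fragment for brevity. struct toy_xdp_buff, its nonlinear_len field and both toy_xdp_* helpers are hypothetical; the accessor is only similar in spirit to bpf_skb_load_bytes(), which already exists for skb-based programs.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of an xdp_buff; nonlinear_len is the hypothetical #a field. */
struct toy_xdp_buff {
	uint8_t       *data;          /* start of the linear section */
	uint8_t       *data_end;      /* end of the linear section */
	const uint8_t *frag;          /* non-linear bytes (one fragment) */
	uint32_t       nonlinear_len; /* bytes in frag, 0 if linear */
};

/* #a: hypothetical helper reporting data outside [data, data_end). */
static uint32_t toy_xdp_nonlinear_len(const struct toy_xdp_buff *xdp)
{
	return xdp->nonlinear_len;
}

/* #b, accessor flavour: copy len bytes at offset off into dst, wherever
 * they live. Returns 0 on success, -1 if the range is out of bounds. */
static int toy_xdp_load_bytes(const struct toy_xdp_buff *xdp, uint32_t off,
			      void *dst, uint32_t len)
{
	uint32_t lin = (uint32_t)(xdp->data_end - xdp->data);
	uint32_t total = lin + xdp->nonlinear_len;
	uint8_t *out = dst;

	if (off > total || len > total - off)
		return -1;
	if (off < lin) {
		/* Part (or all) of the range is in the linear section. */
		uint32_t n = lin - off < len ? lin - off : len;

		memcpy(out, xdp->data + off, n);
		out += n;
		off += n;
		len -= n;
	}
	/* Whatever is left comes from the fragment. */
	if (len)
		memcpy(out, xdp->frag + (off - lin), len);
	return 0;
}
```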

My reasoning for suggesting this patch as an interim solution is that, since
it is completely opt-in, one can carefully evaluate when it is safe to use
even without having #b implemented.
As for #a, the program might infer (though not reliably) that some data are
missing by looking at the payload length, which may be present in some of the
headers. We could mitigate abuse by, e.g., forcing XDP_REDIRECT and XDP_TX in
xdpgeneric to accept only linear packets.
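A sketch of the payload-length heuristic and of the suggested restriction, under stated assumptions: the packet is IPv4, l3 points at the IP header, and a refused XDP_TX/XDP_REDIRECT is turned into a drop (what should happen instead is not specified here). ipv4_looks_truncated(), xdpgeneric_filter_verdict() and the ACT_* constants are hypothetical; the constant values mirror enum xdp_action in <linux/bpf.h>.

```c
#include <stdint.h>

/* Local copies of the XDP action codes (same values as enum xdp_action). */
enum { ACT_ABORTED = 0, ACT_DROP = 1, ACT_PASS = 2, ACT_TX = 3, ACT_REDIRECT = 4 };

/* Heuristic: compare the IPv4 total length field with the bytes actually
 * present from the IP header onwards. Not reliable, only a hint. */
static int ipv4_looks_truncated(const uint8_t *l3, uint32_t l3_avail)
{
	uint32_t tot_len;

	if (l3_avail < 20)
		return 1; /* not even a minimal IPv4 header is present */
	/* Total length is bytes 2..3 of the header, network byte order. */
	tot_len = ((uint32_t)l3[2] << 8) | l3[3];
	return tot_len > l3_avail;
}

/* Possible mitigation: in generic XDP, refuse TX/REDIRECT for non-linear
 * packets. Mapping them to a drop is an assumption of this sketch. */
static int xdpgeneric_filter_verdict(int act, uint32_t nonlinear_len)
{
	if (nonlinear_len && (act == ACT_TX || act == ACT_REDIRECT))
		return ACT_DROP;
	return act;
}
```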

cheers
luigi

> some drivers might only have a linear section that covers the eth header
> and that is it. What should the BPF prog do in such case? Drop the skb
> since it does not have the rest of the data to e.g. make a XDP_PASS
> decision or fallback to tc/BPF altogether? I hinted earlier, one way to
> make this more graceful is to add a skb pointer inside e.g. struct
> xdp_rxq_info and then enable a bpf_skb_pull_data()-like helper e.g. as:
>
> BPF_CALL_2(bpf_xdp_pull_data, struct xdp_buff *, xdp, u32, len)
> {
>          struct sk_buff *skb = xdp->rxq->skb;
>
>          return skb ? bpf_try_make_writable(skb, len ? :
>                                             skb_headlen(skb)) : -ENOTSUPP;
> }
>
> Thus, when the data/data_end test fails in generic XDP, the user can
> call e.g. bpf_xdp_pull_data(xdp, 64) to make sure we pull in as much as
> is needed w/o full linearization and once done the data/data_end can be
> repeated to proceed. Native XDP will leave xdp->rxq->skb as NULL, but
> later we could perhaps reuse the same bpf_xdp_pull_data() helper for
> native with skb-less backing. Thoughts?
>
> Thanks,
> Daniel
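A toy userspace model of the pattern described above: check bounds, pull on failure, then repeat the check. toy_pull_data() only stands in for the proposed bpf_xdp_pull_data(), which is not an existing helper; struct toy_pkt, the single fragment and the 256-byte head capacity are assumptions of this sketch.

```c
#include <stdint.h>
#include <string.h>

#define TOY_HEAD_CAP 256

/* Toy packet: a linear head with spare capacity plus one fragment. */
struct toy_pkt {
	uint8_t        head[TOY_HEAD_CAP];
	uint32_t       head_len; /* bytes currently linear */
	const uint8_t *frag;     /* remaining non-linear bytes */
	uint32_t       frag_len;
};

/* Stand-in for the proposed pull helper: make at least len bytes linear
 * by moving them out of the fragment. Returns 0 or -1. */
static int toy_pull_data(struct toy_pkt *p, uint32_t len)
{
	uint32_t need;

	if (len <= p->head_len)
		return 0; /* already linear (also covers len == 0) */
	if (len > TOY_HEAD_CAP || len > p->head_len + p->frag_len)
		return -1; /* request cannot be satisfied */
	need = len - p->head_len;
	memcpy(p->head + p->head_len, p->frag, need);
	p->head_len += need;
	p->frag += need;
	p->frag_len -= need;
	return 0;
}

/* Program-side pattern: bounds check, pull on failure, re-check.
 * Returns 1 if the first want bytes are linear, 0 otherwise. */
static int toy_ensure_linear(struct toy_pkt *p, uint32_t want)
{
	const uint8_t *data = p->head;
	const uint8_t *data_end = p->head + p->head_len;

	if ((uint32_t)(data_end - data) >= want)
		return 1;
	if (toy_pull_data(p, want) < 0)
		return 0;
	/* Pointers must be reloaded after a pull. */
	data = p->head;
	data_end = p->head + p->head_len;
	return (uint32_t)(data_end - data) >= want;
}
```

Reloading data/data_end after the pull mirrors the rule for bpf_skb_pull_data() in tc programs, where the verifier invalidates packet pointers after any helper that can change packet data.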