netdev - Re: [bpf-next, v2 1/3] flow_dissector: implements flow dissector BPF hook

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAF=yD-L3805_=+fiWcFrm=pi0soC05yzXZPOCk6aeKBBpubg=A@mail.gmail.com>
Date:   Wed, 12 Sep 2018 18:53:26 -0400
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     Petar Penkov <peterpenkov96@...il.com>,
        Network Development <netdev@...r.kernel.org>,
        David Miller <davem@...emloft.net>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        simon.horman@...ronome.com, ecree@...arflare.com,
        songliubraving@...com, Tom Herbert <tom@...bertland.com>,
        Petar Penkov <ppenkov@...gle.com>,
        Willem de Bruijn <willemb@...gle.com>
Subject: Re: [bpf-next, v2 1/3] flow_dissector: implements flow dissector BPF hook

On Wed, Sep 12, 2018 at 6:25 PM Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
>
> On Wed, Sep 12, 2018 at 02:43:37PM -0400, Willem de Bruijn wrote:
> > On Tue, Sep 11, 2018 at 11:47 PM Alexei Starovoitov
> > <alexei.starovoitov@...il.com> wrote:
> > >
> > > On Fri, Sep 07, 2018 at 05:11:08PM -0700, Petar Penkov wrote:
> > > > From: Petar Penkov <ppenkov@...gle.com>
> > > >
> > > > Adds a hook for programs of type BPF_PROG_TYPE_FLOW_DISSECTOR and
> > > > attach type BPF_FLOW_DISSECTOR that is executed in the flow dissector
> > > > path. The BPF program is per-network namespace.
> > > >
> > > > Signed-off-by: Petar Penkov <ppenkov@...gle.com>
> > > > Signed-off-by: Willem de Bruijn <willemb@...gle.com>
> > > > ---
> > > >  include/linux/bpf.h            |   1 +
> > > >  include/linux/bpf_types.h      |   1 +
> > > >  include/linux/skbuff.h         |   7 ++
> > > >  include/net/net_namespace.h    |   3 +
> > > >  include/net/sch_generic.h      |  12 ++-
> > > >  include/uapi/linux/bpf.h       |  25 ++++++
> > > >  kernel/bpf/syscall.c           |   8 ++
> > > >  kernel/bpf/verifier.c          |  32 ++++++++
> > > >  net/core/filter.c              |  67 ++++++++++++++++
> > > >  net/core/flow_dissector.c      | 136 +++++++++++++++++++++++++++++++++
> > > >  tools/bpf/bpftool/prog.c       |   1 +
> > > >  tools/include/uapi/linux/bpf.h |  25 ++++++
> > > >  tools/lib/bpf/libbpf.c         |   2 +
> > >
> > > please split up update to tools/include/uapi/linux/bpf.h as a separate patch 2.
> > > We often have conflicts in there, so best to have a separate.
> > > Also please split tools/lib and tools/bpf chnages into patch 3.
> >
> > Will do in v3.
> >
> > > >  struct qdisc_skb_cb {
> > > > -     unsigned int            pkt_len;
> > > > -     u16                     slave_dev_queue_mapping;
> > > > -     u16                     tc_classid;
> > > > +     union {
> > > > +             struct {
> > > > +                     unsigned int            pkt_len;
> > > > +                     u16                     slave_dev_queue_mapping;
> > > > +                     u16                     tc_classid;
> > > > +             };
> > > > +             struct bpf_flow_keys *flow_keys;
> > > > +     };
> > >
> > > is this magic really necessary? flow_dissector runs very early in recv path.
> > > There is no qdisc or conflicts with tcp/ip use of cb.
> > > I think the whole cb block can be used.
> >
> > The flow dissector also runs in the context of TC, from flower.
> > But that is not a reason to use this struct.
> >
> > We need both (a) data shared with the BPF application and between
> > applications using tail-calls, to pass along the parse offset (nhoff),
> > and (b) data not accessible by the program, to store the flow_keys
> > pointer.
> >
> > qdisc_skb_cb already has this split, exposing only the 20B .data
> > field to the application. Flow dissection currently reuses the existing
> > bpf_convert_ctx_access logic for this field.
> >
> > We could create a separate flowdissect_skb_cb struct with the
> > same split setup, but a second constraint is that relevant internal
> > BPF interfaces already expect qdisc_skb_cb, notably
> > bkf_skb_data_end. So the union was easier.
>
> got it. all makes sense.
>
> >
> > There is another way to pass nhoff besides cb[] (see below). But
> > I don't immediately see another place to store the flow_keys ptr.
> >
> > At least, if we pass skb as context. One larger change would
> > be to introduce another ctx struct, similar to sk_reuseport_(kern|md).
>
> yeah. thought about this too, but your approach looks easier and faster.
> Accesses to skb have one less dereference.
>
> > > > @@ -2333,6 +2335,7 @@ struct __sk_buff {
> > > >       /* ... here. */
> > > >
> > > >       __u32 data_meta;
> > > > +     __u32 flow_keys;
> > >
> > > please use
> > > struct bpf_flow_keys *flow_keys;
> > > instead.
> > >
> > > See what we did in 'struct sk_msg_md' and in 'struct sk_reuseport_md'.
> > > There is no need to hide pointers in u32.
> > >
> >
> > Will do in v3.
> >
> > > > @@ -658,6 +754,46 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> > > >                                             FLOW_DISSECTOR_KEY_BASIC,
> > > >                                             target_container);
> > > >
> > > > +     rcu_read_lock();
> > > > +     attached = skb ? rcu_dereference(dev_net(skb->dev)->flow_dissector_prog)
> > > > +                    : NULL;
> > > > +     if (attached) {
> > > > +             /* Note that even though the const qualifier is discarded
> > > > +              * throughout the execution of the BPF program, all changes(the
> > > > +              * control block) are reverted after the BPF program returns.
> > > > +              * Therefore, __skb_flow_dissect does not alter the skb.
> > > > +              */
> > > > +             struct bpf_flow_keys flow_keys = {};
> > > > +             struct qdisc_skb_cb cb_saved;
> > > > +             struct qdisc_skb_cb *cb;
> > > > +             u16 *pseudo_cb;
> > > > +             u32 result;
> > > > +
> > > > +             cb = qdisc_skb_cb(skb);
> > > > +             pseudo_cb = (u16 *)bpf_skb_cb((struct sk_buff *)skb);
> > > > +
> > > > +             /* Save Control Block */
> > > > +             memcpy(&cb_saved, cb, sizeof(cb_saved));
> > > > +             memset(cb, 0, sizeof(cb_saved));
> > > > +
> > > > +             /* Pass parameters to the BPF program */
> > > > +             cb->flow_keys = &flow_keys;
> > > > +             *pseudo_cb = nhoff;
> > >
> > > I don't understand this bit.
> > > What is this pseudo_cb and why nhoff goes in there?
> > > Some odd way to pass it into the prog?
> >
> > Yes. nhoff passes the offset to the program to start parsing from.
> > Applications also pass this during tail calls.
> >
> > Alternatively we can just add a new field to struct bpf_flow_keys.
>
> I think that certainly will be cleaner and easier to use from
> bpf prog pov. Since flow_keys stay constant any change to nhoff
> between tail_calls will be preserved too. I see no cons to such approach.

Yes, it's definitely simpler. We'll do that.

Thanks!