netdev - Re: [bpf-next, v2 1/3] flow_dissector: implements flow dissector BPF hook

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180912222500.57wnc7sjofgatuxu@ast-mbp>
Date:   Wed, 12 Sep 2018 15:25:01 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc:     Petar Penkov <peterpenkov96@...il.com>,
        Network Development <netdev@...r.kernel.org>,
        David Miller <davem@...emloft.net>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        simon.horman@...ronome.com, ecree@...arflare.com,
        songliubraving@...com, Tom Herbert <tom@...bertland.com>,
        Petar Penkov <ppenkov@...gle.com>,
        Willem de Bruijn <willemb@...gle.com>
Subject: Re: [bpf-next, v2 1/3] flow_dissector: implements flow dissector BPF
 hook

On Wed, Sep 12, 2018 at 02:43:37PM -0400, Willem de Bruijn wrote:
> On Tue, Sep 11, 2018 at 11:47 PM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Fri, Sep 07, 2018 at 05:11:08PM -0700, Petar Penkov wrote:
> > > From: Petar Penkov <ppenkov@...gle.com>
> > >
> > > Adds a hook for programs of type BPF_PROG_TYPE_FLOW_DISSECTOR and
> > > attach type BPF_FLOW_DISSECTOR that is executed in the flow dissector
> > > path. The BPF program is per-network namespace.
> > >
> > > Signed-off-by: Petar Penkov <ppenkov@...gle.com>
> > > Signed-off-by: Willem de Bruijn <willemb@...gle.com>
> > > ---
> > >  include/linux/bpf.h            |   1 +
> > >  include/linux/bpf_types.h      |   1 +
> > >  include/linux/skbuff.h         |   7 ++
> > >  include/net/net_namespace.h    |   3 +
> > >  include/net/sch_generic.h      |  12 ++-
> > >  include/uapi/linux/bpf.h       |  25 ++++++
> > >  kernel/bpf/syscall.c           |   8 ++
> > >  kernel/bpf/verifier.c          |  32 ++++++++
> > >  net/core/filter.c              |  67 ++++++++++++++++
> > >  net/core/flow_dissector.c      | 136 +++++++++++++++++++++++++++++++++
> > >  tools/bpf/bpftool/prog.c       |   1 +
> > >  tools/include/uapi/linux/bpf.h |  25 ++++++
> > >  tools/lib/bpf/libbpf.c         |   2 +
> >
> > please split up update to tools/include/uapi/linux/bpf.h as a separate patch 2.
> > We often have conflicts in there, so best to have a separate.
> > Also please split tools/lib and tools/bpf chnages into patch 3.
> 
> Will do in v3.
> 
> > >  struct qdisc_skb_cb {
> > > -     unsigned int            pkt_len;
> > > -     u16                     slave_dev_queue_mapping;
> > > -     u16                     tc_classid;
> > > +     union {
> > > +             struct {
> > > +                     unsigned int            pkt_len;
> > > +                     u16                     slave_dev_queue_mapping;
> > > +                     u16                     tc_classid;
> > > +             };
> > > +             struct bpf_flow_keys *flow_keys;
> > > +     };
> >
> > is this magic really necessary? flow_dissector runs very early in recv path.
> > There is no qdisc or conflicts with tcp/ip use of cb.
> > I think the whole cb block can be used.
> 
> The flow dissector also runs in the context of TC, from flower.
> But that is not a reason to use this struct.
> 
> We need both (a) data shared with the BPF application and between
> applications using tail-calls, to pass along the parse offset (nhoff),
> and (b) data not accessible by the program, to store the flow_keys
> pointer.
> 
> qdisc_skb_cb already has this split, exposing only the 20B .data
> field to the application. Flow dissection currently reuses the existing
> bpf_convert_ctx_access logic for this field.
> 
> We could create a separate flowdissect_skb_cb struct with the
> same split setup, but a second constraint is that relevant internal
> BPF interfaces already expect qdisc_skb_cb, notably
> bkf_skb_data_end. So the union was easier.

got it. all makes sense.

> 
> There is another way to pass nhoff besides cb[] (see below). But
> I don't immediately see another place to store the flow_keys ptr.
> 
> At least, if we pass skb as context. One larger change would
> be to introduce another ctx struct, similar to sk_reuseport_(kern|md).

yeah. thought about this too, but your approach looks easier and faster.
Accesses to skb have one less dereference.

> > > @@ -2333,6 +2335,7 @@ struct __sk_buff {
> > >       /* ... here. */
> > >
> > >       __u32 data_meta;
> > > +     __u32 flow_keys;
> >
> > please use
> > struct bpf_flow_keys *flow_keys;
> > instead.
> >
> > See what we did in 'struct sk_msg_md' and in 'struct sk_reuseport_md'.
> > There is no need to hide pointers in u32.
> >
> 
> Will do in v3.
> 
> > > @@ -658,6 +754,46 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> > >                                             FLOW_DISSECTOR_KEY_BASIC,
> > >                                             target_container);
> > >
> > > +     rcu_read_lock();
> > > +     attached = skb ? rcu_dereference(dev_net(skb->dev)->flow_dissector_prog)
> > > +                    : NULL;
> > > +     if (attached) {
> > > +             /* Note that even though the const qualifier is discarded
> > > +              * throughout the execution of the BPF program, all changes(the
> > > +              * control block) are reverted after the BPF program returns.
> > > +              * Therefore, __skb_flow_dissect does not alter the skb.
> > > +              */
> > > +             struct bpf_flow_keys flow_keys = {};
> > > +             struct qdisc_skb_cb cb_saved;
> > > +             struct qdisc_skb_cb *cb;
> > > +             u16 *pseudo_cb;
> > > +             u32 result;
> > > +
> > > +             cb = qdisc_skb_cb(skb);
> > > +             pseudo_cb = (u16 *)bpf_skb_cb((struct sk_buff *)skb);
> > > +
> > > +             /* Save Control Block */
> > > +             memcpy(&cb_saved, cb, sizeof(cb_saved));
> > > +             memset(cb, 0, sizeof(cb_saved));
> > > +
> > > +             /* Pass parameters to the BPF program */
> > > +             cb->flow_keys = &flow_keys;
> > > +             *pseudo_cb = nhoff;
> >
> > I don't understand this bit.
> > What is this pseudo_cb and why nhoff goes in there?
> > Some odd way to pass it into the prog?
> 
> Yes. nhoff passes the offset to the program to start parsing from.
> Applications also pass this during tail calls.
> 
> Alternatively we can just add a new field to struct bpf_flow_keys.

I think that certainly will be cleaner and easier to use from
bpf prog pov. Since flow_keys stay constant any change to nhoff
between tail_calls will be preserved too. I see no cons to such approach.