[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180912222500.57wnc7sjofgatuxu@ast-mbp>
Date: Wed, 12 Sep 2018 15:25:01 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Petar Penkov <peterpenkov96@...il.com>,
Network Development <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
simon.horman@...ronome.com, ecree@...arflare.com,
songliubraving@...com, Tom Herbert <tom@...bertland.com>,
Petar Penkov <ppenkov@...gle.com>,
Willem de Bruijn <willemb@...gle.com>
Subject: Re: [bpf-next, v2 1/3] flow_dissector: implements flow dissector BPF
hook
On Wed, Sep 12, 2018 at 02:43:37PM -0400, Willem de Bruijn wrote:
> On Tue, Sep 11, 2018 at 11:47 PM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Fri, Sep 07, 2018 at 05:11:08PM -0700, Petar Penkov wrote:
> > > From: Petar Penkov <ppenkov@...gle.com>
> > >
> > > Adds a hook for programs of type BPF_PROG_TYPE_FLOW_DISSECTOR and
> > > attach type BPF_FLOW_DISSECTOR that is executed in the flow dissector
> > > path. The BPF program is per-network namespace.
> > >
> > > Signed-off-by: Petar Penkov <ppenkov@...gle.com>
> > > Signed-off-by: Willem de Bruijn <willemb@...gle.com>
> > > ---
> > > include/linux/bpf.h | 1 +
> > > include/linux/bpf_types.h | 1 +
> > > include/linux/skbuff.h | 7 ++
> > > include/net/net_namespace.h | 3 +
> > > include/net/sch_generic.h | 12 ++-
> > > include/uapi/linux/bpf.h | 25 ++++++
> > > kernel/bpf/syscall.c | 8 ++
> > > kernel/bpf/verifier.c | 32 ++++++++
> > > net/core/filter.c | 67 ++++++++++++++++
> > > net/core/flow_dissector.c | 136 +++++++++++++++++++++++++++++++++
> > > tools/bpf/bpftool/prog.c | 1 +
> > > tools/include/uapi/linux/bpf.h | 25 ++++++
> > > tools/lib/bpf/libbpf.c | 2 +
> >
> > please split up update to tools/include/uapi/linux/bpf.h as a separate patch 2.
> > We often have conflicts in there, so best to have a separate.
> > Also please split tools/lib and tools/bpf chnages into patch 3.
>
> Will do in v3.
>
> > > struct qdisc_skb_cb {
> > > - unsigned int pkt_len;
> > > - u16 slave_dev_queue_mapping;
> > > - u16 tc_classid;
> > > + union {
> > > + struct {
> > > + unsigned int pkt_len;
> > > + u16 slave_dev_queue_mapping;
> > > + u16 tc_classid;
> > > + };
> > > + struct bpf_flow_keys *flow_keys;
> > > + };
> >
> > is this magic really necessary? flow_dissector runs very early in recv path.
> > There is no qdisc or conflicts with tcp/ip use of cb.
> > I think the whole cb block can be used.
>
> The flow dissector also runs in the context of TC, from flower.
> But that is not a reason to use this struct.
>
> We need both (a) data shared with the BPF application and between
> applications using tail-calls, to pass along the parse offset (nhoff),
> and (b) data not accessible by the program, to store the flow_keys
> pointer.
>
> qdisc_skb_cb already has this split, exposing only the 20B .data
> field to the application. Flow dissection currently reuses the existing
> bpf_convert_ctx_access logic for this field.
>
> We could create a separate flowdissect_skb_cb struct with the
> same split setup, but a second constraint is that relevant internal
> BPF interfaces already expect qdisc_skb_cb, notably
> bkf_skb_data_end. So the union was easier.
got it. all makes sense.
>
> There is another way to pass nhoff besides cb[] (see below). But
> I don't immediately see another place to store the flow_keys ptr.
>
> At least, if we pass skb as context. One larger change would
> be to introduce another ctx struct, similar to sk_reuseport_(kern|md).
yeah. thought about this too, but your approach looks easier and faster.
Accesses to skb have one less dereference.
> > > @@ -2333,6 +2335,7 @@ struct __sk_buff {
> > > /* ... here. */
> > >
> > > __u32 data_meta;
> > > + __u32 flow_keys;
> >
> > please use
> > struct bpf_flow_keys *flow_keys;
> > instead.
> >
> > See what we did in 'struct sk_msg_md' and in 'struct sk_reuseport_md'.
> > There is no need to hide pointers in u32.
> >
>
> Will do in v3.
>
> > > @@ -658,6 +754,46 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> > > FLOW_DISSECTOR_KEY_BASIC,
> > > target_container);
> > >
> > > + rcu_read_lock();
> > > + attached = skb ? rcu_dereference(dev_net(skb->dev)->flow_dissector_prog)
> > > + : NULL;
> > > + if (attached) {
> > > + /* Note that even though the const qualifier is discarded
> > > + * throughout the execution of the BPF program, all changes(the
> > > + * control block) are reverted after the BPF program returns.
> > > + * Therefore, __skb_flow_dissect does not alter the skb.
> > > + */
> > > + struct bpf_flow_keys flow_keys = {};
> > > + struct qdisc_skb_cb cb_saved;
> > > + struct qdisc_skb_cb *cb;
> > > + u16 *pseudo_cb;
> > > + u32 result;
> > > +
> > > + cb = qdisc_skb_cb(skb);
> > > + pseudo_cb = (u16 *)bpf_skb_cb((struct sk_buff *)skb);
> > > +
> > > + /* Save Control Block */
> > > + memcpy(&cb_saved, cb, sizeof(cb_saved));
> > > + memset(cb, 0, sizeof(cb_saved));
> > > +
> > > + /* Pass parameters to the BPF program */
> > > + cb->flow_keys = &flow_keys;
> > > + *pseudo_cb = nhoff;
> >
> > I don't understand this bit.
> > What is this pseudo_cb and why nhoff goes in there?
> > Some odd way to pass it into the prog?
>
> Yes. nhoff passes the offset to the program to start parsing from.
> Applications also pass this during tail calls.
>
> Alternatively we can just add a new field to struct bpf_flow_keys.
I think that certainly will be cleaner and easier to use from
bpf prog pov. Since flow_keys stay constant any change to nhoff
between tail_calls will be preserved too. I see no cons to such approach.
Powered by blists - more mailing lists