[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20180619212455.i5dmilth76jouvqu@kafai-mbp.dhcp.thefacebook.com>
Date: Tue, 19 Jun 2018 14:24:55 -0700
From: Martin KaFai Lau <kafai@...com>
To: David Ahern <dsahern@...il.com>
CC: <dsahern@...nel.org>, <netdev@...r.kernel.org>,
<borkmann@...earbox.net>, <ast@...nel.org>, <davem@...emloft.net>
Subject: Re: [PATCH bpf-net] bpf: Change bpf_fib_lookup to return lookup
status
On Tue, Jun 19, 2018 at 02:16:53PM -0600, David Ahern wrote:
> On 6/19/18 10:36 AM, Martin KaFai Lau wrote:
> > On Tue, Jun 19, 2018 at 09:34:28AM -0600, David Ahern wrote:
> >> On 6/19/18 9:25 AM, Martin KaFai Lau wrote:
> >>> On Mon, Jun 18, 2018 at 03:35:25PM -0600, David Ahern wrote:
> >>>> On 6/18/18 2:55 PM, Martin KaFai Lau wrote:
> >>>>>> /* rc > 0 case */
> >>>>>> switch(rc) {
> >>>>>> case BPF_FIB_LKUP_RET_BLACKHOLE:
> >>>>>> case BPF_FIB_LKUP_RET_UNREACHABLE:
> >>>>>> case BPF_FIB_LKUP_RET_PROHIBIT:
> >>>>>> return XDP_DROP;
> >>>>>> }
> >>>>>>
> >>>>>> For the others it becomes a question of do we share why the stack needs
> >>>>>> to be involved? Maybe the program wants to collect stats to show traffic
> >>>>>> patterns that can be improved (BPF_FIB_LKUP_RET_FRAG_NEEDED) or support
> >>>>>> in the kernel needs to be improved (BPF_FIB_LKUP_RET_UNSUPP_LWT) or an
> >>>>>> interface is misconfigured (BPF_FIB_LKUP_RET_FWD_DISABLED).
> >>>>> Thanks for the explanation.
> >>>>>
> >>>>> Agree on the bpf able to collect stats will be useful.
> >>>>>
> >>>>> I am wondering, if a new BPF_FIB_LKUP_RET_XYZ is added later,
> >>>>> how may the old xdp_prog work/not-work? As of now, the return value
> >>>>> is straight forward, FWD, PASS (to stack) or DROP (error).
> >>>>> With this change, the xdp_prog needs to match/switch() the
> >>>>> BPF_FIB_LKUP_RET_* to at least PASS and DROP.
> >>>>
> >>>> IMO, programs should only call XDP_DROP for known reasons - like the 3
> >>>> above. Anything else punt to the stack.
> >>>>
> >>>> If a new RET_XYZ comes along:
> >>>> 1. the new XYZ is a new ACL response where the packet is to be dropped.
> >>>> If the program does not understand XYZ and punts to the stack
> >>>> (recommendation), then a second lookup is done during normal packet
> >>>> processing and the stack drops it.
> >>>>
> >>>> 2. the new XYZ is a new path in the kernel that is unsupported with
> >>>> respect to XDP forwarding, nothing new for the program to do.
> >>>>
> >>>> Either way I would expect stats on BPF_FIB_LKUP_RET_* to give a hint to
> >>>> the program writer.
> >>>>
> >>>> Worst case of punting packets to the stack for any rc != 0 means the
> >>>> stack is doing 2 lookups - 1 in XDP based on its lookup parameters and 1
> >>>> in normal stack processing - to handle the packet.
> >>> Instead of having the xdp_prog to follow the meaning of what RET_SYZ is,
> >>> should the bpf_*_fib_lookup() return value be kept as is such that
> >>> the xdp_prog is clear what to do. The reason can be returned in
> >>> the 'struct bpf_fib_lookup'. The number of reasons can be extended.
> >>> If the xdp_prog does not understand a reason, it still will not
> >>> affect its decision because the return value is clear.
> >>> I think the situation here is similar to regular syscall which usually
> >>> uses -1 to clearly states error and errno to spells out the reason.
> >>>
> >>
> >> I did consider returning the status in struct bpf_fib_lookup. However,
> >> it is 64 bytes and can not be extended without a big performance
> >> penalty, so the only option there is to make an existing entry a union
> >> the most logical of which is the ifindex. It seemed odd to me to have
> >> the result by hidden in the struct as a union on ifindex and returning
> >> the egress index from the function:
> >>
> >> @@ -2625,7 +2636,11 @@ struct bpf_fib_lookup {
> >>
> >> /* total length of packet from network header - used for MTU
> >> check */
> >> __u16 tot_len;
> >> - __u32 ifindex; /* L3 device index for lookup */
> >> +
> >> + union {
> >> + __u32 ifindex; /* input: L3 device index for lookup */
> >> + __u32 result; /* output: one of BPF_FIB_LKUP_RET_* */
> >> + };
> >>
> >>
> >> It seemed more natural to have ifindex stay ifindex and only change
> >> value on return:
> >>
> >> @@ -2625,7 +2639,11 @@ struct bpf_fib_lookup {
> >>
> >> /* total length of packet from network header - used for MTU check */
> >> __u16 tot_len;
> >> - __u32 ifindex; /* L3 device index for lookup */
> >> +
> >> + /* input: L3 device index for lookup
> >> + * output: nexthop device index from FIB lookup
> >> + */
> >> + __u32 ifindex;
> >>
> >> union {
> >> /* inputs to lookup */
> >>
> >>
> >> From a program's perspective:
> >>
> >> rc < 0 -- program is passing incorrect data
> >> rc == 0 -- packet can be forwarded
> >> rc > 0 -- packet can not be forwarded.
> >>
> >> BPF programs are not required to track the LKUP_RET values any more than
> >> a function returning multiple negative values - the caller just checks
> >> rc < 0 means failure. If the program cares it can look at specific
> >> values of rc to see the specific value.
> >>
> >> The same applies with the LKUP_RET values - they are there to provide
> >> insight into why the packet is not forwarded directly if the program
> >> cares to know why.
> > hmm...ic. My concern is, the prog can interpret rc > 0 (in this patch) to be
> > drop vs pass (although we can advise them in bpf.h to always pass if it does
> > not understand a rc but it is not a strong contract), it may catch people
> > a surprise if a xdp_prog suddenly drops everything when running in a
> > newer kernel where the upper stack can actually handle it.
> >
> > while the current behavior (i.e. before this patch, rc == 0) is always pass
> > to the stack.
> >
> > I think at least comments should be put in the enum such that
> > the xdp/tc_prog should expect the enum could be extended later, so
> > the suggested behavior should be a pass for unknown LKUP_RET and let
> > the stack to decide.
> >
>
> All APIs with enums have the inherent quality that more can be added.
> Nothing about rc > 0 says it is ok to drop the packet and nothing in the
> documentation says it is ok to drop the packet. The program author needs
> to look at the extra information provided by the rc. Specific values are
> a hint that yes the packet can be dropped; others merely say 'packet
> needs help from the stack' with a reason why it needs help.
Right, I understand there are values that hint drop (as your earlier example)
and there are values that hint pass to stack. and either set of these values
could be extended later.
>
> My intention is to allow the XDP program to understand FIB based ACLs.
> To that end a fib lookup result of blackhole, unreachable and prohibit
> needs to be returned to the xdp program with the effective summary of
> "can drop in the xdp program".
>
> If the other return codes are going to cause confusion then less shorten
> the list:
>
> enum {
> BPF_FIB_LKUP_RET_SUCCESS, /* packet is to be forwareded */
> BPF_FIB_LKUP_RET_CAN_DROP, /* XDP program can drop the packet */
> BPF_FIB_LKUP_RET_NEED_STACK, /* packet needs full stack assist */
> };
>
> But, that still does not solve the problem of rc > 0 means xdp program
> can drop the packet, but then that is not the intention of rc > 0. The
> intention is only "here's more information about why this packet can not
> be forwarded at this layer"
ok. Please spell out this intention in bpf.h so that the writer will not
default to DROP for the reason that it does not understand.
Powered by blists - more mailing lists