[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4729b693-20d7-dd9e-c48b-be8386ce9bed@gmail.com>
Date: Sun, 29 Apr 2018 19:13:35 -0600
From: David Ahern <dsahern@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: netdev@...r.kernel.org, borkmann@...earbox.net, ast@...nel.org,
davem@...emloft.net, shm@...ulusnetworks.com,
roopa@...ulusnetworks.com, brouer@...hat.com, toke@...e.dk,
john.fastabend@...il.com
Subject: Re: [RFC v2 bpf-next 8/9] bpf: Provide helper to do lookups in kernel
FIB table
On 4/29/18 5:36 PM, Alexei Starovoitov wrote:
>> + if (flags & BPF_FIB_LOOKUP_DIRECT) {
>> + u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN;
>> + struct fib_table *tb;
>> +
>> + tb = fib_get_table(net, tbid);
>> + if (unlikely(!tb))
>> + return 0;
>> +
>> + err = fib_table_lookup(tb, &fl4, &res, FIB_LOOKUP_NOREF);
>> + } else {
>> + fl4.flowi4_mark = 0;
>> + fl4.flowi4_secid = 0;
>> + fl4.flowi4_tun_key.tun_id = 0;
>> + fl4.flowi4_uid = sock_net_uid(net, NULL);
>> +
>> + err = fib_lookup(net, &fl4, &res, FIB_LOOKUP_NOREF);
>> + }
>> +
>> + if (err || res.type != RTN_UNICAST)
>> + return 0;
>
> should this be an error returned to the user instead of zero?
> Seems useful to indicate.
res.type != UNICAST is not an error; it means other delivery type (e.g.,
local).
err < 0 means unreachable, prohibit, blackhole, etc. Arguably the error
could be returned to the xdp program, but it is more complicated than
that. Blackhole is a common default route or policy, but RTN_BLACKHOLE
== -EINVAL which is also the error code if the user passes invalid
arguments to the program.
>
>> +
>> + if (res.fi->fib_nhs > 1)
>> + fib_select_path(net, &res, &fl4, NULL);
>> +
>> + nh = &res.fi->fib_nh[res.nh_sel];
>> +
>> + /* do not handle lwt encaps right now */
>> + if (nh->nh_lwtstate)
>> + return 0;
>
> adn return enotsupp here?
see below
>
>> +
>> + dev = nh->nh_dev;
>> + if (unlikely(!dev))
>> + return 0;
>
> enodev ?
see below
>
>> +
>> + if (nh->nh_gw)
>> + params->ipv4_dst = nh->nh_gw;
>> +
>> + params->rt_metric = res.fi->fib_priority;
>> +
>> + /* xdp and cls_bpf programs are run in RCU-bh so
>> + * rcu_read_lock_bh is not needed here
>> + */
>> + neigh = __ipv4_neigh_lookup_noref(dev, (__force u32)params->ipv4_dst);
>> + if (neigh)
>> + return bpf_fib_set_fwd_params(params, neigh, dev);
>> +
>> + return 0;
>
> Even this return 0 doesn't quite fit to what doc says:
> "0 if packet needs to continue up the stack for further processing"
> What stack suppose to do ?
First packet on a route the nexthop may not be resolved. Without punting
to the stack it never has an impetus to resolve that neighbor.
> It will hit the same condition and packet will be dropped, right?
no. It can resolve the neighbor so follow up packets can be forwarded in
the fast path.
> Isn't it better to report all errors back to bpf prog and let
> the program make decision instead of 'return 0' almost everywhere?
The idea here is to fast pass packets that fit a supported profile and
are to be forwarded. Everything else should continue up the stack as it
has wider capabilities. The helper and XDP programs should make no
assumptions on what the broader kernel and userspace might be monitoring
or want to do with packets that can not be forwarded in the fast path.
This is very similar to hardware forwarding when it punts packets to the
CPU for control plane assistance.
Powered by blists - more mailing lists