[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <e7b992e3-8059-4058-8561-cb017c200c8d@app.fastmail.com>
Date: Thu, 05 Oct 2023 22:16:53 +0200
From: Martynas <m@...bda.lt>
To: "Martin KaFai Lau" <martin.lau@...ux.dev>
Cc: "Daniel Borkmann" <daniel@...earbox.net>, netdev <netdev@...r.kernel.org>,
"Nikolay Aleksandrov" <razor@...ckwall.org>, bpf@...r.kernel.org
Subject: Re: [PATCH bpf v2 1/2] bpf: Derive source IP addr via bpf_*_fib_lookup()
Thanks for the review.
On Wed, Oct 4, 2023, at 10:09 PM, Martin KaFai Lau wrote:
> On 10/3/23 12:10 AM, Martynas Pumputis wrote:
>> Extend the bpf_fib_lookup() helper by making it to return the source
>> IPv4/IPv6 address if the BPF_FIB_LOOKUP_SET_SRC flag is set.
>>
>> For example, the following snippet can be used to derive the desired
>> source IP address:
>>
>> struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr };
>>
>> ret = bpf_skb_fib_lookup(skb, p, sizeof(p),
>> BPF_FIB_LOOKUP_SET_SRC | BPF_FIB_LOOKUP_SKIP_NEIGH);
>> if (ret != BPF_FIB_LKUP_RET_SUCCESS)
>> return TC_ACT_SHOT;
>>
>> /* the p.ipv4_src now contains the source address */
>>
>> The inability to derive the proper source address may cause malfunctions
>> in BPF-based dataplanes for hosts containing netdevs with more than one
>> routable IP address or for multi-homed hosts.
>>
>> For example, Cilium implements packet masquerading in BPF. If an
>> egressing netdev to which the Cilium's BPF prog is attached has
>> multiple IP addresses, then only one [hardcoded] IP address can be used for
>> masquerading. This breaks connectivity if any other IP address should have
>> been selected instead, for example, when a public and private addresses
>> are attached to the same egress interface.
>>
>> The change was tested with Cilium [1].
>>
>> Nikolay Aleksandrov helped to figure out the IPv6 addr selection.
>>
>> [1]: https://github.com/cilium/cilium/pull/28283
>>
>> Signed-off-by: Martynas Pumputis <m@...bda.lt>
>> ---
>> include/net/ipv6_stubs.h | 5 +++++
>> include/uapi/linux/bpf.h | 9 +++++++++
>> net/core/filter.c | 19 ++++++++++++++++++-
>> net/ipv6/af_inet6.c | 1 +
>> tools/include/uapi/linux/bpf.h | 10 ++++++++++
>> 5 files changed, 43 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/ipv6_stubs.h b/include/net/ipv6_stubs.h
>> index c48186bf4737..21da31e1dff5 100644
>> --- a/include/net/ipv6_stubs.h
>> +++ b/include/net/ipv6_stubs.h
>> @@ -85,6 +85,11 @@ struct ipv6_bpf_stub {
>> sockptr_t optval, unsigned int optlen);
>> int (*ipv6_getsockopt)(struct sock *sk, int level, int optname,
>> sockptr_t optval, sockptr_t optlen);
>> + int (*ipv6_dev_get_saddr)(struct net *net,
>> + const struct net_device *dst_dev,
>> + const struct in6_addr *daddr,
>> + unsigned int prefs,
>> + struct in6_addr *saddr);
>> };
>> extern const struct ipv6_bpf_stub *ipv6_bpf_stub __read_mostly;
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 0448700890f7..a6bf686eecbc 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -3257,6 +3257,10 @@ union bpf_attr {
>> * and *params*->smac will not be set as output. A common
>> * use case is to call **bpf_redirect_neigh**\ () after
>> * doing **bpf_fib_lookup**\ ().
>> + * **BPF_FIB_LOOKUP_SET_SRC**
>> + * Derive and set source IP addr in *params*->ipv{4,6}_src
>> + * for the nexthop. If the src addr cannot be derived,
>> + * **BPF_FIB_LKUP_RET_NO_SRC_ADDR** is returned.
>> *
>> * *ctx* is either **struct xdp_md** for XDP programs or
>> * **struct sk_buff** tc cls_act programs.
>> @@ -6953,6 +6957,7 @@ enum {
>> BPF_FIB_LOOKUP_OUTPUT = (1U << 1),
>> BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2),
>> BPF_FIB_LOOKUP_TBID = (1U << 3),
>> + BPF_FIB_LOOKUP_SET_SRC = (1U << 4),
>
> bikeshedding: Shorten it to BPF_FIB_LOOKUP_SRC?
SGTM.
>
>> };
>>
>> enum {
>> @@ -6965,6 +6970,7 @@ enum {
>> BPF_FIB_LKUP_RET_UNSUPP_LWT, /* fwd requires encapsulation */
>> BPF_FIB_LKUP_RET_NO_NEIGH, /* no neighbor entry for nh */
>> BPF_FIB_LKUP_RET_FRAG_NEEDED, /* fragmentation required to fwd */
>> + BPF_FIB_LKUP_RET_NO_SRC_ADDR, /* failed to derive IP src addr */
>> };
>>
>> struct bpf_fib_lookup {
>> @@ -6999,6 +7005,9 @@ struct bpf_fib_lookup {
>> __u32 rt_metric;
>> };
>>
>> + /* input: source address to consider for lookup
>> + * output: source address result from lookup
>> + */
>> union {
>> __be32 ipv4_src;
>> __u32 ipv6_src[4]; /* in6_addr; network order */
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index a094694899c9..f3777ef1840b 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -5850,6 +5850,9 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>> params->rt_metric = res.fi->fib_priority;
>> params->ifindex = dev->ifindex;
>>
>> + if (flags & BPF_FIB_LOOKUP_SET_SRC)
>> + params->ipv4_src = fib_result_prefsrc(net, &res);
>
> Does it need to check 0 and return BPF_FIB_LKUP_RET_NO_SRC_ADDR for v4 also,
> or the fib_result_prefsrc does not fail?
>From inspecting the other usages of the function - it does not fail.
>
>> +
>> /* xdp and cls_bpf programs are run in RCU-bh so
>> * rcu_read_lock_bh is not needed here
>> */
>> @@ -5992,6 +5995,19 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>> params->rt_metric = res.f6i->fib6_metric;
>> params->ifindex = dev->ifindex;
>>
>> + if (flags & BPF_FIB_LOOKUP_SET_SRC) {
>> + if (res.f6i->fib6_prefsrc.plen) {
>> + *(struct in6_addr *)params->ipv6_src = res.f6i->fib6_prefsrc.addr;
>> + } else {
>> + err = ipv6_bpf_stub->ipv6_dev_get_saddr(net, dev,
>> + &fl6.daddr, 0,
>> + (struct in6_addr *)
>> + params->ipv6_src);
>> + if (err)
>> + return BPF_FIB_LKUP_RET_NO_SRC_ADDR;
>
> This error also implies BPF_FIB_LKUP_RET_NO_NEIGH. I don't have a clean way of
> improving the API. May be others have some ideas.
>
> Considering dev has no saddr is probably (?) an unlikely case, it should be ok
> to leave it as is but at least a comment in the uapi will be needed. Otherwise,
> the bpf prog may use the 0 dmac as-is.
I expect that a user of the helper checks that err == 0 before using any of the output params.
>
> I feel the current bpf_ipv[46]_fib_lookup helper is doing many things
> in one
> function and then requires different BPF_FIB_LOOKUP_* bits to select
> what/how to
> do. In the future, it may be worth to consider breaking it into smaller
> kfunc(s). e.g. the __ipv[46]_neigh_lookup could be in its own kfunc.
>
Yep, good idea. At least it seems that the neigh lookup could live in its own function.
>> + }
>> + }
>> +
>> if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH)
>> goto set_fwd_params;
>>
>> @@ -6010,7 +6026,8 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>> #endif
>>
>> #define BPF_FIB_LOOKUP_MASK (BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | \
>> - BPF_FIB_LOOKUP_SKIP_NEIGH | BPF_FIB_LOOKUP_TBID)
>> + BPF_FIB_LOOKUP_SKIP_NEIGH | BPF_FIB_LOOKUP_TBID | \
>> + BPF_FIB_LOOKUP_SET_SRC)
>>
>> BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
>> struct bpf_fib_lookup *, params, int, plen, u32, flags)
Powered by blists - more mailing lists