Message-ID: <20201119010045.a6mqkzuv4tjruny6@kafai-mbp.dhcp.thefacebook.com>
Date: Wed, 18 Nov 2020 17:00:45 -0800
From: Martin KaFai Lau <kafai@...com>
To: Kuniyuki Iwashima <kuniyu@...zon.co.jp>
CC: "David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Eric Dumazet <edumazet@...gle.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Benjamin Herrenschmidt <benh@...zon.com>,
Kuniyuki Iwashima <kuni1840@...il.com>, <bpf@...r.kernel.org>,
<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH bpf-next 7/8] bpf: Call bpf_run_sk_reuseport() for
socket migration.
On Tue, Nov 17, 2020 at 06:40:22PM +0900, Kuniyuki Iwashima wrote:
> This patch makes it possible to select a new listener for socket migration
> via eBPF.
>
> The noteworthy point is that we select a listening socket in
> reuseport_detach_sock() and reuseport_select_sock(), but we do not have
> a struct sk_buff in the unhash path.
>
> Since we cannot pass the skb to the eBPF program, we run only the
> BPF_PROG_TYPE_SK_REUSEPORT program by calling bpf_run_sk_reuseport() with
> a NULL skb. Consequently, the fields derived from the skb are also NULL in
> the eBPF program.
More things need to be considered here when skb is NULL.
Some helpers probably assume skb is not NULL.
Also, the sk_lookup in filter.c actually passes a NULL skb to avoid
doing the reuseport select.
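To make this concrete, here is a minimal sketch of what a migration-aware
SK_REUSEPORT prog could look like under this patch's semantics.  The map
name, prog name, and the "len == 0 means no skb" convention are my
illustrative assumptions, not something the patch defines:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} migrate_map SEC(".maps");

SEC("sk_reuseport")
int select_or_migrate(struct sk_reuseport_md *reuse_md)
{
	__u32 key = 0;

	/* Under this patch, skb-derived ctx fields read as 0 on the
	 * migration (unhash) path, so len == 0 hints that there is
	 * no packet to parse.
	 */
	if (!reuse_md->len) {
		/* Migration: pick a target without touching packet
		 * data, or return SK_DROP to cancel the migration.
		 */
		if (bpf_sk_select_reuseport(reuse_md, &migrate_map,
					    &key, 0))
			return SK_DROP;
		return SK_PASS;
	}

	/* Fast path: skb is present and packet data can be parsed. */
	if (bpf_sk_select_reuseport(reuse_md, &migrate_map, &key, 0))
		return SK_DROP;
	return SK_PASS;
}

char _license[] SEC("license") = "GPL";

Returning SK_DROP from the migration branch would cancel the migration,
which matches the commit message below.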
>
> Moreover, we can cancel migration by returning SK_DROP. This feature is
> useful when listeners have different settings at the socket API level or
> when we want to free resources as soon as possible.
>
> Reviewed-by: Benjamin Herrenschmidt <benh@...zon.com>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.co.jp>
> ---
> net/core/filter.c | 26 +++++++++++++++++++++-----
> net/core/sock_reuseport.c | 23 ++++++++++++++++++++---
> net/ipv4/inet_hashtables.c | 2 +-
> 3 files changed, 42 insertions(+), 9 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 01e28f283962..ffc4591878b8 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -8914,6 +8914,22 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
> SOCK_ADDR_LOAD_NESTED_FIELD_SIZE_OFF(S, NS, F, NF, \
> BPF_FIELD_SIZEOF(NS, NF), 0)
>
> +#define SOCK_ADDR_LOAD_NESTED_FIELD_SIZE_OFF_OR_NULL(S, NS, F, NF, SIZE, OFF) \
> + do { \
> + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(S, F), si->dst_reg, \
> + si->src_reg, offsetof(S, F)); \
> + *insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 1); \
Although it may not matter much, always doing this check seems less than
ideal considering the fast path will always have an skb and only the slow
path (accept-queue migration) has a NULL skb. I think the req_sk usually
has the skb too, except in the timer case.
A first thought is to create a temp skb, but that has its own issues.
Or it may actually belong to a new prog type. However, let's keep
exploring possible options (including a NULL skb).
> + *insn++ = BPF_LDX_MEM( \
> + SIZE, si->dst_reg, si->dst_reg, \
> + bpf_target_off(NS, NF, sizeof_field(NS, NF), \
> + target_size) \
> + + OFF); \
> + } while (0)
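For reference, my reading of what this macro generates for a field like
reuse_md->len (where S = sk_reuseport_kern, F = skb, NS = sk_buff,
NF = len), written out as a sketch in pseudo-C:

	dst_reg = kern_ctx->skb;        /* first BPF_LDX_MEM           */
	if (dst_reg == NULL)            /* BPF_JMP_IMM, BPF_JEQ, +1    */
		goto done;              /* dst_reg stays 0, so the     */
					/* field reads as 0/NULL       */
	dst_reg = ((struct sk_buff *)dst_reg)->len; /* second LDX      */
done:

So every skb-derived ctx access pays the extra compare-and-branch even
when skb is never NULL, which is the fast-path cost mentioned above.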