[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210730080505.48959-1-kuniyu@amazon.co.jp>
Date: Fri, 30 Jul 2021 17:05:05 +0900
From: Kuniyuki Iwashima <kuniyu@...zon.co.jp>
To: <yhs@...com>
CC: <andrii@...nel.org>, <ast@...nel.org>, <benh@...zon.com>,
<bpf@...r.kernel.org>, <daniel@...earbox.net>,
<davem@...emloft.net>, <john.fastabend@...il.com>, <kafai@...com>,
<kpsingh@...nel.org>, <kuba@...nel.org>, <kuni1840@...il.com>,
<kuniyu@...zon.co.jp>, <netdev@...r.kernel.org>,
<songliubraving@...com>
Subject: Re: [PATCH bpf-next 1/2] bpf: af_unix: Implement BPF iterator for UNIX domain socket.
From: Yonghong Song <yhs@...com>
Date: Fri, 30 Jul 2021 00:09:08 -0700
> On 7/29/21 11:53 PM, Kuniyuki Iwashima wrote:
> > From: Yonghong Song <yhs@...com>
> > Date: Thu, 29 Jul 2021 23:24:41 -0700
> >> On 7/29/21 4:36 PM, Kuniyuki Iwashima wrote:
> >>> This patch implements the BPF iterator for the UNIX domain socket.
> >>>
> >>> Currently, the batch optimization introduced for the TCP iterator in the
> >>> commit 04c7820b776f ("bpf: tcp: Bpf iter batching and lock_sock") is not
> >>> applied. It will require replacing the big lock for the hash table with
> >>> small locks for each hash list not to block other processes.
> >>
> >> Thanks for the contribution. The patch looks okay except
> >> missing seq_ops->stop implementation, see below for more explanation.
> >>
> >>>
> >>> Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.co.jp>
> >>> ---
> >>> include/linux/btf_ids.h | 3 +-
> >>> net/unix/af_unix.c | 78 +++++++++++++++++++++++++++++++++++++++++
> >>> 2 files changed, 80 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h
> >>> index 57890b357f85..bed4b9964581 100644
> >>> --- a/include/linux/btf_ids.h
> >>> +++ b/include/linux/btf_ids.h
> >>> @@ -172,7 +172,8 @@ extern struct btf_id_set name;
> >>> BTF_SOCK_TYPE(BTF_SOCK_TYPE_TCP_TW, tcp_timewait_sock) \
> >>> BTF_SOCK_TYPE(BTF_SOCK_TYPE_TCP6, tcp6_sock) \
> >>> BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP, udp_sock) \
> >>> - BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP6, udp6_sock)
> >>> + BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP6, udp6_sock) \
> >>> + BTF_SOCK_TYPE(BTF_SOCK_TYPE_UNIX, unix_sock)
> >>>
> >>> enum {
> >>> #define BTF_SOCK_TYPE(name, str) name,
> >>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> >>> index 89927678c0dc..d45ad87e3a49 100644
> >>> --- a/net/unix/af_unix.c
> >>> +++ b/net/unix/af_unix.c
> >>> @@ -113,6 +113,7 @@
> >>> #include <linux/security.h>
> >>> #include <linux/freezer.h>
> >>> #include <linux/file.h>
> >>> +#include <linux/btf_ids.h>
> >>>
> >>> #include "scm.h"
> >>>
> >>> @@ -2935,6 +2936,49 @@ static const struct seq_operations unix_seq_ops = {
> >>> .stop = unix_seq_stop,
> >>> .show = unix_seq_show,
> >>> };
> >>> +
> >>> +#ifdef CONFIG_BPF_SYSCALL
> >>> +struct bpf_iter__unix {
> >>> + __bpf_md_ptr(struct bpf_iter_meta *, meta);
> >>> + __bpf_md_ptr(struct unix_sock *, unix_sk);
> >>> + uid_t uid __aligned(8);
> >>> +};
> >>> +
> >>> +static int unix_prog_seq_show(struct bpf_prog *prog, struct bpf_iter_meta *meta,
> >>> + struct unix_sock *unix_sk, uid_t uid)
> >>> +{
> >>> + struct bpf_iter__unix ctx;
> >>> +
> >>> + meta->seq_num--; /* skip SEQ_START_TOKEN */
> >>> + ctx.meta = meta;
> >>> + ctx.unix_sk = unix_sk;
> >>> + ctx.uid = uid;
> >>> + return bpf_iter_run_prog(prog, &ctx);
> >>> +}
> >>> +
> >>> +static int bpf_iter_unix_seq_show(struct seq_file *seq, void *v)
> >>> +{
> >>> + struct bpf_iter_meta meta;
> >>> + struct bpf_prog *prog;
> >>> + struct sock *sk = v;
> >>> + uid_t uid;
> >>> +
> >>> + if (v == SEQ_START_TOKEN)
> >>> + return 0;
> >>> +
> >>> + uid = from_kuid_munged(seq_user_ns(seq), sock_i_uid(sk));
> >>> + meta.seq = seq;
> >>> + prog = bpf_iter_get_info(&meta, false);
> >>> + return unix_prog_seq_show(prog, &meta, v, uid);
> >>> +}
> >>> +
> >>> +static const struct seq_operations bpf_iter_unix_seq_ops = {
> >>> + .start = unix_seq_start,
> >>> + .next = unix_seq_next,
> >>> + .stop = unix_seq_stop,
> >>
> >> Although it is not required for /proc/net/unix, we should still
> >> implement bpf_iter version of seq_ops->stop here. The main purpose
> >> of bpf_iter specific seq_ops->stop is to call bpf program one
> >> more time after ALL elements have been traversed. Such
> >> functionality is implemented in all other bpf_iter variants.
> >
> > Thanks for your review!
> > I will implement the extra call in the next spin.
> >
> > Just out of curiosity, is there a specific use case for the last call?
>
> We don't have use cases for dumps similar to /proc/net/... etc.
> The original thinking is to permit in-kernel aggregation and the
> seq_ops->stop() bpf program will have an indication as the last
> bpf program invocation for the iterator at which point bpf program
> may wrap up aggregation and send/signal the result to user space.
> I am not sure whether people already used this feature or not, or
> people may have different way to do that (e.g., from user space
> directly checking map value if read() length is 0). But
> bpf seq_ops->stop() provides an in-kernel way for bpf program
> to respond to the end of iterating.
Aggregation, that makes sense.
Thank you!
Powered by blists - more mailing lists