[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <84f179cf-6c88-0190-153a-d7eaf5bac52c@fb.com>
Date: Wed, 6 May 2020 11:08:24 -0700
From: Alexei Starovoitov <ast@...com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
CC: Yonghong Song <yhs@...com>, Andrii Nakryiko <andriin@...com>,
bpf <bpf@...r.kernel.org>, Martin KaFai Lau <kafai@...com>,
Networking <netdev@...r.kernel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Kernel Team <kernel-team@...com>
Subject: Re: [PATCH bpf-next v2 03/20] bpf: support bpf tracing/iter programs
for BPF_LINK_CREATE
On 5/5/20 8:09 PM, Andrii Nakryiko wrote:
> On Tue, May 5, 2020 at 5:54 PM Alexei Starovoitov <ast@...com> wrote:
>>
>> On 5/5/20 5:14 PM, Yonghong Song wrote:
>>>
>>>
>>> On 5/5/20 2:30 PM, Andrii Nakryiko wrote:
>>>> On Sun, May 3, 2020 at 11:26 PM Yonghong Song <yhs@...com> wrote:
>>>>>
>>>>> Given a bpf program, the step to create an anonymous bpf iterator is:
>>>>> - create a bpf_iter_link, which combines bpf program and the target.
>>>>> In the future, there could be more information recorded in the
>>>>> link.
>>>>> A link_fd will be returned to the user space.
>>>>> - create an anonymous bpf iterator with the given link_fd.
>>>>>
>>>>> The bpf_iter_link can be pinned to bpffs mount file system to
>>>>> create a file based bpf iterator as well.
>>>>>
>>>>> The benefit to use of bpf_iter_link:
>>>>> - using bpf link simplifies design and implementation as bpf link
>>>>> is used for other tracing bpf programs.
>>>>> - for file based bpf iterator, bpf_iter_link provides a standard
>>>>> way to replace underlying bpf programs.
>>>>> - for both anonymous and free based iterators, bpf link query
>>>>> capability can be leveraged.
>>>>>
>>>>> The patch added support of tracing/iter programs for BPF_LINK_CREATE.
>>>>> A new link type BPF_LINK_TYPE_ITER is added to facilitate link
>>>>> querying. Currently, only prog_id is needed, so there is no
>>>>> additional in-kernel show_fdinfo() and fill_link_info() hook
>>>>> is needed for BPF_LINK_TYPE_ITER link.
>>>>>
>>>>> Signed-off-by: Yonghong Song <yhs@...com>
>>>>> ---
>>>>
>>>> LGTM. See small nit about __GFP_NOWARN.
>>>>
>>>> Acked-by: Andrii Nakryiko <andriin@...com>
>>>>
>>>>
>>>>> include/linux/bpf.h | 1 +
>>>>> include/linux/bpf_types.h | 1 +
>>>>> include/uapi/linux/bpf.h | 1 +
>>>>> kernel/bpf/bpf_iter.c | 62 ++++++++++++++++++++++++++++++++++
>>>>> kernel/bpf/syscall.c | 14 ++++++++
>>>>> tools/include/uapi/linux/bpf.h | 1 +
>>>>> 6 files changed, 80 insertions(+)
>>>>>
>>>>
>>>> [...]
>>>>
>>>>> +int bpf_iter_link_attach(const union bpf_attr *attr, struct bpf_prog
>>>>> *prog)
>>>>> +{
>>>>> + struct bpf_link_primer link_primer;
>>>>> + struct bpf_iter_target_info *tinfo;
>>>>> + struct bpf_iter_link *link;
>>>>> + bool existed = false;
>>>>> + u32 prog_btf_id;
>>>>> + int err;
>>>>> +
>>>>> + if (attr->link_create.target_fd || attr->link_create.flags)
>>>>> + return -EINVAL;
>>>>> +
>>>>> + prog_btf_id = prog->aux->attach_btf_id;
>>>>> + mutex_lock(&targets_mutex);
>>>>> + list_for_each_entry(tinfo, &targets, list) {
>>>>> + if (tinfo->btf_id == prog_btf_id) {
>>>>> + existed = true;
>>>>> + break;
>>>>> + }
>>>>> + }
>>>>> + mutex_unlock(&targets_mutex);
>>>>> + if (!existed)
>>>>> + return -ENOENT;
>>>>> +
>>>>> + link = kzalloc(sizeof(*link), GFP_USER | __GFP_NOWARN);
>>>>
>>>> nit: all existing link implementation don't specify __GFP_NOWARN,
>>>> wonder if bpf_iter_link should be special?
>>>
>>> Nothing special. Just feel __GFP_NOWARN is the right thing to do to
>>> avoid pollute dmesg since -ENOMEM is returned to user space. But in
>>> reality, unlike some key/value allocation where the size could be huge
>>> and __GFP_NOWARN might be more useful, here, sizeof(*link) is fixed
>>> and small, __GFP_NOWARN probably not that useful.
>>>
>>> Will drop it.
>>
>> actually all existing user space driven allocation have nowarn.
>
> Can you define "user space driven"? I understand why for map, map key,
> map value, program we want to do that, because it's way too easy for
> user-space to specify huge sizes and allocation is proportional to
> that size. But in this case links are fixed-sized objects, same as
> struct file and struct inode. From BPF world, for instance, there is
> struct bpf_prog_list, which is created when user is attaching BPF
> program to cgroup, so it is user-space driven in similar sense. Yet we
> allocate it without __GFP_NOWARN.
For tiny objects it doesn't really matter. If slab cannot allocate
another single page the system is in bad shape and warn is good
to have in most cases, but when it's user driven like here that
warn won't help kernel developers debug ooms. Most likely NIC driver
is spamming page alloc warn at this point.
In this particular case bpf_iter arguments will likely grow
and struct will grow too, but probably not the point of kmalloc_large,
so it's really fine which ever way.
Personally I would keep nowarn here.
Powered by blists - more mailing lists