[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2a01bde2-ffcd-02e0-0961-5b85f8b7c113@fb.com>
Date: Tue, 5 May 2020 17:07:18 -0700
From: Yonghong Song <yhs@...com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
CC: Andrii Nakryiko <andriin@...com>, bpf <bpf@...r.kernel.org>,
Martin KaFai Lau <kafai@...com>,
Networking <netdev@...r.kernel.org>,
Alexei Starovoitov <ast@...com>,
Daniel Borkmann <daniel@...earbox.net>,
Kernel Team <kernel-team@...com>
Subject: Re: [PATCH bpf-next v2 02/20] bpf: allow loading of a bpf_iter
program
On 5/5/20 2:29 PM, Andrii Nakryiko wrote:
> On Sun, May 3, 2020 at 11:26 PM Yonghong Song <yhs@...com> wrote:
>>
>> A bpf_iter program is a tracing program with attach type
>> BPF_TRACE_ITER. The load attribute
>> attach_btf_id
>> is used by the verifier against a particular kernel function,
>> which represents a target, e.g., __bpf_iter__bpf_map
>> for target bpf_map which is implemented later.
>>
>> The program return value must be 0 or 1 for now.
>> 0 : successful, except potential seq_file buffer overflow
>> which is handled by seq_file reader.
>> 1 : request to restart the same object
>
> This bit is interesting. Is the idea that if BPF program also wants to
> send something over, say, perf_buffer, but fails, it can "request"
> same execution again? I wonder if typical libc fread() implementation
Yes. The bpf_seq_read() can handle this the same as any other
retry request. The following is current mapping.
bpf program return 0 ---> seq_ops->show() return 0
bpf program return 1 ---> seq_ops->show() return -EAGAIN
> would handle EAGAIN properly, it seems more driven towards
> non-blocking I/O?
I did not have a test for this in current patch set for bpf program
returning 1. Will add a test in the next version.
>
> On the other hand, following start/show/next logic for seq_file
> iteration, requesting skipping element seems useful. It would allow
> (in some cases) to "speculatively" generate output and at some point
> realize that this is not an element we actually want in the output and
> request to ignore that output.
>
> Don't know how useful the latter is going to be in practice, but just
> something to keep in mind for the future, I guess...
>
>>
>> In the future, other return values may be used for filtering or
>> teminating the iterator.
>>
>> Signed-off-by: Yonghong Song <yhs@...com>
>> ---
>> include/linux/bpf.h | 3 +++
>> include/uapi/linux/bpf.h | 1 +
>> kernel/bpf/bpf_iter.c | 30 ++++++++++++++++++++++++++++++
>> kernel/bpf/verifier.c | 21 +++++++++++++++++++++
>> tools/include/uapi/linux/bpf.h | 1 +
>> 5 files changed, 56 insertions(+)
>>
>
> [...]
>
>
>> +
>> +bool bpf_iter_prog_supported(struct bpf_prog *prog)
>> +{
>> + const char *attach_fname = prog->aux->attach_func_name;
>> + u32 prog_btf_id = prog->aux->attach_btf_id;
>> + const char *prefix = BPF_ITER_FUNC_PREFIX;
>> + struct bpf_iter_target_info *tinfo;
>> + int prefix_len = strlen(prefix);
>> + bool supported = false;
>> +
>> + if (strncmp(attach_fname, prefix, prefix_len))
>> + return false;
>> +
>> + mutex_lock(&targets_mutex);
>> + list_for_each_entry(tinfo, &targets, list) {
>> + if (tinfo->btf_id && tinfo->btf_id == prog_btf_id) {
>> + supported = true;
>> + break;
>> + }
>> + if (!strcmp(attach_fname + prefix_len, tinfo->target)) {
>> + tinfo->btf_id = prog->aux->attach_btf_id;
>
> This target_info->btf_id caching here is a bit subtle and easy to
> miss, it would be nice to have a code calling this out explicitly.
Will do.
> Thanks!
>
>> + supported = true;
>> + break;
>> + }
>> + }
>> + mutex_unlock(&targets_mutex);
>> +
>> + return supported;
>> +}
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 70ad009577f8..d725ff7d11db 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -7101,6 +7101,10 @@ static int check_return_code(struct bpf_verifier_env *env)
>> return 0;
>> range = tnum_const(0);
>> break;
>> + case BPF_PROG_TYPE_TRACING:
>> + if (env->prog->expected_attach_type != BPF_TRACE_ITER)
>> + return 0;
>
> Commit message mentions enforcing [0, 1], shouldn't it be done here?
The default range is [0, 1], hence no explicit assignment here.
static int check_return_code(struct bpf_verifier_env *env)
{
struct tnum enforce_attach_type_range = tnum_unknown;
const struct bpf_prog *prog = env->prog;
struct bpf_reg_state *reg;
struct tnum range = tnum_range(0, 1);
......
>
>
>> + break;
>> default:
>> return 0;
>> }
>
> [...]
>
Powered by blists - more mailing lists