[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8c00bd63-2d00-401e-af6d-1b6aebac4701@linux.dev>
Date: Mon, 29 Jan 2024 22:39:43 -0800
From: Martin KaFai Lau <martin.lau@...ux.dev>
To: Amery Hung <ameryhung@...il.com>
Cc: bpf@...r.kernel.org, yangpeihao@...u.edu.cn, toke@...hat.com,
jhs@...atatu.com, jiri@...nulli.us, sdf@...gle.com,
xiyou.wangcong@...il.com, yepeilin.cs@...il.com, netdev@...r.kernel.org,
Kui-Feng Lee <thinker.li@...il.com>
Subject: Re: [RFC PATCH v7 1/8] net_sched: Introduce eBPF based Qdisc
On 1/26/24 5:17 PM, Amery Hung wrote:
> On Thu, Jan 25, 2024 at 6:22 PM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
>>
>> On 1/23/24 9:22 PM, Amery Hung wrote:
>>>> I looked at the high level of the patchset. The major ops that it wants to be
>>>> programmable in bpf is the ".enqueue" and ".dequeue" (+ ".init" and ".reset" in
>>>> patch 4 and patch 5).
>>>>
>>>> This patch adds a new prog type BPF_PROG_TYPE_QDISC, four attach types (each for
>>>> ".enqueue", ".dequeue", ".init", and ".reset"), and a new "bpf_qdisc_ctx" in the
>>>> uapi. It is no long an acceptable way to add new bpf extension.
>>>>
>>>> Can the ".enqueue", ".dequeue", ".init", and ".reset" be completely implemented
>>>> in bpf (with the help of new kfuncs if needed)? Then a struct_ops for Qdisc_ops
>>>> can be created. The bpf Qdisc_ops can be loaded through the existing struct_ops api.
>>>>
>>> Partially. If using struct_ops, I think we'll need another structure
>>> like the following in bpf qdisc to be implemented with struct_ops bpf:
>>>
>>> struct bpf_qdisc_ops {
>>> int (*enqueue) (struct sk_buff *skb)
>>> void (*dequeue) (void)
>>> void (*init) (void)
>>> void (*reset) (void)
>>> };
>>>
>>> Then, Qdisc_ops will wrap around them to handle things that cannot be
>>> implemented with bpf (e.g., sch_tree_lock, returning a skb ptr).
>>
>> We can see how those limitations (calling sch_tree_lock() and returning a ptr)
>> can be addressed in bpf. This will also help other similar use cases.
>>
>
> For kptr, I wonder if we can support the following semantics in bpf if
> they make sense:
I think they are useful but they are not fully supported now.
Some thoughts below.
> 1. Passing a referenced kptr into a bpf program, which will also need
> to be released, or exchanged into maps or allocated objects.
"enqueue" should be the one considering here:
struct Qdisc_ops {
/* ... */
int (*enqueue)(struct sk_buff *skb,
struct Qdisc *sch,
struct sk_buff **to_free);
};
The verifier only marks the skb as a trusted kptr but does not mark its
reg->ref_obj_id. Take a look at btf_ctx_access(). In particular:
if (prog_args_trusted(prog))
info->reg_type |= PTR_TRUSTED;
The verifier does not know the skb ownership is passed into the ".enqueue" ops
and does not know the bpf prog needs to release it or store it in a map.
The verifier tracks the reference state when a KF_ACQUIRE kfunc is called (just
an example, not saying we need to use KF_ACQUIRE kfunc). Take a look at
acquire_reference_state() which is the useful one here.
Whenever the verifier is loading the ".enqueue" bpf_prog, the verifier can
always acquire_reference_state() for the "struct sk_buff *skb" argument.
Take a look at a recent RFC:
https://lore.kernel.org/bpf/20240122212217.1391878-1-thinker.li@gmail.com/
which is tagging the argument of an ops (e.g. ".enqueue" here). That RFC patch
is tagging the argument could be NULL by appending "__nullable" to the argument
name. The verifier will enforce that the bpf prog must check for NULL first.
The similar idea can be used here but with a different tagging (for example,
"__must_release", admittedly not a good name). While the RFC patch is
in-progress, for now, may be hardcode for the ".enqueue" ops in
check_struct_ops_btf_id() and always acquire_reference_state() for the skb. This
part can be adjusted later once the RFC patch will be in shape.
Then one more thing is to track when the struct_ops bpf prog is actually reading
the value of the skb pointer. One thing is worth to mention here, e.g. a
struct_ops prog for enqueue:
SEC("struct_ops")
int BPF_PROG(bpf_dropall_enqueue, struct sk_buff *skb, struct Qdisc *sch,
struct sk_buff **to_free)
{
return bpf_qdisc_drop(skb, sch, to_free);
}
Take a look at the BPF_PROG macro, the bpf prog is getting a pointer to an array
of __u64 as the only argument. The skb is actually in ctx[0], sch is in
ctx[1]...etc. When ctx[0] is read to get the skb pointer (e.g. r1 = ctx[0]),
btf_ctx_access() marks the reg_type to PTR_TRUSTED. It needs to also initialize
the reg->ref_obj_id by the id obtained earlier from acquire_reference_state()
during check_struct_ops_btf_id() somehow.
> 2. Returning a kptr from a program and treating it as releasing the reference.
e.g. for dequeue:
struct Qdisc_ops {
/* ... */
struct sk_buff * (*dequeue)(struct Qdisc *);
};
Right now the verifier should complain on check_reference_leak() if the
struct_ops bpf prog is returning a referenced kptr.
Unlike an argument, the return type of a function does not have a name to tag.
It is the first case that a struct_ops bpf_prog returning a pointer. One idea is
to assume it must be a trusted pointer (PTR_TRUSTED) and the verifier should
check it is indeed with PTR_TRUSTED flag.
May be release_reference_state() can be called to assume the kernel will release
it as long as the return pointer type is PTR_TRUSTED and the type matches the
return type of the ops. Take a look at check_return_code().
Powered by blists - more mailing lists