Message-ID: <CAMB2axOZqwgksukO5d4OiXeEgo2jFrgnzO5PQwABi_WxYFycGg@mail.gmail.com>
Date: Fri, 26 Jan 2024 17:17:26 -0800
From: Amery Hung <ameryhung@...il.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: bpf@...r.kernel.org, yangpeihao@...u.edu.cn, toke@...hat.com,
jhs@...atatu.com, jiri@...nulli.us, sdf@...gle.com, xiyou.wangcong@...il.com,
yepeilin.cs@...il.com, netdev@...r.kernel.org
Subject: Re: [RFC PATCH v7 1/8] net_sched: Introduce eBPF based Qdisc
On Thu, Jan 25, 2024 at 6:22 PM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
>
> On 1/23/24 9:22 PM, Amery Hung wrote:
> >> I looked at the high level of the patchset. The major ops that it wants to be
> >> programmable in bpf is the ".enqueue" and ".dequeue" (+ ".init" and ".reset" in
> >> patch 4 and patch 5).
> >>
> >> This patch adds a new prog type BPF_PROG_TYPE_QDISC, four attach types (each for
> >> ".enqueue", ".dequeue", ".init", and ".reset"), and a new "bpf_qdisc_ctx" in the
> >> uapi. It is no longer an acceptable way to add new bpf extensions.
> >>
> >> Can the ".enqueue", ".dequeue", ".init", and ".reset" be completely implemented
> >> in bpf (with the help of new kfuncs if needed)? Then a struct_ops for Qdisc_ops
> >> can be created. The bpf Qdisc_ops can be loaded through the existing struct_ops api.
> >>
> > Partially. If using struct_ops, I think we'll need another structure
> > like the following in bpf qdisc to be implemented with struct_ops bpf:
> >
> > struct bpf_qdisc_ops {
> >         int (*enqueue) (struct sk_buff *skb);
> >         void (*dequeue) (void);
> >         void (*init) (void);
> >         void (*reset) (void);
> > };
> >
> > Then, Qdisc_ops will wrap around them to handle things that cannot be
> > implemented with bpf (e.g., sch_tree_lock, returning a skb ptr).
>
> We can see how those limitations (calling sch_tree_lock() and returning a ptr)
> can be addressed in bpf. This will also help other similar use cases.
>
For kptr, I wonder if we can support the following semantics in bpf if
they make sense:
1. Passing a referenced kptr into a bpf program, which the program must
then release, or exchange into maps or allocated objects.
2. Returning a kptr from a program and treating it as releasing the reference.
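
For example, a rough sketch of what I have in mind (bpf_kptr_xchg() is
an existing helper, but the struct_ops attach points, the dequeue return
type, and the bpf_skb_release() kfunc are all hypothetical here):

```c
/* Sketch only: assumes a struct_ops-based qdisc; names marked
 * "hypothetical" do not exist in the kernel today. */
struct skb_slot {
	struct sk_buff __kptr *skb;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, struct skb_slot);
} head SEC(".maps");

/* (1) skb arrives as a referenced kptr; exchanging it into the map
 * transfers ownership, and anything displaced must be released. */
SEC("struct_ops/enqueue")
int BPF_PROG(bpf_enqueue, struct sk_buff *skb, struct Qdisc *sch)
{
	__u32 key = 0;
	struct skb_slot *slot = bpf_map_lookup_elem(&head, &key);

	if (!slot)
		return NET_XMIT_DROP;	/* skb would have to be released here */
	skb = bpf_kptr_xchg(&slot->skb, skb);
	if (skb)
		bpf_skb_release(skb);	/* hypothetical release kfunc */
	return NET_XMIT_SUCCESS;
}

/* (2) returning the kptr is treated as releasing the reference:
 * ownership passes back to the kernel wrapper. */
SEC("struct_ops/dequeue")
struct sk_buff *BPF_PROG(bpf_dequeue, struct Qdisc *sch)
{
	__u32 key = 0;
	struct skb_slot *slot = bpf_map_lookup_elem(&head, &key);

	return slot ? bpf_kptr_xchg(&slot->skb, NULL) : NULL;
}
```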
> Other than sch_tree_lock and returning a ptr from a bpf prog. What else do you
> see that blocks directly implementing the enqueue/dequeue/init/reset in the
> struct Qdisc_ops?
>
Not much! We can deal with sch_tree_lock later since
enqueue/dequeue/init/reset are unlikely to use it.
> Have you thought about ".priv_size"? It is now fixed to sizeof(struct
> bpf_sched_data). Would it be useful to allow the bpf prog to store its own
> data there?
>
Maybe we can let bpf qdiscs store statistics here and make it work
with netlink. I haven't explored much in how bpf qdiscs record and
share statistics with user space.
> >
> >> If other ops (like ".dump", ".dump_stats"...) do not have a good use case to be
> >> programmable in bpf, they can stay with the kernel implementation for now and only
> >> allow the userspace to load a bpf Qdisc_ops with .enqueue/dequeue/init/reset
> >> implemented.
> >>
> >> You mentioned in the cover letter that:
> >> "Current struct_ops attachment model does not seem to support replacing only
> >> functions of a specific instance of a module, but I might be wrong."
> >>
> >> I assumed you meant allow bpf to replace only "some" ops of the Qdisc_ops? Yes,
> >> it can be done through the struct_ops's ".init_member". Take a look at
> >> bpf_tcp_ca_init_member. The kernel can assign the kernel implementation for
> >> ".dump" (for example) when loading the bpf Qdisc_ops.
> >>
> > I have no problem with partially replacing a struct, which like you
> > mentioned has been demonstrated by congestion control or sched_ext.
> > What I am not sure about is the ability to create multiple bpf qdiscs,
> > where each has different ".enqueue", ".dequeue", and so on. I like the
> > struct_ops approach and would love to try it if struct_ops support
> > this.
>
> The need for allowing different ".enqueue/.dequeue/..." bpf
> (BPF_PROG_TYPE_QDISC) programs to be loaded into different qdisc instances
> exists because there is only one native ".id == bpf" Qdisc_ops kernel
> implementation, which in turn is due to the limitation you mentioned above?
>
> Am I understanding your reason correctly on why it requires loading different
> bpf progs for different qdisc instances?
>
> If the ".enqueue/.dequeue/..." in the "struct Qdisc_ops" can be directly
> implemented in bpf prog itself, it can just load another bpf struct_ops which
> has a different ".enqueue/.dequeue/..." implementation:
>
> #> bpftool struct_ops register bpf_simple_fq_v1.bpf.o
> #> bpftool struct_ops register bpf_simple_fq_v2.bpf.o
> #> bpftool struct_ops register bpf_simple_fq_xyz.bpf.o
>
> From reading the test bpf prog, I think the set is on a good footing. Instead
> of working around the limitation by wrapping the bpf prog in a predefined
> "struct Qdisc_ops sch_bpf_qdisc_ops", let's first understand what is missing in
> bpf and see how we could address them.
>
Thank you so much for the clarification. I had the wrong impression
since I was thinking about using a structure in the bpf qdisc for
struct_ops. It makes sense to try making "struct Qdisc_ops" work with
struct_ops. I will send the next patch set with struct_ops.
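Roughly, I expect the bpf side to end up looking something like the
following (the member set, the enqueue signature, and the ".id"
handling are just my assumptions, modeled after how bpf_cubic fills in
tcp_congestion_ops):

```c
/* Sketch only: assumes "struct Qdisc_ops" has been registered as a
 * struct_ops. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("struct_ops/bpf_fifo_enqueue")
int BPF_PROG(bpf_fifo_enqueue, struct sk_buff *skb, struct Qdisc *sch,
	     struct sk_buff **to_free)
{
	/* enqueue logic implemented purely in bpf */
	return NET_XMIT_SUCCESS;
}

SEC(".struct_ops")
struct Qdisc_ops fifo_ops = {
	.enqueue = (void *)bpf_fifo_enqueue,
	.id	 = "bpf_fifo",
};

char _license[] SEC("license") = "GPL";
```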
Thanks,
Amery
>