Message-ID: <CA+khW7j1Ni_PfvsGisUpUgFtgg=f_qEUVd1VUmocn6L3=kndhw@mail.gmail.com>
Date: Wed, 10 Aug 2022 20:10:47 -0700
From: Hao Luo <haoluo@...gle.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Andrii Nakryiko <andrii.nakryiko@...il.com>,
linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
cgroups@...r.kernel.org, netdev@...r.kernel.org,
Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Martin KaFai Lau <martin.lau@...ux.dev>,
Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>,
Tejun Heo <tj@...nel.org>, Zefan Li <lizefan.x@...edance.com>,
KP Singh <kpsingh@...nel.org>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Benjamin Tissoires <benjamin.tissoires@...hat.com>,
John Fastabend <john.fastabend@...il.com>,
Michal Koutny <mkoutny@...e.com>,
Roman Gushchin <roman.gushchin@...ux.dev>,
David Rientjes <rientjes@...gle.com>,
Stanislav Fomichev <sdf@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
Yosry Ahmed <yosryahmed@...gle.com>
Subject: Re: [PATCH bpf-next v7 4/8] bpf: Introduce cgroup iter
On Tue, Aug 9, 2022 at 11:38 AM Hao Luo <haoluo@...gle.com> wrote:
>
> On Tue, Aug 9, 2022 at 9:23 AM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Mon, Aug 08, 2022 at 05:56:57PM -0700, Hao Luo wrote:
> > > On Mon, Aug 8, 2022 at 5:19 PM Andrii Nakryiko
> > > <andrii.nakryiko@...il.com> wrote:
> > > >
> > > > On Fri, Aug 5, 2022 at 2:49 PM Hao Luo <haoluo@...gle.com> wrote:
> > > > >
> > > > > Cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes:
> > > > >
> > > > > - walking a cgroup's descendants in pre-order.
> > > > > - walking a cgroup's descendants in post-order.
> > > > > - walking a cgroup's ancestors.
> > > > > - processing only the given cgroup.
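The four modes above can be illustrated outside the kernel with a toy tree. This is a userspace sketch only: `struct node`, `visit()`, the `walk_*` helpers and the demo hierarchy are hypothetical stand-ins, not the kernel's `struct cgroup` or its css iteration helpers. In this sketch the descendant walks include the starting node itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy node standing in for a cgroup. */
struct node {
	const char *name;
	struct node *parent;
	struct node *child;   /* first child */
	struct node *sibling; /* next sibling */
};

/* Append a node's name to a space-separated trace in out[cap]. */
void visit(const struct node *n, char *out, size_t cap)
{
	size_t len = strlen(out);

	/* room for a separator, the name and the terminating NUL */
	assert(len + 1 + strlen(n->name) + 1 <= cap);
	if (len > 0)
		strcat(out, " ");
	strcat(out, n->name);
}

/* Descendants in pre-order: each node is visited before its children. */
void walk_pre(const struct node *n, char *out, size_t cap)
{
	visit(n, out, cap);
	for (const struct node *c = n->child; c; c = c->sibling)
		walk_pre(c, out, cap);
}

/* Descendants in post-order: each node is visited after its children. */
void walk_post(const struct node *n, char *out, size_t cap)
{
	for (const struct node *c = n->child; c; c = c->sibling)
		walk_post(c, out, cap);
	visit(n, out, cap);
}

/* Ancestors: from the start node upward to the root. */
void walk_up(const struct node *n, char *out, size_t cap)
{
	for (; n; n = n->parent)
		visit(n, out, cap);
}

/* Self only: exactly one node. */
void walk_self(const struct node *n, char *out, size_t cap)
{
	visit(n, out, cap);
}

/* Demo hierarchy: root -> { a -> { a1 }, b } */
struct node demo_root = { "root", NULL, NULL, NULL };
struct node demo_a    = { "a",    &demo_root, NULL, NULL };
struct node demo_b    = { "b",    &demo_root, NULL, NULL };
struct node demo_a1   = { "a1",   &demo_a,    NULL, NULL };

/* Wire up the child/sibling links of the demo hierarchy. */
void link_demo_tree(void)
{
	demo_root.child = &demo_a;
	demo_a.sibling = &demo_b;
	demo_a.child = &demo_a1;
}
```

Starting each walk from an empty string, walk_pre(&demo_root, ...) produces "root a a1 b", walk_post(&demo_root, ...) produces "a1 a b root", and walk_up(&demo_a1, ...) produces "a1 a root".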
> > > > >
> [...]
> > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > index 59a217ca2dfd..4d758b2e70d6 100644
> > > > > --- a/include/uapi/linux/bpf.h
> > > > > +++ b/include/uapi/linux/bpf.h
> > > > > @@ -87,10 +87,37 @@ struct bpf_cgroup_storage_key {
> > > > > __u32 attach_type; /* program attach type (enum bpf_attach_type) */
> > > > > };
> > > > >
> > > > > +enum bpf_iter_order {
> > > > > + BPF_ITER_ORDER_DEFAULT = 0, /* default order. */
> > > >
> > > > why is this default order necessary? It just adds confusion (I had to
> > > > look up source code to know what is default order). I might have
> > > > missed some discussion, so if there is some very good reason, then
> > > > please document this in commit message. But I'd rather not do some
> > > > magical default order instead. We can set 0 to mean invalid and error
> > > > out, or just do SELF as the very first value (and if user forgot to
> > > > specify more fancy mode, they hopefully will quickly discover this in
> > > > their testing).
> > > >
> > >
> > > PRE/POST/UP are tree-specific orders. SELF applies to all iters and
> > > yields only a single object. How does task_iter express a non-self
> > > order? By non-self, I mean something like "I don't care about the
> > > order, just scan _all_ the objects". And this "don't care" order, IMO,
> > > may be the common case. I don't think everyone cares about walking
> > > order for tasks. The DEFAULT is intentionally put at the first value,
> > > so that if users don't care about order, they don't have to specify
> > > this field.
> > >
> > > If that sounds valid, maybe using "UNSPEC" instead of "DEFAULT" is better?
> >
> > I agree with Andrii.
> > This:
> > + if (order == BPF_ITER_ORDER_DEFAULT)
> > + order = BPF_ITER_DESCENDANTS_PRE;
> >
> > looks like an arbitrary choice.
> > imo
> > BPF_ITER_DESCENDANTS_PRE = 0,
> > would have been more obvious. No need to dig into definition of "default".
> >
> > UNSPEC = 0
> > is fine too if we want user to always be conscious about the order
> > and the kernel will error if that field is not initialized.
> > That would be my preference, since it will match the rest of uapi/bpf.h
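A minimal sketch of the UNSPEC = 0 approach, assuming the enum layout proposed in the reply to this message. The helper `cgroup_iter_check_order()` is hypothetical, not the function in the patch; it only shows the idea of listing the accepted orders explicitly and rejecting an uninitialized (zero) field with -EINVAL.

```c
#include <assert.h>
#include <errno.h>

/* The UNSPEC = 0 layout proposed in this thread. */
enum bpf_iter_order {
	BPF_ITER_ORDER_UNSPEC = 0,
	BPF_ITER_SELF_ONLY,        /* process only a single object. */
	BPF_ITER_DESCENDANTS_PRE,  /* walk descendants in pre-order. */
	BPF_ITER_DESCENDANTS_POST, /* walk descendants in post-order. */
	BPF_ITER_ANCESTORS_UP,     /* walk ancestors upward. */
};

/* Zero must mean "not set" so that a forgotten field is caught. */
static_assert(BPF_ITER_ORDER_UNSPEC == 0, "UNSPEC must be zero");

/* Hypothetical validation helper: accept only the orders cgroup_iter
 * supports and reject everything else, including UNSPEC. */
int cgroup_iter_check_order(unsigned int order)
{
	switch (order) {
	case BPF_ITER_SELF_ONLY:
	case BPF_ITER_DESCENDANTS_PRE:
	case BPF_ITER_DESCENDANTS_POST:
	case BPF_ITER_ANCESTORS_UP:
		return 0;
	default: /* BPF_ITER_ORDER_UNSPEC or an unknown value */
		return -EINVAL;
	}
}
```

The trade-off is that a caller who forgets to set the field gets an error instead of a silently chosen walk order.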
> >
>
> Sounds good. In the next version, will use
>
> enum bpf_iter_order {
> 	BPF_ITER_ORDER_UNSPEC = 0,
> 	BPF_ITER_SELF_ONLY, /* process only a single object. */
> 	BPF_ITER_DESCENDANTS_PRE, /* walk descendants in pre-order. */
> 	BPF_ITER_DESCENDANTS_POST, /* walk descendants in post-order. */
> 	BPF_ITER_ANCESTORS_UP, /* walk ancestors upward. */
> };
>
Sigh, I find that having UNSPEC=0 and erroring out when seeing UNSPEC
doesn't work. Basically, if we have a non-iter prog and a cgroup_iter
prog written in the same source file, I can't use
bpf_object__attach_skeleton to attach them, because the default
prog_attach_fn for iter initializes `order` to 0 (that is, UNSPEC),
which the kernel rejects. To make bpf_object__attach_skeleton work on
cgroup_iter, I think I need to use the following:
enum bpf_iter_order {
	BPF_ITER_DESCENDANTS_PRE, /* walk descendants in pre-order. */
	BPF_ITER_DESCENDANTS_POST, /* walk descendants in post-order. */
	BPF_ITER_ANCESTORS_UP, /* walk ancestors upward. */
	BPF_ITER_SELF_ONLY, /* process only a single object. */
};
That way, when bpf_object__attach_skeleton() is called on a cgroup_iter
program, a link can be created, and that link defaults to a pre-order
walk over the whole hierarchy. Is there a better solution?
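To make the problem concrete, here is a userspace sketch that compares the two enum layouts when the kernel receives a zero-filled order field. Everything is hypothetical: the UF_/PF_ prefixes exist only so both layouts fit in one file, `struct iter_link_info` is a stand-in rather than the real uapi `bpf_iter_link_info`, and `generic_attach_fill()` merely models an attach path that sets no cgroup_iter-specific fields.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* UNSPEC-first layout (hypothetical UF_ prefix). */
enum order_unspec_first {
	UF_ORDER_UNSPEC = 0,
	UF_SELF_ONLY,
	UF_DESCENDANTS_PRE,
	UF_DESCENDANTS_POST,
	UF_ANCESTORS_UP,
};

/* Pre-order-first layout (hypothetical PF_ prefix). */
enum order_pre_first {
	PF_DESCENDANTS_PRE = 0,
	PF_DESCENDANTS_POST,
	PF_ANCESTORS_UP,
	PF_SELF_ONLY,
};

/* In this layout a zero-filled field must select the pre-order walk. */
static_assert(PF_DESCENDANTS_PRE == 0, "zero must mean pre-order");

/* Hypothetical stand-in for the iter link info the kernel receives. */
struct iter_link_info {
	unsigned int order;
};

/* Models a generic attach path that leaves every field zero. */
void generic_attach_fill(struct iter_link_info *info)
{
	memset(info, 0, sizeof(*info));
}

/* UNSPEC-first: return the order, or -EINVAL if unset or unknown. */
int resolve_unspec_first(const struct iter_link_info *info)
{
	if (info->order == UF_ORDER_UNSPEC || info->order > UF_ANCESTORS_UP)
		return -EINVAL;
	return (int)info->order;
}

/* Pre-order-first: zero is a valid order; reject only unknown values. */
int resolve_pre_first(const struct iter_link_info *info)
{
	if (info->order > PF_SELF_ONLY)
		return -EINVAL;
	return (int)info->order;
}
```

Under the first layout the zero-filled info is rejected, so a generic attach fails. Under the second it resolves to a pre-order walk, which is exactly the implicit default that was questioned earlier in the thread.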
> and explicitly list the values accepted by cgroup_iter, erroring out if
> UNSPEC is detected.
>
> Also, following Andrii's comments, I will change BPF_ITER_SELF to
> BPF_ITER_SELF_ONLY, which does seem a little more explicit by
> comparison.
>
> > I applied the first 3 patches to ease respin.
>
> Thanks! This helps!
>
> > Thanks!