Message-ID: <YnrEDfZs1kuB1gu5@slm.duckdns.org>
Date: Tue, 10 May 2022 09:59:09 -1000
From: Tejun Heo <tj@...nel.org>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Hao Luo <haoluo@...gle.com>,
Zefan Li <lizefan.x@...edance.com>,
Johannes Weiner <hannes@...xchg.org>,
Shuah Khan <shuah@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Michal Hocko <mhocko@...nel.org>,
Stanislav Fomichev <sdf@...gle.com>,
David Rientjes <rientjes@...gle.com>,
Greg Thelen <gthelen@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Networking <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
cgroups@...r.kernel.org
Subject: Re: [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT
program type
Hello,
On Tue, May 10, 2022 at 12:34:42PM -0700, Yosry Ahmed wrote:
> The rationale behind associating this work with cgroup_subsys is that
> usually the stats are associated with a resource (e.g. memory, cpu,
> etc). For example, if the memory controller is only enabled for a
> subtree in a big hierarchy, it would be more efficient to only run BPF
> rstat programs for those cgroups, not the entire hierarchy. It
> provides a way to control what part of the hierarchy you want to
> collect stats for. This is also semantically similar to the
> css_rstat_flush() callback.
Hmm... one major point of rstat is not having to worry about these things
because we iterate what's been active rather than what exists. Now, this
isn't entirely true because we share the same updated list for all sources.
This is a trade-off which makes sense because 1. the number of cgroups to
iterate each cycle is generally really low anyway and 2. different controllers
often get enabled together. If the balance tilts towards "we're walking too
many due to the sharing of the updated list across different sources", the
solution would be splitting the updated list so that we make the walk
finer-grained.
Note that the above doesn't really affect the conceptual model. It's purely
an optimization decision. Tying these things to a cgroup_subsys does affect
the conceptual model and, in this case, the userland API for a performance
consideration which can be solved otherwise.
So, let's please keep this simple and, in the (unlikely) case that the
overhead becomes an issue, solve it from the rstat operation side.
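To illustrate the point about iterating what's been active rather than what
exists, here is a toy model of a shared updated list. It is only a sketch
under stated assumptions, not the kernel's rstat code: the real implementation
is more elaborate (per-CPU state, child-before-parent ordering), and every name
below (toy_cgroup, toy_updated, toy_flush, NR_SOURCES) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SOURCES 2	/* hypothetical stat sources, e.g. 0 = memory, 1 = cpu */

/* Toy cgroup: all names here are illustrative, not kernel structures. */
struct toy_cgroup {
	const char *name;
	struct toy_cgroup *parent;
	long pending[NR_SOURCES];	/* deltas recorded since the last flush */
	long total[NR_SOURCES];		/* hierarchical totals after flushing */
	bool on_updated_list;
	struct toy_cgroup *next_updated;
};

/* One updated list shared by every source (assumption: single global list). */
static struct toy_cgroup *updated_head;

/* Record a delta and make sure the cgroup is queued for the next flush. */
void toy_updated(struct toy_cgroup *cg, int src, long delta)
{
	assert(src >= 0 && src < NR_SOURCES);
	cg->pending[src] += delta;
	if (!cg->on_updated_list) {
		cg->on_updated_list = true;
		cg->next_updated = updated_head;
		updated_head = cg;
	}
}

/*
 * Walk only the cgroups that were updated, never the whole hierarchy.
 * Each visited cgroup is flushed for all sources, because the list is
 * shared. Returns how many cgroups were visited.
 */
int toy_flush(void)
{
	int visited = 0;

	while (updated_head) {
		struct toy_cgroup *cg = updated_head;

		updated_head = cg->next_updated;
		cg->next_updated = NULL;
		cg->on_updated_list = false;

		for (int src = 0; src < NR_SOURCES; src++) {
			long delta = cg->pending[src];

			cg->pending[src] = 0;
			/* Fold the delta into this cgroup and all ancestors. */
			for (struct toy_cgroup *p = cg; p; p = p->parent)
				p->total[src] += delta;
		}
		visited++;
	}
	return visited;
}
```

In this model the cost of a flush depends on how many cgroups called
toy_updated() since the last flush, not on how many cgroups exist or which
sources are enabled where. Splitting the list per source would only change
which cgroups a given walk visits; the update and flush calls would look the
same to their users.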
Thanks.
--
tejun