[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2655026d-6ae4-c14c-95b0-4177eefa434f@bytedance.com>
Date: Tue, 18 Jul 2023 18:08:50 +0800
From: Hao Jia <jiahao.os@...edance.com>
To: Tejun Heo <tj@...nel.org>
Cc: lizefan.x@...edance.com, hannes@...xchg.org, mkoutny@...e.com,
cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH] cgroup/rstat: record the cumulative
per-cpu time of cgroup and its descendants
Hello,
On 2023/7/18 Tejun Heo wrote:
> On Mon, Jul 17, 2023 at 05:36:12PM +0800, Hao Jia wrote:
>> Now the member variable bstat of the structure cgroup_rstat_cpu
>
> You said "now" indicating that the behavior has changed recently but I don't
> see what changed there. Can you elaborate?
Thank you for your review, sorry for confusing you with my expression,
it is true that it has not changed, bstat has always recorded the
per-cpu time of cgroup itself.
>
>> records the per-cpu time of the cgroup itself, but does not
>> include the per-cpu time of its descendants. The per-cpu time
>
> It does. The per-cpu delta is added to its parent and then that will in turn
> be used to propagate to its parent.
>Yes, so only cgrp->bstat contains the cpu time of the cgroup and its
descendants. rstatc->bstat only records the per-cpu time of cgroup itself.
>
> I think you're misunderstanding how the code works. Can you please double
> check?
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -361,11 +361,15 @@ static void cgroup_base_stat_flush(struct cgroup
*cgrp, int cpu)
cgroup_base_stat_sub(&delta, &rstatc->last_bstat);
cgroup_base_stat_add(&cgrp->bstat, &delta);
cgroup_base_stat_add(&rstatc->last_bstat, &delta);
We add the current cgroup's per-cpu delta to its per-cpu variable
(cumul_bstat).
+ cgroup_base_stat_add(&rstatc->cumul_bstat, &delta);
/* propagate global delta to parent (unless that's root) */
if (cgroup_parent(parent)) {
delta = cgrp->bstat;
+ rstatc = cgroup_rstat_cpu(parent, cpu);
+
cgroup_base_stat_sub(&delta, &cgrp->last_bstat);
Since we hold the global cgroup_rstat_lock and disable interrupts in
cgroup_base_stat_flush() and we only update cgrp->bstat here.
So the global delta at this time is equal to the increment of this
cgroup and its descendants on this cpu (unless root cgroup), so we can
add the global delta to the cumul_bstat of its parent and propagate it
upward.
+ cgroup_base_stat_add(&rstatc->cumul_bstat, &delta);
cgroup_base_stat_add(&parent->bstat, &delta);
cgroup_base_stat_add(&cgrp->last_bstat, &delta);
}
I wrote a kernel module to verify and test the above kernel code,
The kernel module adds a cpu.stat_percpu_all file for each cgroup, which
can output the per-cpu time information of the cgroup and its
descendants calculated in two ways:
1. Cumulatively add the per-cpu bstat of cgroup and each of its descendants.
2. Directly output @cumul_bstat read in the kernel (patch has been applied).
When the child cgroup is not removed, the results calculated by the two
methods should be equal.
NOTE: We need to flush the data before "cat cpu.stat_percpu_all", such
as "cat cpu.stat".
Code link:
https://github.com/jiaozhouxiaojia/cgv2-stat-percpu_test/tree/main
The testing procedure is in README.
I installed the 6.5.0-rc1 kernel on my machine (an Intel Xeon(R)
Platinum 8260 machine 96 CPUs in total.) and did some tests.
So far I haven't found anything unusual, if I'm wrong, please point it
out, it's very helpful to me, thanks again!
Thanks,
Hao
Powered by blists - more mailing lists