Message-ID: <2ee4f673-8218-ac72-684f-3cd78c46f750@bytedance.com>
Date:   Mon, 7 Aug 2023 11:36:22 +0800
From:   Hao Jia <jiahao.os@...edance.com>
To:     Tejun Heo <tj@...nel.org>
Cc:     lizefan.x@...edance.com, hannes@...xchg.org, mkoutny@...e.com,
        cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH] cgroup/rstat: record the cumulative
 per-cpu time of cgroup and its descendants



On 2023/8/3 Tejun Heo wrote:
> I couldn't come up with an answer. Let's go ahead with adding the field but
> can you please do the followings?
> 

Thank you for the suggestions. I am happy to do that.

> * Name it to something like subtree_bstat instead of cumul_bstat. The
>    counters are all cumulative.

I have done this in the v2 patch.

> 
> * Are you sure the upward propagation logic is correct? It's calculating
>    global delta and then propagating to the per-cpu delta of the parent. Is
>    that correct because the two delta calculations always end up the same?

Sorry, I made a mistake earlier and misled you. These two deltas are not
always equal.
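
For reference, here is a simplified sketch of the propagation path in
question, i.e. cgroup_base_stat_flush() with the v1 change applied. It is
only meant to make the traces below easier to follow: it omits the
u64_stats synchronization and other details, and uses the v1 field name
cumul_bstat, so it is not the exact kernel code.

static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
{
	struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
	struct cgroup *parent = cgroup_parent(cgrp);
	struct cgroup_base_stat delta;

	/* root-level stats are taken from system-wide CPU stats */
	if (!parent)
		return;

	/* per-cpu delta of this cgroup since its last flush on this cpu */
	delta = rstatc->bstat;
	cgroup_base_stat_sub(&delta, &rstatc->last_bstat);

	/* fold the per-cpu delta into this cgroup's global counters */
	cgroup_base_stat_add(&cgrp->bstat, &delta);
	cgroup_base_stat_add(&rstatc->last_bstat, &delta);

	if (cgroup_parent(parent)) {
		/* global delta since the last propagation to the parent */
		delta = cgrp->bstat;
		cgroup_base_stat_sub(&delta, &cgrp->last_bstat);

		cgroup_base_stat_add(&parent->bstat, &delta);
		cgroup_base_stat_add(&cgrp->last_bstat, &delta);

		/*
		 * v1: the *global* delta is also added to the parent's
		 * *per-cpu* cumul_bstat -- this is the step in question.
		 */
		cgroup_base_stat_add(&cgroup_rstat_cpu(parent, cpu)->cumul_bstat,
				     &delta);
	}
}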

I found and reproduced a case that shows the problem:
We create /sys/fs/cgroup/test, /sys/fs/cgroup/test/t1 and
/sys/fs/cgroup/test/t1/tt1 in turn, and there are tasks only in
/sys/fs/cgroup/test/t1/tt1. After applying this patch, the operations and
the corresponding data changes are as follows:

Step 1: cat /sys/fs/cgroup/test/t1/tt1/cpu.stat
*cpu 6* current flush cgroup /test/t1/tt1
per-cpu delta utime 0 stime 0 sum_exec_runtime 257341
/test/t1/tt1 cumul_bstat(cpu 6) utime 0 stime 0 sum_exec_runtime 257341
/test/t1/tt1 cgrp->bstat utime 0 stime 0 sum_exec_runtime 257341
if (cgroup_parent(parent)) {
     cgrp delta utime 0 stime 0 sum_exec_runtime 257341
     parent(/test/t1) ->bstat utime 0 stime 0 sum_exec_runtime 257341
     parent(/test/t1) last_bstat utime 0 stime 0 sum_exec_runtime 0
     parent(/test/t1) cumul_bstat utime 0 stime 0 sum_exec_runtime 257341
}


Step 2: cat /sys/fs/cgroup/test/t1/tt1/cpu.stat
*cpu 12* current flush cgroup /test/t1/tt1
per-cpu delta utime 0 stime 1000000 sum_exec_runtime 747042
/test/t1/tt1 cumul_bstat utime 0 stime 1000000 sum_exec_runtime 747042
/test/t1/tt1 cgrp->bstat utime 0 stime 1000000 sum_exec_runtime 1004383
if (cgroup_parent(parent)) {
     cgrp delta utime 0 stime 1000000 sum_exec_runtime 747042
     parent(/test/t1) ->bstat utime 0 stime 1000000 sum_exec_runtime 1004383
     parent(/test/t1) last_bstat utime 0 stime 0 sum_exec_runtime 0
     parent(/test/t1) cumul_bstat utime 0 stime 1000000 sum_exec_runtime 747042
}


Step 3: cat /sys/fs/cgroup/test/cpu.stat
(cgroups are flushed in turn: /test/t1/tt1 -> /test/t1 -> /test)

*cpu 6*  current flush cgroup /test/t1/tt1
per-cpu delta utime 0 stime 0 sum_exec_runtime 263468
/test/t1/tt1 cumul_bstat(cpu 6) utime 0 stime 0 sum_exec_runtime 520809
/test/t1/tt1 cgrp->bstat utime 0 stime 1000000 sum_exec_runtime 1267851
if (cgroup_parent(parent)) {
     cgrp delta utime 0 stime 0 sum_exec_runtime 263468
     parent(/test/t1) ->bstat utime 0 stime 1000000 sum_exec_runtime 1267851
     parent(/test/t1) last_bstat utime 0 stime 0 sum_exec_runtime 0
     parent(/test/t1) cumul_bstat utime 0 stime 0 sum_exec_runtime 520809
}

*cpu 6* current flush cgroup /test/t1
per-cpu delta utime 0 stime 0 sum_exec_runtime 0
/test/t1 cumul_bstat(cpu 6) utime 0 stime 0 sum_exec_runtime 520809
/test/t1 cgrp->bstat utime 0 stime 1000000 sum_exec_runtime 1267851
if (cgroup_parent(parent)) {
     cgrp delta utime 0 stime 1000000 sum_exec_runtime 1267851  <---
     parent(/test) ->bstat utime 0 stime 1000000 sum_exec_runtime 1267851
     parent(/test) last_bstat utime 0 stime 0 sum_exec_runtime 0
     parent(/test) cumul_bstat (cpu 6) utime 0 stime 1000000 sum_exec_runtime 1267851 <--- *error*
******
   Here cgrp delta is *not equal* to the per-cpu delta.
   The cgroup (/test) and its child cgroup (/test/t1/tt1) are not flushed
at the same frequency. In other words, a call to
cgroup_base_stat_flush() does not necessarily flush all the way up to
the highest-level cgroup below root (as in steps 1 and 2 above, which
only flush /test/t1/tt1).
   Therefore, cgrp delta may contain the accumulated value of several
per-cpu deltas; here it also includes the 747042 that was accumulated
on cpu 12 in step 2.

   The correct value of parent(/test) cumul_bstat on cpu 6 should be
utime 0 stime 0 sum_exec_runtime 520809 (257341 from step 1 plus 263468
from step 3).
******
}

*cpu 6* current flush cgroup /test
per-cpu delta utime 0 stime 0 sum_exec_runtime 0
cumul_bstat utime 0 stime 1000000 sum_exec_runtime 1267851
/test ->bstat utime 0 stime 1000000 sum_exec_runtime 1267851
	cgroup_parent(parent) is NULL, so the propagation ends here.


So we should add a per-cpu field subtree_last_bstat, similar to
cgrp->last_bstat, to record the value that was last propagated to the
parent's per-cpu counter.
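
Roughly, the idea looks like the following sketch (names follow this
thread, i.e. subtree_bstat / subtree_last_bstat in struct
cgroup_rstat_cpu; please see the v2 patch itself for the actual change):

	/* in cgroup_base_stat_flush(cgrp, cpu), after the per-cpu delta
	 * of this cgroup has been computed: */

	/* the per-cpu delta also feeds this cgroup's own subtree counter */
	cgroup_base_stat_add(&rstatc->subtree_bstat, &delta);

	if (cgroup_parent(parent)) {
		struct cgroup_rstat_cpu *prstatc = cgroup_rstat_cpu(parent, cpu);

		/* global propagation, unchanged */
		delta = cgrp->bstat;
		cgroup_base_stat_sub(&delta, &cgrp->last_bstat);
		cgroup_base_stat_add(&parent->bstat, &delta);
		cgroup_base_stat_add(&cgrp->last_bstat, &delta);

		/*
		 * per-cpu subtree propagation: the delta is taken against a
		 * dedicated per-cpu snapshot (subtree_last_bstat), so time
		 * accumulated on other cpus can never leak into this cpu's
		 * subtree counter.
		 */
		delta = rstatc->subtree_bstat;
		cgroup_base_stat_sub(&delta, &rstatc->subtree_last_bstat);
		cgroup_base_stat_add(&prstatc->subtree_bstat, &delta);
		cgroup_base_stat_add(&rstatc->subtree_last_bstat, &delta);
	}

With this, the value folded into /test's cpu 6 subtree counter in step 3
above would be only what was accumulated on cpu 6 (sum_exec_runtime
520809), not the global delta of /test/t1.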

I have sent the v2 patch, please review it again.
v2 link:
https://lore.kernel.org/all/20230807032930.87785-1-jiahao.os@bytedance.com


> * Please add a comment explaining that the field is not currently used
>    outside of being read from bpf / drgn and what not and that we're still
>    trying to determine how to expose that in the cgroupfs interface.


Thanks, I have added such a comment in the v2 patch.

Thanks,
Hao
