Message-ID: <b9363fa9-826f-611f-3ab3-27e50c03422a@linux.alibaba.com>
Date:   Tue, 26 Jun 2018 20:19:49 +0800
From:   Xunlei Pang <xlpang@...ux.alibaba.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Frederic Weisbecker <frederic@...nel.org>,
        Tejun Heo <tj@...nel.org>
Cc:     linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

On 6/22/18 3:15 PM, Xunlei Pang wrote:
> We use per-cgroup cpu usage statistics similar to "cgroup rstat",
> and encountered a problem: the user and sys usages are sometimes
> split incorrectly.
> 
> Run tasks with a random run-sleep pattern for a long time, so that
> the tick-based time and the scheduler's sum_exec_runtime drift far
> apart (with sum_exec_runtime less than the tick-based time). After
> switching to a workload pattern with high sys usage, the current
> implementation of cputime_adjust() reports less sys usage than is
> actually consumed, because the total tick-based utime and stime are
> used to split the total sum_exec_runtime.
> 
> The same problem exists for the utime and stime reported in
> "/proc/<pid>/stat".
> 
> [Example]
> Run random run-sleep patterns for a few minutes, then switch to a
> high-sys pattern, and watch:
> 1) standard "top" (which shows the correct values):
>    4.6 us, 94.5 sy,  0.0 ni,  0.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> 2) our tool parsing utime and stime from "/proc/<pid>/stat":
>    20.5 usr, 78.4 sys
> The "20.5 usr" displayed in 2) is incorrect; it only recovers
> gradually over time: 9.7 usr, 89.5 sys.
> 
High sys usage usually indicates something abnormal on a kernel path,
and inaccurate accounting can hide such issues, so the reported split
should be reliable. Our per-cgroup statistics hit this problem easily.
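
To make the splitting concrete, below is a minimal userspace sketch of
the proportional split described above. It is only a simplified model
of what cputime_adjust() does, not the kernel code itself; split_rtime()
is a hypothetical helper and all numbers are made up, purely to show how
stale tick samples skew the reported user/sys split once the tick-based
time has drifted above sum_exec_runtime:

#include <stdio.h>
#include <stdint.h>

/*
 * Simplified model: the precise runtime (sum_exec_runtime) is divided
 * between user and system in the same ratio as the tick-sampled
 * utime/stime totals.
 */
static void split_rtime(uint64_t rtime, uint64_t tick_utime,
			uint64_t tick_stime, uint64_t *ut, uint64_t *st)
{
	uint64_t total = tick_utime + tick_stime;

	if (total == 0) {
		/* No tick samples yet: attribute everything to user. */
		*ut = rtime;
		*st = 0;
		return;
	}

	/* Proportional split; overflow is ignored for this demo. */
	*st = rtime * tick_stime / total;
	*ut = rtime - *st;
}

int main(void)
{
	/*
	 * Phase 1: a long random run-sleep workload.  The tick-based
	 * samples (mostly user) have drifted well above the precise
	 * sum_exec_runtime.
	 */
	uint64_t tick_utime = 800, tick_stime = 200;
	uint64_t rtime = 600;
	uint64_t ut, st;

	/*
	 * Phase 2: switch to a sys-heavy workload.  The new 100 units of
	 * runtime are almost pure system time, but the accumulated tick
	 * ratio still says ~73% user.
	 */
	tick_stime += 100;
	rtime += 100;

	split_rtime(rtime, tick_utime, tick_stime, &ut, &st);
	printf("reported: ut=%llu st=%llu (sys share %.1f%%)\n",
	       (unsigned long long)ut, (unsigned long long)st,
	       100.0 * st / rtime);
	/*
	 * Prints roughly "ut=510 st=190 (sys share 27.1%)": far less sys
	 * than the near-100% sys the task actually ran in phase 2, and it
	 * only converges slowly as new ticks dilute the old ratio.
	 */
	return 0;
}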

Hi Peter, any comment on this patch?
