linux-kernel - Re: [tip:sched/core] sched/cputime: Ensure accurate utime and stime ratio in cputime

Message-ID: <20180723092118.GZ2494@hirez.programming.kicks-ass.net>
Date:   Mon, 23 Jul 2018 11:21:18 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Xunlei Pang <xlpang@...ux.alibaba.com>
Cc:     Ingo Molnar <mingo@...nel.org>, tglx@...utronix.de,
        frederic@...nel.org, lcapitulino@...hat.com,
        torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
        hpa@...or.com, tj@...nel.org, linux-tip-commits@...r.kernel.org
Subject: Re: [tip:sched/core] sched/cputime: Ensure accurate utime and stime
 ratio in cputime_adjust()

On Tue, Jul 17, 2018 at 12:08:36PM +0800, Xunlei Pang wrote:
> The trace data corresponds to the last sample period:
> trace entry 1:
>              cat-20755 [022] d...  1370.106496: cputime_adjust: task
> tick-based utime 362560000000 stime 2551000000, scheduler rtime 333060702626
>              cat-20755 [022] d...  1370.106497: cputime_adjust: result:
> old utime 330729718142 stime 2306983867, new utime 330733635372 stime
> 2327067254
> 
> trace entry 2:
>              cat-20773 [005] d...  1371.109825: cputime_adjust: task
> tick-based utime 362567000000 stime 3547000000, scheduler rtime 334063718912
>              cat-20773 [005] d...  1371.109826: cputime_adjust: result:
> old utime 330733635372 stime 2327067254, new utime 330827229702 stime
> 3236489210
> 
> 1) expected behaviour
> Let's compare the last two trace entries (all the data below is in ns):
> task tick-based utime: 362560000000->362567000000 increased 7000000
> task tick-based stime: 2551000000  ->3547000000   increased 996000000
> scheduler rtime:       333060702626->334063718912 increased 1003016286
> 
> The application actually runs at almost 100% sys at the moment; we can
> use the increases in the task tick-based utime and stime to double-check:
> 996000000/(7000000+996000000) > 99% sys
> 
> 2) the current cputime_adjust() inaccurate result
> But for the current cputime_adjust(), we get the following adjusted
> utime and stime increase in this sample period:
> adjusted utime: 330733635372->330827229702 increased 93594330
> adjusted stime: 2327067254  ->3236489210   increased 909421956
> 
> so 909421956/(93594330+909421956) = 91% sys, as the shell script shows above.
> 
> 3) root cause
> The root cause of the issue is that the current cputime_adjust() always
> passes the whole times to scale_stime() to split the whole utime and
> stime. In this patch, we instead pass the deltas from 1), accumulated
> within the user's sample period, to scale_stime() and add the
> corresponding results to the previously saved adjusted utime and stime,
> which guarantees accurate usr and sys increases within the user's
> sample period.
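
The arithmetic in the quoted description can be checked with a short sketch. It is a simplified model, not the kernel code: it assumes the split is stime * rtime / (utime + stime) in exact integer arithmetic with utime taken as the remainder, and it omits the monotonicity checks against the previous values. The function names adjust_whole() and adjust_delta() are made up for illustration, and adjust_delta() only sketches the idea described in 3), not the patch itself.

```python
# Simplified model of the arithmetic in the quoted trace; not the kernel code.
# All values are in ns and are copied from trace entries 1 and 2.

def scale_stime(stime, rtime, total):
    """Exact integer stime * rtime / total (assumed simplification)."""
    return stime * rtime // total

def adjust_whole(utime, stime, rtime):
    """Split the whole rtime by the whole tick-based utime:stime ratio."""
    adj_stime = scale_stime(stime, rtime, utime + stime)
    return rtime - adj_stime, adj_stime

def adjust_delta(prev_ticks, cur_ticks, prev_adj):
    """Hypothetical delta variant: scale only the increase since the last
    sample and add it to the previously adjusted values."""
    du = cur_ticks[0] - prev_ticks[0]
    ds = cur_ticks[1] - prev_ticks[1]
    dr = cur_ticks[2] - prev_ticks[2]
    add_stime = scale_stime(ds, dr, du + ds)
    return prev_adj[0] + (dr - add_stime), prev_adj[1] + add_stime

def sys_share(prev_adj, cur_adj):
    """Fraction of the increase between two samples that is system time."""
    du = cur_adj[0] - prev_adj[0]
    ds = cur_adj[1] - prev_adj[1]
    return ds / (du + ds)

# (tick-based utime, tick-based stime, scheduler rtime)
entry1 = (362560000000, 2551000000, 333060702626)
entry2 = (362567000000, 3547000000, 334063718912)
# adjusted (utime, stime) reported for trace entry 1
entry1_adj = (330733635372, 2327067254)

whole = adjust_whole(*entry2)
delta = adjust_delta(entry1, entry2, entry1_adj)

tick_share = sys_share(entry1[:2], entry2[:2])
whole_share = sys_share(entry1_adj, whole)
delta_share = sys_share(entry1_adj, delta)

print("whole-time split:", whole)
print(f"tick-based sys share:  {tick_share:.1%}")
print(f"whole-time sys share:  {whole_share:.1%}")
print(f"delta-based sys share: {delta_share:.1%}")
```

Under these assumptions the whole-time split reproduces the "new utime" and "new stime" values of trace entry 2, which is why the increase over the sample period comes out at about 91% sys even though the tick-based increase is above 99% sys.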

But why is this a problem?

Since it's sample-based, there's really not much you can guarantee.
What if your test program were to run in userspace for 50% of the time
but is so constructed as to always be in kernel space when the tick
happens?

Then you would 'expect' it to be 50% user and 50% sys, but you're also
not getting that.
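
The thought experiment above can be sketched as a toy simulation. Everything in it is hypothetical: a 1 ms tick, a task that is in kernel space for the first half of every tick period (so the tick always lands in the kernel) and in userspace for the second half, and an accounting rule that charges one whole tick to whatever mode the task is in when the tick fires.

```python
# Toy model of tick-based sampling bias; all parameters are hypothetical.

TICK_NS = 1_000_000   # assumed 1 ms tick period
STEP_NS = 10_000      # resolution used to measure the "true" split
N_TICKS = 100

def mode_at(t_ns):
    """Mode of the hypothetical task at time t_ns.

    The task is in the kernel for the first half of each tick period
    (which includes the instant the tick fires) and in userspace for
    the second half, so it spends exactly 50% of its time in each.
    """
    return "sys" if (t_ns % TICK_NS) < TICK_NS // 2 else "user"

def true_accounting():
    """Measure the real user/sys split by walking the timeline in small steps."""
    totals = {"user": 0, "sys": 0}
    for t in range(0, N_TICKS * TICK_NS, STEP_NS):
        totals[mode_at(t)] += STEP_NS
    return totals

def tick_accounting():
    """Charge one full tick to whichever mode is observed when the tick fires."""
    totals = {"user": 0, "sys": 0}
    for k in range(N_TICKS):
        totals[mode_at(k * TICK_NS)] += TICK_NS
    return totals

true_totals = true_accounting()
tick_totals = tick_accounting()

print("true:      ", true_totals)
print("tick-based:", tick_totals)
```

In this model the real split is 50% user and 50% sys, while the tick-based counters attribute all of the time to sys, so any ratio derived from them inherits that bias regardless of how the scaling is done.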

This stuff cannot be perfect, and the current code provides 'sensible'
numbers over the long run for most programs. Why muck with it?