linux-kernel - Re: [PATCH] sched/cputime: make scale

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190722195605.GI6698@worktop.programming.kicks-ass.net>
Date:   Mon, 22 Jul 2019 21:56:05 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Oleg Nesterov <oleg@...hat.com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Andrew Fox <afox@...hat.com>,
        Stephen Johnston <sjohnsto@...hat.com>,
        linux-kernel@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Stanislaw Gruszka <sgruszka@...hat.com>
Subject: Re: [PATCH] sched/cputime: make scale_stime() more precise

On Fri, Jul 19, 2019 at 04:37:42PM +0200, Oleg Nesterov wrote:
> On 07/19, Peter Zijlstra wrote:

> > But I'm still confused, since in the long run, it should still end up
> > with a proportionally divided user/system, irrespective of some short
> > term wobblies.
> 
> Why?
> 
> Yes, statistically the numbers are proportionally divided.

This; due to the loss in precision the distribution is like a step
function around the actual s:u ratio line, but on average it still is
s:u.

Even if it were a perfect function, we'd still see increments in stime even
if the current program state never does syscalls, simply because it
needs to stay on that s:u line.

> but you will (probably) never see the real stime == 1000 && utime == 10000
> numbers if you watch incrementally.

See, there are no 'real' stime and utime numbers. What we have are user
and system samples -- tick based.

If the tick lands in the kernel, we get a system sample, if the tick
lands in userspace we get a user sample.

What we do have is an accurate (ns) based runtime accounting, and we
(re)construct stime and utime from this; we divide the total known
runtime in stime and utime pro-rata.

Sure, we take a shortcut, it wobbles a bit, but seriously, the samples
are inaccurate anyway, so who bloody cares :-)

You can construct a program that runs 99% in userspace but has all
system samples. All you need to do is make sure you're in a system call
when the tick lands.

> Just in case... yes I know that these numbers can only "converge" to the
> reality, only their sum is correct. But people complain.

People always complain, just tell em to go pound sand :-)