Message-ID: <20190722200034.GJ6698@worktop.programming.kicks-ass.net>
Date: Mon, 22 Jul 2019 22:00:34 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Stanislaw Gruszka <sgruszka@...hat.com>
Cc: Oleg Nesterov <oleg@...hat.com>, Ingo Molnar <mingo@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Andrew Fox <afox@...hat.com>,
Stephen Johnston <sjohnsto@...hat.com>,
linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH] sched/cputime: make scale_stime() more precise

On Mon, Jul 22, 2019 at 12:52:41PM +0200, Stanislaw Gruszka wrote:
> On Fri, Jul 19, 2019 at 01:03:49PM +0200, Peter Zijlstra wrote:
> > > shows the problem even when sum_exec_runtime is not that big: 300000 secs.
> > >
> > > The new implementation of scale_stime() does the additional div64_u64_rem()
> > > in a loop, but see the comment: as long as it is used by cputime_adjust(),
> > > this can happen only once.
> >
> > That only shows something after long long staring :/ There are no words on
> > what the output actually means or what would've been expected.
> >
> > Also, your example is incomplete; the below is a test for scale_stime();
> > from this we can see that the division results in too large a number,
> > but, important for our use-case in cputime_adjust(), it is a step
> > function (due to loss in precision) and for every plateau we shift
> > runtime into the wrong bucket.
> >
> > Your proposed function works, but it is atrocious, esp. on 32bit. That
> > said, before we 'fixed' it, it had similarly horrible divisions in it; see
> > commit 55eaa7c1f511 ("sched: Avoid cputime scaling overflow").
> >
> > Included below is also an x86_64 implementation in 2 instructions.
> >
> > I'm still trying to see if there's anything saner we can do...
>
> I was always a proponent of removing the scaling and exporting raw values
> and sum_exec_runtime. But that has an obvious drawback: it reintroduces
> the 'top hiding' issue.
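
To make the precision question concrete, here is a userspace sketch (not the kernel code) contrasting an exact stime * rtime / total, computed with a 128-bit intermediate, against a precision-dropping loop modeled on the 32-bit-friendly approach scale_stime() took at the time. The names mul_u64_u64_div_u64() and scale_stime_approx() are used here for illustration only, and unsigned __int128 is a GCC/Clang extension. The x86_64 variant mentioned above presumably computes the same exact value with a mulq (64x64->128 into rdx:rax) followed by a divq; divq faults if the quotient does not fit in 64 bits, which cannot happen here as long as stime <= total. A compiler typically lowers the portable __int128 division to a libgcc helper call rather than a single divq.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Exact floor(a * b / c) using a 128-bit intermediate product
 * (GCC/Clang unsigned __int128 extension). The caller must ensure
 * c != 0 and that the quotient fits in 64 bits, which holds for
 * stime * rtime / total whenever stime <= total.
 */
uint64_t mul_u64_u64_div_u64(uint64_t a, uint64_t b, uint64_t c)
{
	assert(c != 0);
	return (uint64_t)(((unsigned __int128)a * b) / c);
}

/*
 * Hypothetical precision-dropping approximation: shift the operands
 * until the product fits a 32x32->64 multiply and the divisor fits
 * in 32 bits. Low bits of rtime and total are discarded on the way,
 * so the result can be off from the exact value.
 */
uint64_t scale_stime_approx(uint64_t stime, uint64_t rtime, uint64_t total)
{
	for (;;) {
		/* Keep rtime as the larger of the two multiplicands. */
		if (stime > rtime) {
			uint64_t tmp = stime;
			stime = rtime;
			rtime = tmp;
		}
		/* The divisor must fit in 32 bits. */
		if (total >> 32)
			goto drop_precision;
		/* Both multiplicands fit in 32 bits: done. */
		if (!(rtime >> 32))
			break;
		/* Prefer moving a bit from rtime to stime over losing it. */
		if (stime >> 31)
			goto drop_precision;
		stime <<= 1;
		rtime >>= 1;
		continue;
drop_precision:
		/* Drop the lowest bit of rtime and total (ratio roughly kept). */
		rtime >>= 1;
		total >>= 1;
	}
	assert(total != 0);
	return (uint64_t)(uint32_t)stime * (uint32_t)rtime / (uint32_t)total;
}
```

For example, with stime = 1, rtime = 2^33 + 1 and total = 3, the exact result is 2863311531, while the approximation shifts the low bit of rtime away and returns 2863311530.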

I think (but didn't grep) that we actually export sum_exec_runtime in
/proc/ *somewhere*.
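
As far as I know, one such place is /proc/<pid>/schedstat: its first field is the task's accumulated on-CPU time in nanoseconds (se.sum_exec_runtime), followed by run-queue wait time and the number of timeslices, and the file depends on CONFIG_SCHED_INFO. A minimal sketch of reading it, assuming that format; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse the first field of a /proc/<pid>/schedstat line, assumed to be
 * the task's accumulated on-CPU time (sum_exec_runtime) in nanoseconds.
 * Returns 0 on success, -1 if no number could be parsed.
 */
int parse_schedstat_runtime(const char *line, uint64_t *runtime_ns)
{
	unsigned long long rt;

	assert(line != NULL && runtime_ns != NULL);
	if (sscanf(line, "%llu", &rt) != 1)
		return -1;
	*runtime_ns = (uint64_t)rt;
	return 0;
}

/* Read the runtime of task `pid`, or of the calling task when pid <= 0. */
int read_sum_exec_runtime(int pid, uint64_t *runtime_ns)
{
	char path[64];
	char buf[128];
	FILE *f;

	if (pid <= 0)
		snprintf(path, sizeof(path), "/proc/self/schedstat");
	else
		snprintf(path, sizeof(path), "/proc/%d/schedstat", pid);

	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_schedstat_runtime(buf, runtime_ns);
}
```

This gives the precise total runtime, but not a user/system split; that split is exactly what the scaled utime/stime values try to provide.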

> But maybe we can export raw values in a separate file, e.g.
> /proc/[pid]/raw_cpu_times? Then applications that require more precise
> cputime values for very long-lived processes can use this file.

There are no raw cpu_times; there are system and user samples, and
samples are, by their very nature, an approximation. We just happen to
track the samples in TICK_NSEC resolution these days, but they're still
ticks (except on s390 and maybe other archs, which do time accounting in
the syscall path).
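
To illustrate the point about samples: each tick charges a whole tick's worth of time to either user or system, depending on where the tick landed, and cputime_adjust() then rescales those samples so they add up to the precise runtime (rtime). The sketch below is a simplified model under those assumptions: the struct and function names are hypothetical, tick_nsec stands in for TICK_NSEC, and the logic that keeps reported values from going backwards across reads is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-task tick samples, in nanoseconds. */
struct tick_samples {
	uint64_t utime;	/* ticks that landed in user mode */
	uint64_t stime;	/* ticks that landed in kernel mode */
};

/* Charge one whole tick to user or system time, based on where it hit. */
void account_tick(struct tick_samples *s, int in_kernel, uint64_t tick_nsec)
{
	if (in_kernel)
		s->stime += tick_nsec;
	else
		s->utime += tick_nsec;
}

/*
 * Split the precise runtime rtime in the same ratio as the samples:
 * stime' = stime * rtime / (stime + utime), utime' = rtime - stime'.
 * Uses a 128-bit intermediate (GCC/Clang extension) to avoid overflow.
 */
void adjust_cputime(const struct tick_samples *s, uint64_t rtime,
		    uint64_t *ut, uint64_t *st)
{
	uint64_t total = s->stime + s->utime;

	if (s->stime == 0) {		/* no kernel-mode samples */
		*st = 0;
		*ut = rtime;
		return;
	}
	if (s->utime == 0) {		/* no user-mode samples */
		*st = rtime;
		*ut = 0;
		return;
	}
	assert(total != 0);
	*st = (uint64_t)(((unsigned __int128)s->stime * rtime) / total);
	*ut = rtime - *st;
}
```

For example, with a 4 ms tick, one kernel-mode and two user-mode ticks plus rtime = 9 ms yield stime = 3 ms and utime = 6 ms. The samples only fix the ratio; how accurately that ratio is applied to a large rtime is what the scale_stime() precision discussion is about.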

But I think you'll find x86 people are quite opposed to doing TSC reads
in syscall entry and exit :-)