linux-kernel - Re: [PATCH 1/1] cputime: Make the reported utime+stime correspond to the actual runtime.


Message-ID: <CAKdL+dT5dm4fzhiXUiOMEn-wuSef4a6S6kpNT6StXQ-yvkXzgQ@mail.gmail.com>
Date:	Tue, 16 Jun 2015 16:35:21 +0200
From:	Fredrik Markström <fredrik.markstrom@...il.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	mingo@...hat.com, linux-kernel@...r.kernel.org,
	Rik van Riel <riel@...hat.com>
Subject: Re: [PATCH 1/1] cputime: Make the reported utime+stime correspond to
 the actual runtime.

I cleaned up the test application a bit, in case it helps someone
understand the problem and test other potential fixes. It's
at: https://gist.github.com/frma71/5af5a2a4b264d5cdd265

I basically copied the cputime_adjust() code out of the kernel and
added some stubs so that it compiles and runs as an ordinary
user-mode application.

To download, compile and run it:

% wget https://goo.gl/PE6RSj -O testadjust.c
% gcc -W -Wall -o testadjust testadjust.c
% ./testadjust
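
For readers who do not want to fetch the file, here is a much smaller
sketch of the same idea. It is not the test application itself, only a
single-threaded model written for illustration. Assumptions: runtimes are
passed in ticks rather than nanoseconds (10000 and 10100, matching the
"tot" column in the quoted output below), scale_stime() is replaced by a
plain multiply and divide with no overflow handling, and the names
sample, reported, advance and adjust are made up for this sketch. The
clamp from the proposed patch is included.

```c
#include <assert.h>
#include <stdint.h>

/* One call's input: precise runtime plus tick-sampled user/system time. */
struct sample {
    uint64_t rtime;   /* precise runtime, assumed already in ticks */
    uint64_t utime;   /* sampled user ticks */
    uint64_t stime;   /* sampled system ticks */
};

/* Values handed out by the previous call; these must never go backwards. */
struct reported {
    uint64_t utime;
    uint64_t stime;
};

/* Stand-in for cputime_advance(): only ever moves the counter forward. */
static void advance(uint64_t *counter, uint64_t val)
{
    if (val > *counter)
        *counter = val;
}

/* Single-threaded model of the adjustment with the proposed clamp. */
static void adjust(const struct sample *curr, struct reported *prev)
{
    uint64_t rtime = curr->rtime;
    uint64_t stime = curr->stime;
    uint64_t utime = curr->utime;

    if (prev->stime + prev->utime >= rtime)
        return;                      /* nothing new to distribute */

    if (utime == 0) {
        stime = rtime;
    } else if (stime == 0) {
        utime = rtime;
    } else {
        uint64_t total = stime + utime;

        assert(total != 0);          /* both samples are non-zero here */
        /* Simplified scale_stime(): no overflow handling (assumption). */
        stime = stime * rtime / total;
        if (stime < prev->stime)     /* the clamp from the proposed patch */
            stime = prev->stime;
        utime = rtime - stime;
    }

    /* Each counter is advanced independently, so their sum can overshoot. */
    advance(&prev->stime, stime);
    advance(&prev->utime, utime);
}
```

Feeding it test 1 (rtime=10000 utime=1 stime=0, then rtime=10100 utime=1
stime=100) leaves both reported counters at 10000, so the reported total
is 20000 for a runtime of 10100. Test 0 (utime=0 stime=1, then utime=100
stime=1) ends at user=100 sys=10000, which is the case the clamp fixes.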

My questions right now are:

1) Is this a problem worth fixing, i.e. should the reported sys+user
equal sum_exec_runtime?
2) Is there a preferred alternative to the global spinlock?
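
To make question 2 concrete, one possible direction is to keep the lock
next to the previously reported values instead of making it global. The
following is only a user-space sketch under assumptions, not the patch
from this thread and not kernel code: prev_times, prev_init and
adjust_locked are hypothetical names, a C11 atomic_flag stands in for a
kernel spinlock, runtimes are in ticks, and the scaling is a plain
multiply and divide without overflow handling. Under the lock, stime is
clamped from both sides so that neither counter goes backwards and their
sum equals the runtime.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct sample {
    uint64_t rtime;   /* precise runtime, assumed already in ticks */
    uint64_t utime;   /* sampled user ticks */
    uint64_t stime;   /* sampled system ticks */
};

/* Hypothetical: the reported values carry their own lock. */
struct prev_times {
    atomic_flag lock;
    uint64_t utime;
    uint64_t stime;
};

static void prev_init(struct prev_times *p)
{
    atomic_flag_clear(&p->lock);
    p->utime = 0;
    p->stime = 0;
}

static void adjust_locked(const struct sample *curr, struct prev_times *prev,
                          uint64_t *ut, uint64_t *st)
{
    uint64_t rtime = curr->rtime;

    /* Spin until we own the flag; stand-in for a per-object spinlock. */
    while (atomic_flag_test_and_set_explicit(&prev->lock,
                                             memory_order_acquire))
        ;

    if (prev->stime + prev->utime < rtime) {
        uint64_t total = curr->stime + curr->utime;
        /* Simplified scaling; with no samples at all, charge stime. */
        uint64_t stime = total ? curr->stime * rtime / total : rtime;
        uint64_t utime;

        /* Clamp from both sides so neither counter moves backwards. */
        if (stime < prev->stime)
            stime = prev->stime;
        if (stime > rtime - prev->utime)
            stime = rtime - prev->utime;
        utime = rtime - stime;

        assert(utime >= prev->utime && stime + utime == rtime);
        prev->stime = stime;
        prev->utime = utime;
    }

    *ut = prev->utime;
    *st = prev->stime;
    atomic_flag_clear_explicit(&prev->lock, memory_order_release);
}
```

Because the compute and the store happen under the same lock, the
interleaving in test 3 of the quoted output cannot split them. The cost
is one lock per set of counters rather than one for the whole system;
whether that is acceptable in the kernel is exactly the open question.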

/Fredrik


On Mon, Jun 15, 2015 at 5:34 PM, Fredrik Markström
<fredrik.markstrom@...il.com> wrote:
> Hello Peter, your patch helps with some of the cases but not all:
>
> (the "called with.." below means cputime_adjust() is called with the
> values specified in its struct task_cputime argument.)
>
> It helps when called with:
>
> sum_exec_runtime=1000000000 utime=0 stime=1
> ... followed by...
> sum_exec_runtime=1010000000 utime=100 stime=1
>
> It doesn't help when called with:
>
> sum_exec_runtime=1000000000 utime=1 stime=0
> ... followed by...
> sum_exec_runtime=1010000000 utime=1 stime=100
>
> Also if we get a call with:
>
> sum_exec_runtime=1000000000 utime=1 stime=1
>
> ... then get preempted after your proposed fix but before we are done
> with the calls to cputime_advance(), and then get called again (from a
> different thread) with:
>
> sum_exec_runtime=1010000000 utime=100 stime=1
>
> ... it still breaks.
>
> I think there might be additional concurrency problems before, between
> and/or possibly after the calls to cputime_advance(), at least if we
> want to guarantee that sys+user stays sane. I believe my
> proposed patch eliminates those potential problems in a pretty
> straightforward way.
>
> I tried to come up with a lock-free solution but didn't find a simple
> one. Since, from what I understand, scalability issues are unlikely
> here, I felt that simplicity was preferable. Also, the current
> implementation has two cmpxchg operations and my proposal a single
> spinlock, so on some setups I bet it's more efficient (like mine, with
> a lousy interconnect and preempt-rt, but I'm on thin ice here).
>
> Below is the output from my test application (it's too much of a hack
> to post publicly), but I'd be happy to clean it up and post it if
> necessary.
>
> /Fredrik
>
>
> #<test>.<step> <input> => <test>.<step> <output> [=====> FAILED]
>
> 0.0 sum_exec=100000000000 utime=0   stime=1   => 0.0 tot=10000 user=0     sys=10000
> 0.1 sum_exec=101000000000 utime=100 stime=1   => 0.1 tot=10100 user=100   sys=10000
>
> 1.0 sum_exec=100000000000 utime=1   stime=0   => 1.0 tot=10000 user=10000 sys=0
> 1.1 sum_exec=101000000000 utime=1   stime=100 => 1.1 tot=20000 user=10000 sys=10000 =====> FAILED
>
> 2.0 sum_exec=100000000000 utime=1   stime=1   => 2.0 tot=10000 user=5000  sys=5000
> 2.1 sum_exec=101000000000 utime=100 stime=1   => 2.1 tot=10100 user=5100  sys=5000
>
> 3.0 sum_exec=100000000000 utime=1   stime=1   => <<PREEMPT>>
>     3.1 sum_exec=101000000000 utime=100 stime=1 => 3.1 tot=10100 user=10000 sys=100
> <<SWITCH BACK>>                               => 3.0 tot=15000 user=10000 sys=5000  =====> FAILED
>
>
> On Fri, Jun 12, 2015 at 1:01 PM, Peter Zijlstra <peterz@...radead.org> wrote:
>> On Fri, Jun 12, 2015 at 12:16:57PM +0200, Peter Zijlstra wrote:
>>> On Fri, 2015-06-12 at 10:55 +0200, Fredrik Markstrom wrote:
>>> > The scaling mechanism might sometimes cause top to report >100%
>>> > (sometimes > 1000%) cpu usage for a single thread. This patch makes
>>> > sure that stime+utime corresponds to the actual runtime of the thread.
>>>
>>> This Changelog is inadequate, it does not explain the actual problem.
>>>
>>> > +static DEFINE_SPINLOCK(prev_time_lock);
>>>
>>> global (spin)locks are bad.
>>
>> Since you have a proglet handy to test this; does something like the
>> below help anything?
>>
>> ---
>>  kernel/sched/cputime.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
>> index f5a64ffad176..3d3f60a555a0 100644
>> --- a/kernel/sched/cputime.c
>> +++ b/kernel/sched/cputime.c
>> @@ -613,6 +613,10 @@ static void cputime_adjust(struct task_cputime *curr,
>>
>>                 stime = scale_stime((__force u64)stime,
>>                                     (__force u64)rtime, (__force u64)total);
>> +
>> +               if (stime < prev->stime)
>> +                       stime = prev->stime;
>> +
>>                 utime = rtime - stime;
>>         }
>>
>
>
>
> --
> /Fredrik



-- 
/Fredrik