lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 4 Jul 2017 14:40:03 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Ingo Molnar <mingo@...nel.org>
Cc:     josef@...icpanda.com, mingo@...hat.com,
        linux-kernel@...r.kernel.org, kernel-team@...com,
        Josef Bacik <jbacik@...com>
Subject: Re: [RFC][PATCH] sched: attach extra runtime to the right avg

On Tue, Jul 04, 2017 at 02:21:50PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 04, 2017 at 12:13:09PM +0200, Ingo Molnar wrote:
> > 
> > This code on the other hand:
> > 
> > 	sa->last_update_time += delta << 10;
> > 
> > ... in essence creates a whole new absolute clock value that slowly but surely is 
> > drifting away from the real rq->clock, because 'delta' is always rounded down to 
> > the nearest 1024 ns boundary, so we accumulate the 'remainder' losses.
> > 
> > That is because:
> > 
> >         delta >>= 10;
> > 	...
> >         sa->last_update_time += delta << 10;
> > 
> > Given enough time, ->last_update_time can drift a long way, and this delta:
> > 
> > 	delta = now - sa->last_update_time;
> > 
> > ... becomes meaningless AFAICS, because it's essentially two different clocks that 
> > get compared.
> 
> Thing is, once you drift over 1023 (ns) your delta increases and you
> catch up again.
> 
> 
> 
>  A  B     C       D          E  F
>  |  |     |       |          |  |
>  +----+----+----+----+----+----+----+----+----+----+----+
> 
> 
> A: now = 0
>    sa->last_update_time = 0
>    delta := (now - sa->last_update_time) >> 10 = 0
> 
> B: now = 614				(+614)
>    delta = (614 - 0) >> 10 = 0
>    sa->last_update_time += 0		(0)
>    sa->last_update_time = now & ~1023	(0)
> 
> C: now = 1843				(+1229)
>    delta = (1843 - 0) >> 10 = 1
>    sa->last_update_time += 1024		(1024)
>    sa->last_update_time = now & ~1023	(1024)
> 
> 
> D: now = 3481				(+1638)
>    delta = (3481 - 1024) >> 10 = 2
>    sa->last_update_time += 2048		(3072)
>    sa->last_update_time = now & ~1023	(3072)
> 
> E: now = 5734				(+2253)
>    delta = (5734 - 3072) = 2
>    sa->last_update_time += 2048		(5120)
>    sa->last_update_time = now & ~1023	(5120)
> 
> F: now = 6348				(+614)
>    delta = (6348 - 5120) >> 10 = 1
>    sa->last_update_time += 1024		(6144)
>    sa->last_update_time = now & ~1023	(6144)
> 
> 
> 
> And you'll see that both are identical, and that both D and F have
> gotten a spill from sub-chunk accounting.


Where the two approaches differ is when we have different modifications
to sa->last_update_time (and we do).

The differential (+=) one does not mandate initial value of
->last_update_time has the bottom 9 bits cleared. It will simply
continue from wherever.

The absolute (&) one however mandates that ->last_update_time always has
the bottom few bits 0, otherwise we can 'gain' time. The first iteration
will clear those bits and we'll then double account them.

It so happens that we have an explicit assign in migrate
(attach_entity_load_avg / set_task_rq_fair). And on negative delta. In
all those cases we use the immediate 'now' value, no clearing of bottom
bits.

The differential should work fine with that, the absolute one has double
accounting issues in that case.

So it would be very good to find what exactly causes Josef's workload to
get 'fixed'.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ