linux-kernel - Re: [RFC][PATCH] sched: attach extra runtime to the right avg


Message-ID: <20170704124754.GA6807@destiny>
Date:   Tue, 4 Jul 2017 08:47:55 -0400
From:   Josef Bacik <josef@...icpanda.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...nel.org>, josef@...icpanda.com,
        mingo@...hat.com, linux-kernel@...r.kernel.org, kernel-team@...com,
        Josef Bacik <jbacik@...com>
Subject: Re: [RFC][PATCH] sched: attach extra runtime to the right avg

On Tue, Jul 04, 2017 at 02:40:03PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 04, 2017 at 02:21:50PM +0200, Peter Zijlstra wrote:
> > On Tue, Jul 04, 2017 at 12:13:09PM +0200, Ingo Molnar wrote:
> > > 
> > > This code on the other hand:
> > > 
> > > 	sa->last_update_time += delta << 10;
> > > 
> > > ... in essence creates a whole new absolute clock value that slowly but surely is 
> > > drifting away from the real rq->clock, because 'delta' is always rounded down to 
> > > the nearest 1024 ns boundary, so we accumulate the 'remainder' losses.
> > > 
> > > That is because:
> > > 
> > >         delta >>= 10;
> > > 	...
> > >         sa->last_update_time += delta << 10;
> > > 
> > > Given enough time, ->last_update_time can drift a long way, and this delta:
> > > 
> > > 	delta = now - sa->last_update_time;
> > > 
> > > ... becomes meaningless AFAICS, because it's essentially two different clocks that 
> > > get compared.
> > 
> > Thing is, once you drift over 1023 (ns) your delta increases and you
> > catch up again.
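
The catch-up behaviour described here can be checked in user space. The following is a minimal sketch, not the kernel code: the function names, the LCG-generated clock steps and the fixed seed are all assumptions made for illustration. It applies the differential update repeatedly and records how far ->last_update_time ever lags behind 'now'.

```c
#include <assert.h>
#include <stdint.h>

/* Differential update as quoted in the thread: only whole 1024 ns periods. */
uint64_t advance_differential(uint64_t last_update_time, uint64_t now)
{
	uint64_t delta;

	/* The sketch assumes a non-negative delta. */
	assert(now >= last_update_time);
	delta = (now - last_update_time) >> 10;

	return last_update_time + (delta << 10);
}

/*
 * Feed 'steps' pseudo-random clock increments (hypothetical test input)
 * and return the largest lag (now - last_update_time) seen after an update.
 */
uint64_t max_lag_after_updates(unsigned int steps)
{
	uint64_t now = 0, last = 0, max_lag = 0;
	uint32_t seed = 12345;	/* arbitrary fixed seed, for repeatability */
	unsigned int i;

	for (i = 0; i < steps; i++) {
		seed = seed * 1664525u + 1013904223u;	/* simple LCG */
		now += (seed >> 16) % 5000;		/* step of 0..4999 ns */
		last = advance_differential(last, now);
		if (now - last > max_lag)
			max_lag = now - last;
	}

	return max_lag;
}
```

After each update the lag is just the remainder (now - last_update_time) & 1023, so it stays below 1024 ns however many updates are applied; the remainder is carried into the next delta rather than lost.
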
> > 
> > 
> > 
> >  A  B     C       D          E  F
> >  |  |     |       |          |  |
> >  +----+----+----+----+----+----+----+----+----+----+----+
> > 
> > 
> > A: now = 0
> >    sa->last_update_time = 0
> >    delta := (now - sa->last_update_time) >> 10 = 0
> > 
> > B: now = 614				(+614)
> >    delta = (614 - 0) >> 10 = 0
> >    sa->last_update_time += 0		(0)
> >    sa->last_update_time = now & ~1023	(0)
> > 
> > C: now = 1843				(+1229)
> >    delta = (1843 - 0) >> 10 = 1
> >    sa->last_update_time += 1024		(1024)
> >    sa->last_update_time = now & ~1023	(1024)
> > 
> > 
> > D: now = 3481				(+1638)
> >    delta = (3481 - 1024) >> 10 = 2
> >    sa->last_update_time += 2048		(3072)
> >    sa->last_update_time = now & ~1023	(3072)
> > 
> > E: now = 5734				(+2253)
> >    delta = (5734 - 3072) >> 10 = 2
> >    sa->last_update_time += 2048		(5120)
> >    sa->last_update_time = now & ~1023	(5120)
> > 
> > F: now = 6348				(+614)
> >    delta = (6348 - 5120) >> 10 = 1
> >    sa->last_update_time += 1024		(6144)
> >    sa->last_update_time = now & ~1023	(6144)
> > 
> > 
> > 
> > And you'll see that both updates (+= and &) give identical
> > ->last_update_time values, and that both D and F have gotten a spill
> > from sub-chunk accounting.
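
The A..F timeline can be replayed mechanically. This is a rough user-space sketch: the helper names and the NPOINTS constant are made up for the example, and only the 'now' values come from the timeline itself. It runs both update rules side by side from an aligned start of 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPOINTS 6

/* The 'now' values at points A..F of the timeline. */
const uint64_t timeline_now[NPOINTS] = { 0, 614, 1843, 3481, 5734, 6348 };

/* Differential variant: last_update_time += delta << 10 */
uint64_t update_differential(uint64_t last, uint64_t now)
{
	assert(now >= last);	/* sketch assumes non-negative delta */
	return last + (((now - last) >> 10) << 10);
}

/* Absolute variant: last_update_time = now & ~1023 */
uint64_t update_absolute(uint64_t now)
{
	return now & ~(uint64_t)1023;
}

/* Record ->last_update_time after each point for both variants. */
void run_timeline(uint64_t diff[NPOINTS], uint64_t abs_[NPOINTS])
{
	uint64_t last_diff = 0;
	size_t i;

	for (i = 0; i < NPOINTS; i++) {
		last_diff = update_differential(last_diff, timeline_now[i]);
		diff[i] = last_diff;
		abs_[i] = update_absolute(timeline_now[i]);
	}
}
```

With these inputs both arrays come out as 0, 0, 1024, 3072, 5120, 6144, matching the values in parentheses in the timeline.
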
> 
> 
> Where the two approaches differ is when we have different modifications
> to sa->last_update_time (and we do).
> 
> The differential (+=) one does not require the initial value of
> ->last_update_time to have its bottom 10 bits cleared. It will simply
> continue from wherever.
> 
> The absolute (&) one however mandates that ->last_update_time always has
> the bottom few bits 0, otherwise we can 'gain' time. The first iteration
> will clear those bits and we'll then double account them.
> 
> It so happens that we have an explicit assign in migrate
> (attach_entity_load_avg / set_task_rq_fair), and another on negative
> delta. In all those cases we use the immediate 'now' value, with no
> clearing of the bottom bits.
> 
> The differential should work fine with that, the absolute one has double
> accounting issues in that case.
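
The double accounting can be made concrete with a small user-space sketch. Everything in it is hypothetical illustration rather than kernel code: the struct avg_clock type, the accounted_ns counter, the early return on a zero delta, and the sample values (an unaligned start of 1000, as if ->last_update_time had just been assigned a raw 'now').

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of the avg state. */
struct avg_clock {
	uint64_t last_update_time;	/* ns */
	uint64_t accounted_ns;		/* total time credited so far */
};

/* Differential variant: continue from wherever last_update_time was. */
void account_differential(struct avg_clock *c, uint64_t now)
{
	uint64_t delta;

	assert(now >= c->last_update_time);
	delta = (now - c->last_update_time) >> 10;
	if (!delta)
		return;		/* assumed: nothing to do below one period */

	c->accounted_ns += delta << 10;
	c->last_update_time += delta << 10;
}

/* Absolute variant: snap last_update_time to a 1024 ns boundary. */
void account_absolute(struct avg_clock *c, uint64_t now)
{
	uint64_t delta;

	assert(now >= c->last_update_time);
	delta = (now - c->last_update_time) >> 10;
	if (!delta)
		return;

	c->accounted_ns += delta << 10;
	c->last_update_time = now & ~(uint64_t)1023;
}
```

Starting both from last_update_time = 1000 and updating at now = 2024 and then now = 3072, the elapsed time is 2072 ns. The differential variant credits 2048 ns. The absolute variant snaps back to 1024 on the first update and so credits 3072 ns in total, more than actually elapsed.
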
> 
> So it would be very good to find what exactly causes Josef's workload to
> get 'fixed'.

Sorry, let me experiment some more. Like I said, this is one of 4 patches I
need to actually fix my workload, and I've tested so many iterations of this
problem that I may just _think_ it affects things when it really doesn't.  I'll
re-test with the normal code and the other 3 patches in place and see if things
are ok.
Thanks,

Josef