linux-kernel - Re: [PATCH] sched/fair: Do not decay new task load on first enqueue

Message-ID: <20160928111912.GU5016@twins.programming.kicks-ass.net>
Date:   Wed, 28 Sep 2016 13:19:12 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Matt Fleming <matt@...eblueprint.co.uk>,
        Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org,
        Mike Galbraith <umgwanakikbuti@...il.com>,
        Yuyang Du <yuyang.du@...el.com>,
        Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [PATCH] sched/fair: Do not decay new task load on first enqueue

On Wed, Sep 28, 2016 at 12:06:43PM +0100, Dietmar Eggemann wrote:
> On 28/09/16 11:14, Peter Zijlstra wrote:
> > On Fri, Sep 23, 2016 at 12:58:08PM +0100, Matt Fleming wrote:
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 8fb4d1942c14..4a2d3ff772f8 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -3142,7 +3142,7 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >>  	int migrated, decayed;
> >>  
> >>  	migrated = !sa->last_update_time;
> >> -	if (!migrated) {
> >> +	if (!migrated && se->sum_exec_runtime) {
> >>  		__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
> >>  			se->on_rq * scale_load_down(se->load.weight),
> >>  			cfs_rq->curr == se, NULL);
> > 
> > 
> > Hrmm,.. so I see the problem, but I think we're working around it.
> > 
> > So the problem is that time moves between wake_up_new_task() doing
> > post_init_entity_util_avg(), which attaches us to the cfs_rq, and
> > activate_task() which enqueues us.
> > 
> > Part of the problem is that we do not in fact seem to do
> > update_rq_clock() before post_init_entity_util_avg(), which makes the
> > delta larger than it should be.
> 
> Yes, this is what I see as well. I always thought that the update was
> done in task_fork_fair(), so it's bounded, but as I now know, this update
> is only for the waker. If the CPU was idle before, the delta can be
> pretty big.
> 
> > The other problem is that activate_task()->enqueue_task() does do
> > update_rq_clock() (again, after fixing), creating the delta.
> 
> Not sure what you mean by 'after fixing', but the se is initialized with
> a possibly stale 'now' value in post_init_entity_util_avg()->
> attach_entity_load_avg() before the clock is updated in
> activate_task()->enqueue_task().

I meant after I fix the above issue of calling post_init with a stale
clock, i.e. the '+ update_rq_clock(rq)' line in the patch.

> > Which suggests we do something like the below (not compile tested or
> > anything, also I ran out of tea again).
> 
> I'll give it a try. Plenty of coffee here ...
> 
> > 
> > While staring at this, I don't think we can still hit
> > vruntime_normalized() with a new task, so I _think_ we can remove that
> > !se->sum_exec_runtime clause there (and rejoice), no?
> 
> I'm afraid that with accurate timing we will get the same situation, in
> which we add and subtract the same amount of load (probably 1024 now,
> rather than 1002 or less) to/from cfs_rq->runnable_load_avg for the
> initial (fork) hackbench run.
> After all, it's 'runnable' based.

The idea was that since we now update the rq clock before post_init and
then leave it be, both post_init and enqueue see the exact same timestamp,
and the delta is 0, resulting in no aging.

Or did I fail to make that happen?