Message-ID: <CAPM31RLjHnn=Ma7peuDUqK75bANw7Mb-UT-1QmzG0c8LAObQ8g@mail.gmail.com>
Date:	Tue, 21 Jan 2014 12:31:18 -0800
From:	Paul Turner <pjt@...gle.com>
To:	Vincent Guittot <vincent.guittot@...aro.org>
Cc:	Ben Segall <bsegall@...gle.com>, Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Linaro Kernel <linaro-kernel@...ts.linaro.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched: fix sched_entity avg statistics update

On Tue, Jan 21, 2014 at 12:00 PM, Vincent Guittot
<vincent.guittot@...aro.org> wrote:
>
> On 21 Jan 2014 19:39, <bsegall@...gle.com> wrote:
>
>
>>
>> Vincent Guittot <vincent.guittot@...aro.org> writes:
>>
>> > With the current implementation, the load average statistics of a
>> > sched entity change according to other activity on the CPU, even if
>> > that activity occurs between the running windows of the sched entity
>> > and has no influence on the task's running duration.
>> >
>> > When a task wakes up on the same CPU, we currently update
>> > last_runnable_update with the return of __synchronize_entity_decay()
>> > without updating runnable_avg_sum and runnable_avg_period
>> > accordingly.  In fact, we have to sync the se's load_contrib with
>> > the cfs_rq's blocked_load_avg before removing it from the latter
>> > (with __synchronize_entity_decay()), but we must keep
>> > last_runnable_update unchanged so that runnable_avg_sum/period are
>> > updated during the next update_entity_load_avg().
>>
>> ... Gah, that's correct, we had this right the first time.  Could
>> you do this as a full revert of
>> 282cf499f03ec1754b6c8c945c9674b02631fb0f (i.e. remove the
>> now-inaccurate comment, or maybe replace it with a correct one)?
>
> OK, I'm going to remove the comment as well and replace it with a new
> description.
>

I think I need to go through and do a comments patch like we did with
the wake-affine math; it's too easy to make a finicky mistake like
this after not touching this path for a while.

OK, so there are two numerical components we're juggling here:

1) The actual quotient for the current runnable average, stored as
(runnable_avg_sum / runnable_avg_period).  Last updated at
last_runnable_update.
2) The value that quotient had the last time we accumulated it into
cfs_rq->{runnable, blocked}_load_avg; this is stored in
load_avg_contrib.  We track the passage of off-rq time and migrations
against this value using decay_count.
[ All of the values above are stored on se / se->avg ]
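
For reference, paraphrasing the struct from this era's
include/linux/sched.h (the comments are mine, mapping the fields onto
(1) and (2) above):

	struct sched_avg {
		/* (1): the quotient runnable_avg_sum / runnable_avg_period,
		 * current as of last_runnable_update */
		u32 runnable_avg_sum, runnable_avg_period;
		u64 last_runnable_update;
		/* (2): decay_count counts the ~1ms decay periods spent
		 * off-rq that are still owed against load_avg_contrib, the
		 * value last folded into cfs_rq->{runnable, blocked}_load_avg */
		s64 decay_count;
		unsigned long load_avg_contrib;
	};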

When we are re-enqueuing something and we wish to remove its
contribution from blocked_load_avg, we must update load_avg_contrib in
(2) using the total time it spent off rq (via the jiffy-rounded
approximation tracked in decay_count).  However, Alex's patch (which
this reverts) also adjusted the quotient by modifying its last update
time so as to make it look up-to-date, effectively skipping the most
recent idle span.
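
To make the reverted adjustment concrete (my annotation of the line
being removed; one decay period is 1 << 20 ns, ~1ms):

	decays = __synchronize_entity_decay(se);	/* periods off rq */
	se->avg.last_runnable_update += decays << 20;	/* skip that span */

Advancing last_runnable_update like this means the next
update_entity_load_avg() sees almost no elapsed time, so
runnable_avg_sum/period never account for the sleep, even though
load_avg_contrib was decayed for it.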

I think we could make the connection between (1) and (2) more explicit
if we moved the subsequent "if (wakeup)" logic inside the else.  We
can then have a comment that refers to (1) and (2) explicitly, perhaps
something like:

Task re-woke on same cpu (or else migrate_task_rq_fair() would have
made count negative).  Perform an approximate decay on
load_avg_contrib to match blocked_load_avg, and compute a precise
runnable_avg_sum quotient update that will be accumulated into
runnable_load_avg below.
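
Roughly, in enqueue_entity_load_avg() (an untested sketch of the
restructuring, migration path elided; helper names as in the current
fair.c):

	if (unlikely(se->avg.decay_count <= 0)) {
		/* ... wake-up migration / fork path, unchanged ... */
		wakeup = 0;
	} else {
		/*
		 * Task re-woke on same cpu (or else migrate_task_rq_fair()
		 * would have made count negative).  Perform an approximate
		 * decay on load_avg_contrib to match blocked_load_avg, and
		 * compute a precise runnable_avg_sum quotient update that
		 * will be accumulated into runnable_load_avg below.
		 */
		__synchronize_entity_decay(se);
		if (wakeup) {
			/* our (decayed) contribution leaves blocked load */
			subtract_blocked_load_contrib(cfs_rq,
						se->avg.load_avg_contrib);
			update_entity_load_avg(se, 0);
		}
	}

	cfs_rq->runnable_load_avg += se->avg.load_avg_contrib;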


>
>> >
>> > Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
>> > ---
>> >  kernel/sched/fair.c |    3 +--
>> >  1 file changed, 1 insertion(+), 2 deletions(-)
>> >
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index e64b079..5b0ef90 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -2370,8 +2370,7 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
>> >  		 * would have made count negative); we must be careful to avoid
>> >  		 * double-accounting blocked time after synchronizing decays.
>> >  		 */
>> > -		se->avg.last_runnable_update += __synchronize_entity_decay(se)
>> > -							<< 20;
>> > +		__synchronize_entity_decay(se);
>> >  	}
>> >
>> >  	/* migrated tasks did not contribute to our blocked load */
