linux-kernel - Re: [PATCH 4/4] sched/fair: Implement flat hierarchical structure for util

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKfTPtDVRy=RCJaLBa2_KVx2tk3SsuUPDhDLcovf+kpT-artbQ@mail.gmail.com>
Date:	Wed, 13 Apr 2016 13:27:21 +0200
From:	Vincent Guittot <vincent.guittot@...aro.org>
To:	Yuyang Du <yuyang.du@...el.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Benjamin Segall <bsegall@...gle.com>,
	Paul Turner <pjt@...gle.com>,
	Morten Rasmussen <morten.rasmussen@....com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Juri Lelli <juri.lelli@....com>
Subject: Re: [PATCH 4/4] sched/fair: Implement flat hierarchical structure for util_avg

On 11 April 2016 at 22:37, Yuyang Du <yuyang.du@...el.com> wrote:
> Hi Vincent,
>
> On Mon, Apr 11, 2016 at 02:29:25PM +0200, Vincent Guittot wrote:
>>
>> > update any group entity's util, so the net overhead should not be
>> > formidable. In addition, rq's util updates can be very frequent,
>> > but the updates in a period will be skipped, this is mostly effective
>> > in update_blocked_averages().
>>
>> Not sure to catch your point here but the rq->util of an idle CPU will
>> never be updated before a (idle) load balance as we call
>> update_cfs_rq_load_avg which doesn't update the cfs_rq->avg.util_avg
>> anymore nor rq->avg.util_avg.
>
> I meant in update_blocked_averages(), the rq's util won't be updated
> every time a cfs is updated if they are in the same period (~1ms), probabaly
> this is the case.
>
> The idle CPU thing is no difference, so it is anyway the responsibility of
> any other active CPU to take the idle CPU's idle time into account for any
> purpose.
>
>>
>> > +       if (update_util)
>> > +               sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
>> > +
>> > +       if (!cfs_rq)
>> > +               return 1;
>> > +
>> > +       /*
>> > +        * Update rq's util_sum and util_avg
>> > +        */
>>
>> Do we really have to update the utilization of rq each time we update
>> a sched_entity ?  IMHO, we don't need to do this update so often even
>> more if the se is a task_group. But even if it's a task, we don't
>> especially need to update it right now but we can wait for the update
>> of the rq->cfs like previously or we explicilty update it when we have
>> to do a direct sub/add on the util_avg value
>> See also my comment on removed_util_avg below
>>
>
> No, we only update a rq's util if the update is for cfs, not the sched_entity,
> which is the if (!cfs_rq) condition's job

ah yes i have skiped the ! when reading

>
>> Why not using the sched_avg of the rq->cfs in order to track the
>> utilization of the root cfs_rq instead of adding a new sched_avg into
>> the rq ? Then you call update_cfs_rq_load_avg(rq->cfs) when you want
>> to update/sync the utilization of the rq->cfs and for one call you
>> will update both the load_avg and the util_avg of the root cfs instead
>> of duplicating the sequence in _update_load_avg
>
> This is the approach taken by Dietmar in his patch, a fairly easy approach.
> The problem is though, if so, we update the root cfs_rq only when it is
> the root cfs_rq to update. A simple contrived case will make it never
> updated except in update_blocked_averages(). My impression is that this
> might be too much precision lost.
>
> And thus we take this alternative approach, and thus I revisited
> __update_load_avg() to optimize it.
>
> [snip]
>
>> > -       if (atomic_long_read(&cfs_rq->removed_util_avg)) {
>> > -               long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
>> > -               sa->util_avg = max_t(long, sa->util_avg - r, 0);
>> > -               sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
>> > +       if (atomic_long_read(&rq->removed_util_avg)) {
>> > +               long r = atomic_long_xchg(&rq->removed_util_avg, 0);
>> > +               rq->avg.util_avg = max_t(long, rq->avg.util_avg - r, 0);
>> > +               rq->avg.util_sum = max_t(s32, rq->avg.util_sum - r * LOAD_AVG_MAX, 0);
>>
>> I see one potential issue here because the rq->util_avg may (surely)
>> have been already updated and decayed during the update of a
>> sched_entity but before we substract the removed_util_avg
>
> This is the same now, because cfs_rq will be regularly updated in
> update_blocked_averages(), which basically means cfs_rq will be newer
> than task for sure, although task tries to catch up when removed.

I don't agree on that part. At now, we check and substract
removed_util_avg before calling __update_load_avg for a cfs_rq, so it
will be removed before changing last_update_time.
With your patch,  we update rq->avg.util_avg and last_update_time
without checking removed_util_avg.

> Well, yes, in this patch with rq, the situation is more so. But,
> hopefully, this is a formidable concern.