linux-kernel - Re: [PATCH 2/4] sched/fair: Decay task PELT values during migration

Message-ID: <2551684c-c987-b143-ba69-4fb0c55f61c7@arm.com>
Date:   Tue, 21 Dec 2021 13:46:01 +0100
From:   Dietmar Eggemann <dietmar.eggemann@....com>
To:     Vincent Donnefort <vincent.donnefort@....com>
Cc:     peterz@...radead.org, mingo@...hat.com, vincent.guittot@...aro.org,
        linux-kernel@...r.kernel.org, valentin.schneider@....com,
        morten.rasmussen@....com, chris.redpath@....com,
        qperret@...gle.com, lukasz.luba@....com
Subject: Re: [PATCH 2/4] sched/fair: Decay task PELT values during migration

On 20.12.21 17:09, Vincent Donnefort wrote:
> On Mon, Dec 20, 2021 at 12:26:23PM +0100, Dietmar Eggemann wrote:
>> On 09.12.21 17:11, Vincent Donnefort wrote:

[...]

>> Why do you use `avg.last_update_time` (lut) of the root cfs_rq here?
>>
>> p's lut was just synced to cfs_rq_of(se)'s lut in
>>
>> migrate_task_rq_fair() (1) -> remove_entity_load_avg() ->
>> sync_entity_load_avg(se) (2)
> 
> Huum, indeed, the estimation is an offset on top of the se's last_update_time,
> which I suppose could be different from the one of the rq's root cfs_rq.
> 
> I'll add a sched_entity argument to this function, to use either
> cfs_rq_of(se)'s or the se's last_update_time.

OK, or a `u64 now` (or `u64 lut`) argument.
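Here is a minimal user-space sketch of the `u64 lut` variant, to make the choice of base concrete. Everything in it is hypothetical: `toy_pelt_estimate()` and `toy_decay_window()` are illustrative names, not the patch's rq_clock_pelt_estimator(). The estimated lag is simply taken as an input rather than computed.

```c
#include <stdint.h>

/*
 * Hypothetical estimator with an explicit base (the lut the se was just
 * synced to): returns the estimated PELT time the migrating se should be
 * decayed to. est_lag is an assumed input, i.e. the estimated time the
 * PELT clock has not been advanced for.
 */
static uint64_t toy_pelt_estimate(uint64_t lut, uint64_t est_lag)
{
	return lut + est_lag;
}

/* Blocked time the se would be decayed by, measured from its own lut. */
static uint64_t toy_decay_window(uint64_t se_lut, uint64_t target)
{
	return target > se_lut ? target - se_lut : 0;
}
```

Take se_lut = 8 ms (just synced to cfs_rq_of(se)), a root cfs_rq lut of 10 ms and est_lag = 2 ms. Basing the estimate on the root cfs_rq decays the se over 4 ms. Basing it on the se's own lut decays it over exactly the 2 ms of estimated lag, which matches the "offset on top of the se's last_update_time" semantics above.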

[...]

>>>  	} else {
>>> +		remove_entity_load_avg(se);
>>> +
>>>  		/*
>>> -		 * We are supposed to update the task to "current" time, then
>>> -		 * its up to date and ready to go to new CPU/cfs_rq. But we
>>> -		 * have difficulty in getting what current time is, so simply
>>> -		 * throw away the out-of-date time. This will result in the
>>> -		 * wakee task is less decayed, but giving the wakee more load
>>> -		 * sounds not bad.
>>> +		 * Here, the task's PELT values have been updated according to
>>> +		 * the current rq's clock. But if that clock hasn't been
>>> +		 * updated in a while, a substantial idle time will be missed,
>>> +		 * leading to an inflation after wake-up on the new rq.
>>> +		 *
>>> +		 * Estimate the PELT clock lag, and update sched_avg to ensure
>>> +		 * PELT continuity after migration.
>>>  		 */
>>> -		remove_entity_load_avg(&p->se);
>>> +		__update_load_avg_blocked_se(rq_clock_pelt_estimator(rq), se);
>>
>> We now call __update_load_avg_blocked_se() twice for p: first in (2) and
>> then again in (1).
> 
> The first __update_load_avg_blocked_se() aligns the se with the cfs_rq's
> clock; the "removed" struct is then updated accordingly. We couldn't use
> the estimator there; it would break that structure.

You're right. I missed this bit.
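Here is a toy user-space model of the two steps, as I read them. The names are hypothetical and this is not kernel code. PELT is reduced to whole periods of 1024 * 1024 ns with y^32 = 1/2, and the "removed" struct is reduced to a single util value. Step 1 syncs the se to the source cfs_rq's lut, so the value recorded in `removed_util` is expressed at that cfs_rq's time and can be subtracted from its sum. Only step 2 applies the extra, estimated decay.

```c
#include <stdint.h>

#define TOY_PELT_PERIOD_NS (1024ULL * 1024ULL)	/* ~1 ms PELT period */
#define TOY_PELT_Y 0.978572062087700		/* y^32 ~= 0.5 */

struct toy_sched_avg {
	uint64_t last_update_time;	/* ns */
	double util_avg;
};

struct toy_cfs_rq {
	struct toy_sched_avg avg;
	double removed_util;	/* stand-in for the "removed" struct */
};

/* Decay val by y^periods. */
static double toy_decay(double val, uint64_t periods)
{
	if (periods > 32 * 63)
		return 0.0;	/* treat very old contributions as fully decayed */
	for (; periods >= 32; periods -= 32)
		val /= 2.0;	/* y^32 == 1/2 */
	for (; periods > 0; periods--)
		val *= TOY_PELT_Y;
	return val;
}

/* Blocked (sleeping) entity: decay only, no new contribution. */
static void toy_update_blocked(struct toy_sched_avg *sa, uint64_t now)
{
	uint64_t periods;

	if (now <= sa->last_update_time)
		return;
	periods = (now - sa->last_update_time) / TOY_PELT_PERIOD_NS;
	sa->util_avg = toy_decay(sa->util_avg, periods);
	/* advance by whole periods only; the remainder carries over */
	sa->last_update_time += periods * TOY_PELT_PERIOD_NS;
}

static void toy_migrate(struct toy_cfs_rq *src, struct toy_sched_avg *se,
			uint64_t estimate)
{
	/* Step 1: align with src's clock, record what src must subtract. */
	toy_update_blocked(se, src->avg.last_update_time);
	src->removed_util += se->util_avg;

	/* Step 2: decay se further to the estimate; removed_util untouched. */
	toy_update_blocked(se, estimate);
}
```

Suppose util_avg = 512, the se's lut equals the source cfs_rq's lut, and the estimate lies 32 periods later. The source then records 512 in removed_util while the se arrives with 256 (y^32 = 1/2). Without step 2 it would arrive with 512, which is the inflation described in the new comment.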

Related to this: it looks like on CAS/EAS we don't rely on
remove_entity_load_avg()->sync_entity_load_avg(se), since it is already
called during select_task_rq().
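In a toy model, a second sync to an unchanged lut does no work. Here `toy_sync()` is a hypothetical stand-in for sync_entity_load_avg(), not its implementation. So, in this model, the sync in remove_entity_load_avg() would only change anything on CAS/EAS if the cfs_rq's lut moved after select_task_rq().

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PELT_PERIOD_NS (1024ULL * 1024ULL)	/* ~1 ms PELT period */
#define TOY_PELT_Y 0.978572062087700		/* y^32 ~= 0.5 */

struct toy_sched_avg {
	uint64_t last_update_time;	/* ns */
	double util_avg;
};

/*
 * Hypothetical sync of a blocked se to its cfs_rq's lut.
 * Returns true only if at least one full period was decayed.
 */
static bool toy_sync(struct toy_sched_avg *sa, uint64_t cfs_rq_lut)
{
	uint64_t periods;

	if (cfs_rq_lut <= sa->last_update_time)
		return false;	/* already synced: nothing to do */
	periods = (cfs_rq_lut - sa->last_update_time) / TOY_PELT_PERIOD_NS;
	if (!periods)
		return false;
	sa->last_update_time += periods * TOY_PELT_PERIOD_NS;
	while (periods--)	/* linear loop, fine for a toy */
		sa->util_avg *= TOY_PELT_Y;
	return true;
}
```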