linux-kernel - Re: [PATCH v6 2/7] sched/fair: Decay task PELT values during wakeup migration


Message-ID: <2c389633-9b9c-3db9-6c07-a124596e3799@arm.com>
Date:   Wed, 27 Apr 2022 11:22:27 +0100
From:   Vincent Donnefort <vincent.donnefort@....com>
To:     Tao Zhou <tao.zhou@...ux.dev>
Cc:     peterz@...radead.org, mingo@...hat.com, vincent.guittot@...aro.org,
        linux-kernel@...r.kernel.org, dietmar.eggemann@....com,
        morten.rasmussen@....com, chris.redpath@....com, qperret@...gle.com
Subject: Re: [PATCH v6 2/7] sched/fair: Decay task PELT values during wakeup
 migration



On 27/04/2022 10:25, Tao Zhou wrote:
> On Tue, Apr 26, 2022 at 10:35:01AM +0100, Vincent Donnefort wrote:
> 
>> Before being migrated to a new CPU, a task sees its PELT values
>> synchronized with rq last_update_time. Once done, that same task will also
>> have its sched_avg last_update_time reset. This means the time between
>> the migration and the last clock update (B) will not be accounted for in
>> util_avg and a discontinuity will appear. This issue is amplified by the
>> PELT clock scaling. If the clock hasn't been updated while the CPU is
>> idle, clock_pelt will not be aligned with clock_task and that time (A)
>> will also be lost.
>>
>>     ---------|----- A -----|-----------|------- B -----|>
>>          clock_pelt   clock_task     clock            now
>>
>> This is especially problematic for asymmetric CPU capacity systems which
>> need stable util_avg signals for task placement and energy estimation.
>>
>> Ideally, this problem would be solved by updating the runqueue clocks
>> before the migration. But that would require taking the runqueue lock
>> which is quite expensive [1]. Instead estimate the missing time and update
>> the task util_avg with that value:
>>
>>    A + B = clock_task - clock_pelt + sched_clock_cpu() - clock
>>
>> sched_clock_cpu() is a costly function. Limit the usage to the case where
>> the source CPU is idle as we know this is when the clock is most likely
>> to be outdated.
>>
>> Neither clock_task, clock_pelt nor clock can be accessed without the
>> runqueue lock. We then need to store those values in a timestamp variable
>> which can be accessed during the migration. rq's enter_idle will give the
>> wall-clock time when the rq went idle. We have then:
>>
>>    B = sched_clock_cpu() - rq->enter_idle.
>>
>> Then, to catch up with the PELT clock scaling (A), there are two cases:
>>
>>    * !CFS_BANDWIDTH: We can simply use clock_task(). This value is stored
>>      in rq's clock_pelt_idle, before the rq enters idle. The estimated time
>>      is then:
>>
>>        rq->clock_pelt_idle + sched_clock_cpu() - rq->enter_idle.
>>
>>    * CFS_BANDWIDTH: We can't catch up with clock_task because of the
>>      throttled_clock_task_time offset. cfs_rq's clock_pelt_idle is then
>>      giving the PELT clock when the cfs_rq becomes idle. This gives:
>>
>>        A = rq->clock_pelt_idle - cfs_rq->clock_pelt_idle
> 
> The code calculating A below is not consistent with this. The order is reversed.
> 

Good catch, but this comment is actually correct; the code is not.
rq->clock_pelt_idle is updated _after_ cfs_rq->clock_pelt_idle (see
previous email to Vincent).
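
For illustration only, the arithmetic from the commit message can be
sketched in plain C. This is a userspace toy, not the patch code: the
toy_* structs and helpers are hypothetical stand-ins for the rq and
cfs_rq fields named above, and "now" stands in for the value that
sched_clock_cpu() would return.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the timestamps named in the commit message. */
struct toy_rq {
	uint64_t clock_pelt_idle;	/* stored before the rq enters idle */
	uint64_t enter_idle;		/* wall-clock time when the rq went idle */
};

struct toy_cfs_rq {
	uint64_t clock_pelt_idle;	/* PELT clock when the cfs_rq became idle */
};

/* B = sched_clock_cpu() - rq->enter_idle */
static uint64_t toy_elapsed_b(const struct toy_rq *rq, uint64_t now)
{
	assert(now >= rq->enter_idle);
	return now - rq->enter_idle;
}

/*
 * CFS_BANDWIDTH case: A = rq->clock_pelt_idle - cfs_rq->clock_pelt_idle.
 * The rq value is updated after the cfs_rq one, so it is the larger of
 * the two. With unsigned arithmetic, the reversed order would wrap
 * around instead of yielding a small delta.
 */
static uint64_t toy_catch_up_a(const struct toy_rq *rq,
			       const struct toy_cfs_rq *cfs_rq)
{
	assert(rq->clock_pelt_idle >= cfs_rq->clock_pelt_idle);
	return rq->clock_pelt_idle - cfs_rq->clock_pelt_idle;
}

/* !CFS_BANDWIDTH case: rq->clock_pelt_idle + sched_clock_cpu() - rq->enter_idle */
static uint64_t toy_estimate_no_bandwidth(const struct toy_rq *rq, uint64_t now)
{
	return rq->clock_pelt_idle + toy_elapsed_b(rq, now);
}
```

With rq->clock_pelt_idle = 1000, cfs_rq->clock_pelt_idle = 900,
rq->enter_idle = 5000 and now = 5600, this gives A = 100 and B = 600.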

Thanks,

[...]