Message-ID: <2c389633-9b9c-3db9-6c07-a124596e3799@arm.com>
Date: Wed, 27 Apr 2022 11:22:27 +0100
From: Vincent Donnefort <vincent.donnefort@....com>
To: Tao Zhou <tao.zhou@...ux.dev>
Cc: peterz@...radead.org, mingo@...hat.com, vincent.guittot@...aro.org,
linux-kernel@...r.kernel.org, dietmar.eggemann@....com,
morten.rasmussen@....com, chris.redpath@....com, qperret@...gle.com
Subject: Re: [PATCH v6 2/7] sched/fair: Decay task PELT values during wakeup
migration
On 27/04/2022 10:25, Tao Zhou wrote:
> On Tue, Apr 26, 2022 at 10:35:01AM +0100, Vincent Donnefort wrote:
>
>> Before being migrated to a new CPU, a task sees its PELT values
>> synchronized with rq last_update_time. Once done, that same task will also
>> have its sched_avg last_update_time reset. This means the time between
>> the migration and the last clock update (B) will not be accounted for in
>> util_avg and a discontinuity will appear. This issue is amplified by the
>> PELT clock scaling. If the clock hasn't been updated while the CPU is
>> idle, clock_pelt will not be aligned with clock_task and that time (A)
>> will also be lost.
>>
>> ---------|----- A -----|-----------|------- B -----|>
>>    clock_pelt     clock_task     clock           now
>>
>> This is especially problematic for asymmetric CPU capacity systems which
>> need stable util_avg signals for task placement and energy estimation.
>>
>> Ideally, this problem would be solved by updating the runqueue clocks
>> before the migration. But that would require taking the runqueue lock,
>> which is quite expensive [1]. Instead, estimate the missing time and update
>> the task util_avg with that value:
>>
>> A + B = clock_task - clock_pelt + sched_clock_cpu() - clock
>>
>> sched_clock_cpu() is a costly function. Limit its usage to the case where
>> the source CPU is idle, as we know this is when the clock has the
>> biggest risk of being outdated.
>>
>> Neither clock_task, clock_pelt nor clock can be accessed without the
>> runqueue lock. We then need to store those values in timestamps that
>> can be accessed during the migration. rq's enter_idle gives the
>> wall-clock time when the rq went idle. We then have:
>>
>> B = sched_clock_cpu() - rq->enter_idle.
>>
>> Then, to catch up with the PELT clock scaling (A), there are two cases:
>>
>> * !CFS_BANDWIDTH: We can simply use clock_task(). This value is stored
>> in rq's clock_pelt_idle, before the rq enters idle. The estimated time
>> is then:
>>
>> rq->clock_pelt_idle + sched_clock_cpu() - rq->enter_idle.
>>
>> * CFS_BANDWIDTH: We can't catch up with clock_task because of the
>> throttled_clock_task_time offset. cfs_rq's clock_pelt_idle then
>> gives the PELT clock when the cfs_rq becomes idle. This gives:
>>
>> A = rq->clock_pelt_idle - cfs_rq->clock_pelt_idle
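
For illustration only, here is a rough sketch of how the pieces above could
be assembled on the migration path. It follows the description in this
commit message, nothing more: enter_idle and clock_pelt_idle are the fields
introduced above, while the idle check and the rebase onto
cfs_rq_last_update_time() under CFS_BANDWIDTH are assumptions, so this is
not the literal patch code.

/*
 * Sketch only: estimate the PELT time missed by @se while its source CPU
 * was idle, without taking the runqueue lock.
 */
static void estimate_pelt_lag(struct sched_entity *se)
{
        struct cfs_rq *cfs_rq = cfs_rq_of(se);
        struct rq *rq = rq_of(cfs_rq);
        u64 now;

        /*
         * sched_clock_cpu() is costly: only bother when the source CPU is
         * idle, i.e. when the clock has the biggest risk of being outdated.
         */
        if (!idle_cpu(cpu_of(rq)))
                return;

        /* A: catch up with the PELT clock scaling lost while idle */
        now = rq->clock_pelt_idle;
#ifdef CONFIG_CFS_BANDWIDTH
        /*
         * clock_task can't be used directly because of the
         * throttled_clock_task_time offset:
         *   A = rq->clock_pelt_idle - cfs_rq->clock_pelt_idle
         * Assumption: rebase A onto the cfs_rq's last_update_time.
         */
        now -= cfs_rq->clock_pelt_idle;
        now += cfs_rq_last_update_time(cfs_rq);
#endif
        /* B: wall-clock time elapsed since the rq entered idle */
        now += sched_clock_cpu(cpu_of(rq)) - rq->enter_idle;

        /* Decay the task's PELT signals up to the estimated "now" */
        __update_load_avg_blocked_se(now, se);
}

The real code would also need to read those timestamps safely without the
runqueue lock; the sketch only shows the A + B composition.
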
>
> The code calculating A below is not consistent with this; the order is reversed.
>
Good catch, but the comment is actually correct; it's the code that is not:
rq->clock_pelt_idle is updated _after_ cfs_rq->clock_pelt_idle (see the
previous email to Vincent).
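
To make the ordering concrete, a minimal sketch of the idle-path stamps as
described in the commit message (hypothetical helper names, not the patch
code): the cfs_rq stamp is written when the cfs_rq becomes idle, and the rq
stamps are written later, when the whole rq enters idle, which is why A must
be rq->clock_pelt_idle - cfs_rq->clock_pelt_idle and not the reverse.

/* Sketch only: stamp taken when the cfs_rq becomes idle (CFS_BANDWIDTH). */
static void cfs_rq_idle_stamp(struct cfs_rq *cfs_rq)
{
        cfs_rq->clock_pelt_idle = cfs_rq_clock_pelt(cfs_rq);
}

/* Sketch only: stamps taken later, when the whole rq enters idle. */
static void rq_idle_stamp(struct rq *rq)
{
        rq->clock_pelt_idle = rq_clock_task(rq); /* clock_task, see above */
        rq->enter_idle = rq_clock(rq);           /* wall clock used for B */
}

With that ordering, rq->clock_pelt_idle is always the later stamp, so A is
never negative.
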
Thanks,
[...]