linux-kernel - Re: [External] Re: [PATCH v2 07/10] sched/fair: use update_load_avg() to attach/detach entity load

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <107763fc-7eab-a807-fa15-024178bdf57e@bytedance.com>
Date:   Wed, 20 Jul 2022 21:40:55 +0800
From:   Chengming Zhou <zhouchengming@...edance.com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Dietmar Eggemann <dietmar.eggemann@....com>, mingo@...hat.com,
        peterz@...radead.org, rostedt@...dmis.org, bsegall@...gle.com,
        vschneid@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH v2 07/10] sched/fair: use update_load_avg()
 to attach/detach entity load_avg

On 2022/7/19 18:29, Vincent Guittot wrote:
> On Fri, 15 Jul 2022 at 18:21, Chengming Zhou
> <zhouchengming@...edance.com> wrote:
>>
>> On 2022/7/15 19:18, Dietmar Eggemann wrote:
>>> On 13/07/2022 06:04, Chengming Zhou wrote:
>>>> Since update_load_avg() support DO_ATTACH and DO_DETACH now, we can
>>>> use update_load_avg() to implement attach/detach entity load_avg.
>>>>
>>>> Another advantage of using update_load_avg() is that it will check
>>>> last_update_time before attach or detach, instead of unconditional
>>>> attach/detach in the current code.
>>>>
>>>> This way can avoid some corner problematic cases of load tracking,
>>>> like twice attach problem, detach unattached NEW task problem.
>>>
>>> This explanation is somewhat hard to follow for me. Since both issues
>>> have been fixed already (you mention this further below) you're saying
>>> that with you change you don't reintroduce them?
>>
>> Sorry for this not very clear explanation.
>>
>> Yes, both issues have been fixed already, what I want to say is that bugfix
>> brings its own problem and limitation mentioned below.
> 
> As Dietmar said, the commit message is misleading because someone can
> think you fix these bugs whereas it's not the case

Hi Vincent, thanks for your review! I will refactor the commit message to avoid
this misleading, sorry for my bad English expression.

> 
>>
>> So I want to use another way to solve these problems better.
>>
>>>
>>>> 1. switch to fair class (twice attach problem)
>>>>
>>>> p->sched_class = fair_class;  --> p.se->avg.last_update_time = 0
>>>> if (queued)
>>>>   enqueue_task(p);
>>>>     ...
>>>>       enqueue_entity()
>>>>         update_load_avg(UPDATE_TG | DO_ATTACH)
>>>>           if (!se->avg.last_update_time && (flags & DO_ATTACH))  --> true
>>>>             attach_entity_load_avg()  --> attached, will set last_update_time
>>>> check_class_changed()
>>>>   switched_from() (!fair)
>>>>   switched_to()   (fair)
>>>>     switched_to_fair()
>>>>       attach_entity_load_avg()  --> unconditional attach again!
>>>>
>>>> 2. change cgroup of NEW task (detach unattached task problem)
>>>>
>>>> sched_move_group(p)
>>>>   if (queued)
>>>>     dequeue_task()
>>>>   task_move_group_fair()
>>>>     detach_task_cfs_rq()
>>>>       detach_entity_load_avg()  --> detach unattached NEW task
>>>>     set_task_rq()
>>>>     attach_task_cfs_rq()
>>>>       attach_entity_load_avg()
>>>>   if (queued)
>>>>     enqueue_task()
>>>>
>>>> These problems have been fixed in commit 7dc603c9028e
>>>> ("sched/fair: Fix PELT integrity for new tasks"), which also
>>>> bring its own problems.
>>>>
>>>> First, it add a new task state TASK_NEW and an unnessary limitation
>>>> that we would fail when change the cgroup of TASK_NEW tasks.
>>
>> This is the limitation that bugfix has brought.
>>
>> We can't change cgroup or switch to fair for task with last_update_time=0
>> if we don't have conditional detach/attach.
>>
>> So we have to:
>>
>> 1. !fair task also need to set last_update_time.
>> 2. cpu_cgroup_can_attach() have to wait for TASK_NEW to fully attached.
>>
>>>>
>>>> Second, it attach entity load_avg in post_init_entity_util_avg(),
>>>> in which we only set sched_avg last_update_time for !fair tasks,
>>>> will cause PELT integrity problem when switched_to_fair().
>>>
>>> I guess those PELT integrity problems are less severe since we have the
>>> enqueue_task_fair() before the switched_to_fair() for enqueued tasks. So
>>> we always decay the time the task spend outside fair.
>>
>> enqueue_task_fair()
>>   enqueue_entity()
>>     update_load_avg()
>>       if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD))  --> true
>>         __update_load_avg_se(now, cfs_rq, se);  --> (1)
>>
>> We can see above, for queued !fair task, (1) will deay the delta time
>> (now - se.avg.last_update_time) even for a NEW !fair task.
>>
>>>
>>> Looks to me that you want to replace this by your `freeze PELT when
>>> outside fair` model.
>>
>> Yes, want to freeze PELT for two !fair cases:
>>
>> 1. !fair task hasn't been fair before: will still have its init load_avg
>>    when switch to fair.
> 
> But I'm not sure it makes sense to keep these init values. As an
> example, the util_avg is set according to the cpu utilization at the
> time of the task creation. I would tend to decay them as these init
> values become less and less relevant.
> 
> so we should return early in post_init_entity_util_avg() and don't set
> util_avg if sched class is not cfs

Yes, this indeed is a problem if we attach this init sched_avg of !fair task.
I'm also not sure whether it make sense to keep them to 0 ? Will it cause
unfairness problem between cfs_rqs?

> 
>>
>> 2. !fair task has been switched_from_fair(): will still keep its lastest
>>    load_avg when switch to fair.
>>
>>>
>>>> This patch make update_load_avg() the only location of attach/detach,
>>>> and can handle these corner cases like change cgroup of NEW tasks,
>>>> by checking last_update_time before attach/detach.
>>>
>>> [...]
>>>
>>>> @@ -11527,9 +11522,7 @@ static void detach_entity_cfs_rq(struct sched_entity *se)
>>>>      struct cfs_rq *cfs_rq = cfs_rq_of(se);
>>>>
>>>>      /* Catch up with the cfs_rq and remove our load when we leave */
>>>> -    update_load_avg(cfs_rq, se, 0);
>>>> -    detach_entity_load_avg(cfs_rq, se);
>>>> -    update_tg_load_avg(cfs_rq);
>>>> +    update_load_avg(cfs_rq, se, UPDATE_TG | DO_DETACH);
>>>
>>> IMHO, the DO_[DE|AT]TACH comments in update_load_avg() would have to be
>>> updated in this case.
>>
>> Correct, will do.
>>
>> Thanks.
>>
>>>
>>> [...]