[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtCsd2RkOZoa10SSwKhm0NRzmOphAVNW7_JykqzxqfkSXg@mail.gmail.com>
Date: Tue, 19 Jul 2022 12:29:31 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Chengming Zhou <zhouchengming@...edance.com>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>, mingo@...hat.com,
peterz@...radead.org, rostedt@...dmis.org, bsegall@...gle.com,
vschneid@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH v2 07/10] sched/fair: use update_load_avg()
to attach/detach entity load_avg
On Fri, 15 Jul 2022 at 18:21, Chengming Zhou
<zhouchengming@...edance.com> wrote:
>
> On 2022/7/15 19:18, Dietmar Eggemann wrote:
> > On 13/07/2022 06:04, Chengming Zhou wrote:
> >> Since update_load_avg() support DO_ATTACH and DO_DETACH now, we can
> >> use update_load_avg() to implement attach/detach entity load_avg.
> >>
> >> Another advantage of using update_load_avg() is that it will check
> >> last_update_time before attach or detach, instead of unconditional
> >> attach/detach in the current code.
> >>
> >> This way can avoid some corner problematic cases of load tracking,
> >> like twice attach problem, detach unattached NEW task problem.
> >
> > This explanation is somewhat hard to follow for me. Since both issues
> > have been fixed already (you mention this further below) you're saying
> > that with you change you don't reintroduce them?
>
> Sorry for this not very clear explanation.
>
> Yes, both issues have been fixed already, what I want to say is that bugfix
> brings its own problem and limitation mentioned below.
As Dietmar said, the commit message is misleading because someone can
think you fix these bugs whereas it's not the case
>
> So I want to use another way to solve these problems better.
>
> >
> >> 1. switch to fair class (twice attach problem)
> >>
> >> p->sched_class = fair_class; --> p.se->avg.last_update_time = 0
> >> if (queued)
> >> enqueue_task(p);
> >> ...
> >> enqueue_entity()
> >> update_load_avg(UPDATE_TG | DO_ATTACH)
> >> if (!se->avg.last_update_time && (flags & DO_ATTACH)) --> true
> >> attach_entity_load_avg() --> attached, will set last_update_time
> >> check_class_changed()
> >> switched_from() (!fair)
> >> switched_to() (fair)
> >> switched_to_fair()
> >> attach_entity_load_avg() --> unconditional attach again!
> >>
> >> 2. change cgroup of NEW task (detach unattached task problem)
> >>
> >> sched_move_group(p)
> >> if (queued)
> >> dequeue_task()
> >> task_move_group_fair()
> >> detach_task_cfs_rq()
> >> detach_entity_load_avg() --> detach unattached NEW task
> >> set_task_rq()
> >> attach_task_cfs_rq()
> >> attach_entity_load_avg()
> >> if (queued)
> >> enqueue_task()
> >>
> >> These problems have been fixed in commit 7dc603c9028e
> >> ("sched/fair: Fix PELT integrity for new tasks"), which also
> >> bring its own problems.
> >>
> >> First, it add a new task state TASK_NEW and an unnessary limitation
> >> that we would fail when change the cgroup of TASK_NEW tasks.
>
> This is the limitation that bugfix has brought.
>
> We can't change cgroup or switch to fair for task with last_update_time=0
> if we don't have conditional detach/attach.
>
> So we have to:
>
> 1. !fair task also need to set last_update_time.
> 2. cpu_cgroup_can_attach() have to wait for TASK_NEW to fully attached.
>
> >>
> >> Second, it attach entity load_avg in post_init_entity_util_avg(),
> >> in which we only set sched_avg last_update_time for !fair tasks,
> >> will cause PELT integrity problem when switched_to_fair().
> >
> > I guess those PELT integrity problems are less severe since we have the
> > enqueue_task_fair() before the switched_to_fair() for enqueued tasks. So
> > we always decay the time the task spend outside fair.
>
> enqueue_task_fair()
> enqueue_entity()
> update_load_avg()
> if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD)) --> true
> __update_load_avg_se(now, cfs_rq, se); --> (1)
>
> We can see above, for queued !fair task, (1) will deay the delta time
> (now - se.avg.last_update_time) even for a NEW !fair task.
>
> >
> > Looks to me that you want to replace this by your `freeze PELT when
> > outside fair` model.
>
> Yes, want to freeze PELT for two !fair cases:
>
> 1. !fair task hasn't been fair before: will still have its init load_avg
> when switch to fair.
But I'm not sure it makes sense to keep these init values. As an
example, the util_avg is set according to the cpu utilization at the
time of the task creation. I would tend to decay them as these init
values become less and less relevant.
so we should return early in post_init_entity_util_avg() and don't set
util_avg if sched class is not cfs
>
> 2. !fair task has been switched_from_fair(): will still keep its lastest
> load_avg when switch to fair.
>
> >
> >> This patch make update_load_avg() the only location of attach/detach,
> >> and can handle these corner cases like change cgroup of NEW tasks,
> >> by checking last_update_time before attach/detach.
> >
> > [...]
> >
> >> @@ -11527,9 +11522,7 @@ static void detach_entity_cfs_rq(struct sched_entity *se)
> >> struct cfs_rq *cfs_rq = cfs_rq_of(se);
> >>
> >> /* Catch up with the cfs_rq and remove our load when we leave */
> >> - update_load_avg(cfs_rq, se, 0);
> >> - detach_entity_load_avg(cfs_rq, se);
> >> - update_tg_load_avg(cfs_rq);
> >> + update_load_avg(cfs_rq, se, UPDATE_TG | DO_DETACH);
> >
> > IMHO, the DO_[DE|AT]TACH comments in update_load_avg() would have to be
> > updated in this case.
>
> Correct, will do.
>
> Thanks.
>
> >
> > [...]
Powered by blists - more mailing lists