linux-kernel - Re: [PATCH v5 02/10] sched/rt: add rt

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtCvo=qNo0Be97Ph3dGCu88_V2sm37nctVNgL+nNapH91w@mail.gmail.com>
Date:   Wed, 30 May 2018 16:39:48 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Patrick Bellasi <patrick.bellasi@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <Morten.Rasmussen@....com>,
        viresh kumar <viresh.kumar@...aro.org>,
        Valentin Schneider <valentin.schneider@....com>,
        Quentin Perret <quentin.perret@....com>
Subject: Re: [PATCH v5 02/10] sched/rt: add rt_rq utilization tracking

On 30 May 2018 at 13:01, Patrick Bellasi <patrick.bellasi@....com> wrote:
> On 30-May 12:06, Vincent Guittot wrote:
>> On 30 May 2018 at 11:32, Patrick Bellasi <patrick.bellasi@....com> wrote:
>> > On 29-May 15:29, Vincent Guittot wrote:
>
> [...]
>
>> >> >> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
>> >> >> index ef3c4e6..b4148a9 100644
>> >> >> --- a/kernel/sched/rt.c
>> >> >> +++ b/kernel/sched/rt.c
>> >> >> @@ -5,6 +5,8 @@
>> >> >>   */
>> >> >>  #include "sched.h"
>> >> >>
>> >> >> +#include "pelt.h"
>> >> >> +
>> >> >>  int sched_rr_timeslice = RR_TIMESLICE;
>> >> >>  int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
>> >> >>
>> >> >> @@ -1572,6 +1574,9 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
>> >> >>
>> >> >>       rt_queue_push_tasks(rq);
>> >> >>
>> >> >> +     update_rt_rq_load_avg(rq_clock_task(rq), rq,
>> >> >> +             rq->curr->sched_class == &rt_sched_class);
>> >> >> +
>> >> >>       return p;
>> >> >>  }
>> >> >>
>> >> >> @@ -1579,6 +1584,8 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
>> >> >>  {
>> >> >>       update_curr_rt(rq);
>> >> >>
>> >> >> +     update_rt_rq_load_avg(rq_clock_task(rq), rq, 1);
>> >> >> +
>> >> >>       /*
>> >> >>        * The previous task needs to be made eligible for pushing
>> >> >>        * if it is still active
>> >> >> @@ -2308,6 +2315,7 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
>> >> >>       struct sched_rt_entity *rt_se = &p->rt;
>> >> >>
>> >> >>       update_curr_rt(rq);
>> >> >> +     update_rt_rq_load_avg(rq_clock_task(rq), rq, 1);
>> >> >
>> >> > Mmm... not entirely sure... can't we fold
>> >> >    update_rt_rq_load_avg() into update_curr_rt() ?
>> >> >
>> >> > Currently update_curr_rt() is used in:
>> >> >    dequeue_task_rt
>> >> >    pick_next_task_rt
>> >> >    put_prev_task_rt
>> >> >    task_tick_rt
>> >> >
>> >> > while we update_rt_rq_load_avg() only in:
>> >> >    pick_next_task_rt
>> >> >    put_prev_task_rt
>> >> >    task_tick_rt
>> >> > and
>> >> >    update_blocked_averages
>> >> >
>> >> > Why we don't we need to update at dequeue_task_rt() time ?
>> >>
>> >> We are tracking rt rq and not sched entities so we want to know when
>> >> sched rt will be the running or not  sched class on the rq. Tracking
>> >> dequeue_task_rt is useless
>> >
>> > What about (push) migrations?
>>
>> it doesn't make any difference. put_prev_task_rt() says that the prev
>> task that was running, was a rt task so we can account past time at rt
>> running time
>> and pick_next_task_rt says that the next one will be a rt task so we
>> have to account elapse time either to rt or not rt time according.
>
> Right, I was missing that you are tracking RT (and DL) only at RQ
> level... not SE level, thus we will not see migrations of blocked
> utilization.
>
>> I can probably optimize the pick_next_task_rt by doing the below instead:
>>
>> if (rq->curr->sched_class == &rt_sched_class)
>>        update_rt_rq_load_avg(rq_clock_task(rq), rq, 0);
>>
>> If prev task is a rt  task, put_prev_task_rt has already done the update
>
> Right.
>
> Just one more question about non tracking SE. Once we migrate an RT
> task with the current solution we will have to wait for it's PELT
> blocked utilization to decay completely before starting to ignore that
> task contribution, which means that:
>  1. we will see an higher utilization on the original CPU
>  2. we don't immediately see the increased utilization on the
>     destination CPU
>
> I remember Juri had some patches to track SE utilization thus fixing
> the two issues above. Can you remember me why we decided to go just
> for the RQ tracking solution?

I would say that one main reason is the overhead of tracking per SE

Then, what we want to track the other class utilization to replace
current rt_avg.

And we want something to track steal time of cfs to compensate the
fact that cfs util_avg will be lower than what cfs really needs.
so we really want rt util_avg to smoothly decrease if a rt task
migrate to let time to cfs util_avg to smoothly increase itself as cfs
tasks will run more often.

Based on some discussion on IRC, I'm studying how to track more
accurately the stolen time

> Don't we expect any strange behaviors on real systems when RT tasks
> are moved around?

Which kind of strange behavior ? we don't use rt util_avg for OPP
selection when a rt task is running

>
> Perhaps we should run some tests on Android...
>
> --
> #include <best/regards.h>
>
> Patrick Bellasi