linux-kernel - Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <55C8AA92.8000201@arm.com>
Date:	Mon, 10 Aug 2015 14:43:46 +0100
From:	Juri Lelli <juri.lelli@....com>
To:	Vincent Guittot <vincent.guittot@...aro.org>,
	Morten Rasmussen <Morten.Rasmussen@....com>
CC:	Peter Zijlstra <peterz@...radead.org>,
	"mingo@...hat.com" <mingo@...hat.com>,
	Daniel Lezcano <daniel.lezcano@...aro.org>,
	Dietmar Eggemann <Dietmar.Eggemann@....com>,
	Yuyang Du <yuyang.du@...el.com>,
	Michael Turquette <mturquette@...libre.com>,
	"rjw@...ysocki.net" <rjw@...ysocki.net>,
	Sai Charan Gurrappadi <sgurrappadi@...dia.com>,
	"pang.xunlei@....com.cn" <pang.xunlei@....com.cn>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>
Subject: Re: [RFCv5 PATCH 41/46] sched/fair: add triggers for OPP change requests

Hi Vincent,

On 04/08/15 14:41, Vincent Guittot wrote:
> Hi Juri,
> 
> On 7 July 2015 at 20:24, Morten Rasmussen <morten.rasmussen@....com> wrote:
>> From: Juri Lelli <juri.lelli@....com>
>>
>> Each time a task is {en,de}queued we might need to adapt the current
>> frequency to the new usage. Add triggers on {en,de}queue_task_fair() for
>> this purpose.  Only trigger a freq request if we are effectively waking up
>> or going to sleep.  Filter out load balancing related calls to reduce the
>> number of triggers.
>>
>> cc: Ingo Molnar <mingo@...hat.com>
>> cc: Peter Zijlstra <peterz@...radead.org>
>>
>> Signed-off-by: Juri Lelli <juri.lelli@....com>
>> ---
>>  kernel/sched/fair.c | 42 ++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 40 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index f74e9d2..b8627c6 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4281,7 +4281,10 @@ static inline void hrtick_update(struct rq *rq)
>>  }
>>  #endif
>>
>> +static unsigned int capacity_margin = 1280; /* ~20% margin */
>> +
>>  static bool cpu_overutilized(int cpu);
>> +static unsigned long get_cpu_usage(int cpu);
>>  struct static_key __sched_energy_freq __read_mostly = STATIC_KEY_INIT_FALSE;
>>
>>  /*
>> @@ -4332,6 +4335,26 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>>                 if (!task_new && !rq->rd->overutilized &&
>>                     cpu_overutilized(rq->cpu))
>>                         rq->rd->overutilized = true;
>> +               /*
>> +                * We want to trigger a freq switch request only for tasks that
>> +                * are waking up; this is because we get here also during
>> +                * load balancing, but in these cases it seems wise to trigger
>> +                * as single request after load balancing is done.
>> +                *
>> +                * XXX: how about fork()? Do we need a special flag/something
>> +                *      to tell if we are here after a fork() (wakeup_task_new)?
>> +                *
>> +                * Also, we add a margin (same ~20% used for the tipping point)
>> +                * to our request to provide some head room if p's utilization
>> +                * further increases.
>> +                */
>> +               if (sched_energy_freq() && !task_new) {
>> +                       unsigned long req_cap = get_cpu_usage(cpu_of(rq));
>> +
>> +                       req_cap = req_cap * capacity_margin
>> +                                       >> SCHED_CAPACITY_SHIFT;
>> +                       cpufreq_sched_set_cap(cpu_of(rq), req_cap);
>> +               }
>>         }
>>         hrtick_update(rq);
>>  }
>> @@ -4393,6 +4416,23 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>>         if (!se) {
>>                 sub_nr_running(rq, 1);
>>                 update_rq_runnable_avg(rq, 1);
>> +               /*
>> +                * We want to trigger a freq switch request only for tasks that
>> +                * are going to sleep; this is because we get here also during
>> +                * load balancing, but in these cases it seems wise to trigger
>> +                * as single request after load balancing is done.
>> +                *
>> +                * Also, we add a margin (same ~20% used for the tipping point)
>> +                * to our request to provide some head room if p's utilization
>> +                * further increases.
>> +                */
>> +               if (sched_energy_freq() && task_sleep) {
>> +                       unsigned long req_cap = get_cpu_usage(cpu_of(rq));
>> +
>> +                       req_cap = req_cap * capacity_margin
>> +                                       >> SCHED_CAPACITY_SHIFT;
>> +                       cpufreq_sched_set_cap(cpu_of(rq), req_cap);
> 
> Could you clarify why you want to trig a freq switch for tasks that
> are going to sleep ?
> The cpu_usage should not changed that much as the se_utilization of
> the entity moves from utilization_load_avg to utilization_blocked_avg
> of the rq and the usage and the freq are updated periodically.

I think we still need to cover multiple back-to-back dequeues. Suppose
that you have, let's say, 3 tasks that get enqueued at the same time.
After some time the first one goes to sleep and its utilization, as you
say, gets moved to utilization_blocked_avg. So, nothing changes, and
the trigger is superfluous (even if no freq change I guess will be
issued as we are already servicing enough capacity). However, after a
while, the second task goes to sleep. Now we still use get_cpu_usage()
and the first task contribution in utilization_blocked_avg should have
been decayed by this time. Same thing may than happen for the third task
as well. So, if we don't check if we need to scale down in
dequeue_task_fair, it seems to me that we might miss some opportunities,
as blocked contribution of other tasks could have been successively
decayed.

What you think?

Thanks,

- Juri

> It should be the same for the wake up of a task in enqueue_task_fair
> above, even if it's less obvious for this latter use case because the
> cpu might wake up from a long idle phase during which its
> utilization_blocked_avg has not been updated. Nevertheless, a trig of
> the freq switch at wake up of the cpu once its usage has been updated
> should do the job.
> 
> So tick, migration of tasks, new tasks, entering/leaving idle state of
> cpu should be enough to trig freq switch
> 
> Regards,
> Vincent
> 
> 
>> +               }
>>         }
>>         hrtick_update(rq);
>>  }
>> @@ -4959,8 +4999,6 @@ static int find_new_capacity(struct energy_env *eenv,
>>         return idx;
>>  }
>>
>> -static unsigned int capacity_margin = 1280; /* ~20% margin */
>> -
>>  static bool cpu_overutilized(int cpu)
>>  {
>>         return (capacity_of(cpu) * 1024) <
>> --
>> 1.9.1
>>
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/