linux-kernel - Re: [RFC v2 1/8] sched/tune: add detailed documentation

Message-ID: <20161108105302.GB2971@e105326-lin>
Date:   Tue, 8 Nov 2016 10:53:02 +0000
From:   Patrick Bellasi <patrick.bellasi@....com>
To:     Viresh Kumar <viresh.kumar@...aro.org>
Cc:     linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Steve Muckle <steve.muckle@...aro.org>,
        Leo Yan <leo.yan@...aro.org>,
        "Rafael J . Wysocki" <rjw@...ysocki.net>,
        Todd Kjos <tkjos@...gle.com>,
        Srinath Sridharan <srinathsr@...gle.com>,
        Andres Oportus <andresoportus@...gle.com>,
        Juri Lelli <juri.lelli@....com>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Chris Redpath <chris.redpath@....com>,
        Robin Randhawa <robin.randhawa@....com>,
        Jonathan Corbet <corbet@....net>, linux-doc@...r.kernel.org
Subject: Re: [RFC v2 1/8] sched/tune: add detailed documentation

On 04-Nov 15:16, Viresh Kumar wrote:
> On 27-10-16, 18:41, Patrick Bellasi wrote:
> > +This last requirement is especially important if we consider that schedutil can
> > +potentially replace all currently available CPUFreq policies. Since schedutil
> > +is event based, as opposed to the sampling driven governors, it is already more
> > +responsive at selecting the optimal OPP to run tasks allocated to a CPU.
> 
> I am not sure if I follow this paragraph. All the governors follow the same
> basic rules now. They are all event driven (events from scheduler), but they
> function only after a certain sampling period is finished. Isn't this the case ?

Right, the main difference from what I call "sample based" governors
(e.g. ondemand, interactive) is that they consider metrics which are
averaged over time (e.g. how long, on average, a CPU has been idle).
By contrast, with schedutil we get direct input from the scheduler
about the current CPU bandwidth demand.

Thus, schedutil is not only event based, but it can also exploit more
direct knowledge of the CPU bandwidth demand. Moreover, depending on
the CPUFreq driver latencies of a specific platform, schedutil can be
much more aggressive in triggering frequency transitions; e.g., on
some ARM platforms we can easily have 1ms OPP switches.
AFAIK, such fast transitions cannot be exploited by "sample based"
governors, because they cannot collect meaningful averages in such a
short timeframe without the risk of being "unstable" (e.g. almost
always making a wrong decision).
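To make the contrast above concrete, here is a minimal C sketch (not kernel code). The "sample based" side derives a load percentage from the idle time observed over a whole sampling window. The schedutil side maps a scheduler-provided utilization directly to a frequency with 25% headroom, i.e. next_freq = 1.25 * max_freq * util / max, assuming the frequency-invariant case. The function names and the simplified proportional ondemand-like mapping are hypothetical; policy min/max clamping and OPP table lookup are omitted, apart from a cap at max_freq.

```c
#include <assert.h>

/*
 * Hypothetical "sample based" input: percentage of a sampling window
 * during which the CPU was busy, i.e. an average over the whole window.
 */
unsigned int sampled_load_pct(unsigned long window_us, unsigned long idle_us)
{
	assert(window_us > 0);
	if (idle_us >= window_us)
		return 0;
	return (unsigned int)(100 * (window_us - idle_us) / window_us);
}

/* Simplified, ondemand-like mapping: frequency proportional to load. */
unsigned long sampled_next_freq(unsigned long max_freq, unsigned int load_pct)
{
	return max_freq * load_pct / 100;
}

/*
 * schedutil-like mapping (frequency-invariant case): the scheduler
 * provides util directly; 25% headroom is added, as in
 * freq = 1.25 * max_freq * util / max, then the result is capped.
 */
unsigned long util_next_freq(unsigned long max_freq, unsigned long util,
			     unsigned long max)
{
	unsigned long long freq;

	assert(max > 0);
	if (util > max)
		util = max;
	freq = (unsigned long long)(max_freq + (max_freq >> 2)) * util / max;
	return freq > max_freq ? max_freq : (unsigned long)freq;
}
```

For example, a 10 ms window with 4 ms of idle time gives 60% load and 1200000 kHz on a 2000000 kHz CPU, but that value is only available once the window has elapsed; a utilization of 512 out of 1024 maps to 1250000 kHz as soon as the scheduler reports it.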

> > +SchedTune exposes a simple user-space interface with a single power-performance
> > +tunable:
> > +
> > +  /proc/sys/kernel/sched_cfs_boost
> > +
> > +This permits expressing a boost value as an integer in the range [0..100].
> > +
> > +A value of 0 (default) for a CFS task means that schedutil will attempt
> > +to match compute capacity of the CPU where the task is scheduled to
> > +match its current utilization with a few spare cycles left. A value of
> > +100 means that schedutil will select the highest available OPP.
> > +
> > +The range between 0 and 100 can be set to satisfy other scenarios suitably.
> > +For example to satisfy interactive response or depending on other system events
> > +(battery level, thermal status, etc).
> 
> Earlier section said that schedutil+schedtune can replace all earlier governors.
> How will schedutil behave like powersave governor with schedtune? I was
> expecting the possible values of sched_cfs_boost to be in the range -100 to 100,
> where -100 will make it powersave, +100 will make it performance and 0 will not
> make any changes.

You're right; however, negative boost values are introduced by the
last patch of this series. That patch also updates the documentation
to describe the meaning of negative boost values.
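For the [0..100] range of sched_cfs_boost quoted above, the documentation fixes only the endpoints: 0 keeps the frequency tracking the task's utilization (with a few spare cycles), and 100 selects the highest OPP. The C sketch below assumes one plausible way to fill in the range between them: a linear interpolation of the utilization towards full capacity, with the boosted value then fed to schedutil. The helper name and the linear mapping are assumptions for illustration, not the patch's implementation. SCHED_CAPACITY_SCALE mirrors the kernel's 1024 capacity scale, and negative boosts (added by the last patch of the series) are not modeled.

```c
#include <assert.h>

/* Mirrors the kernel's capacity scale (1 << SCHED_CAPACITY_SHIFT). */
#define SCHED_CAPACITY_SCALE 1024UL

/*
 * Hypothetical helper: a boost in [0..100] moves util linearly towards
 * full capacity. 0 leaves util unchanged; 100 yields full capacity.
 */
unsigned long boosted_util(unsigned long util, int boost)
{
	unsigned long margin;

	assert(boost >= 0 && boost <= 100);
	if (util >= SCHED_CAPACITY_SCALE)
		return SCHED_CAPACITY_SCALE;
	/* Spare capacity above util, scaled by the boost percentage. */
	margin = (SCHED_CAPACITY_SCALE - util) * (unsigned long)boost / 100;
	return util + margin;
}
```

Under this assumption, a task with a utilization of 256 and a boost of 50 would be treated as if its utilization were 640, while a boost of 100 always reports full capacity, which makes schedutil pick the highest OPP.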

> --
> viresh

-- 
#include <best/regards.h>

Patrick Bellasi