Date:   Tue, 29 Oct 2019 20:36:19 +0000
From:   Patrick Bellasi <patrick.bellasi@...bug.net>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Qais Yousef <qais.yousef@....com>, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] sched: rt: Make RT capacity aware

Hi Vincent, Qais,

On 29-Oct 13:54, Vincent Guittot wrote:
> On Tue, 29 Oct 2019 at 13:46, Qais Yousef <qais.yousef@....com> wrote:
> > On 10/29/19 13:20, Vincent Guittot wrote:
> > > > > Making big cores the default CPUs for all RT tasks is not a minor
> > > > > change and IMO locality should stay the default behavior when there is
> > > > > no uclamp constraint

The default value for the system-wide uclamp.min is 1024 because by
default _we want_ RT tasks running at MAX_OPP. This means that by
default RT tasks are running _under constraints_.

The "no uclamp constraints" case you mention requires that you set
uclamp.min=0 for that task. In that case, this patch will do exactly
what you ask for: locality is preserved.
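
FWIW, opting out is a one-liner from userspace. Here is a minimal
sketch (my example, not part of the patch): it assumes a v5.3+ kernel
with CONFIG_UCLAMP_TASK, carries a local copy of the uapi sched_attr
layout since glibc has no sched_setattr() wrapper, and needs the usual
privileges to set SCHED_FIFO:

#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef SCHED_FLAG_UTIL_CLAMP_MIN
#define SCHED_FLAG_UTIL_CLAMP_MIN	0x20	/* from uapi/linux/sched.h */
#endif

/* Local copy of the uapi struct, since glibc does not export it. */
struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
};

int main(void)
{
	struct sched_attr attr = {
		.size		= sizeof(attr),
		.sched_policy	= SCHED_FIFO,
		.sched_priority	= 10,
		.sched_flags	= SCHED_FLAG_UTIL_CLAMP_MIN,
		.sched_util_min	= 0,	/* no boost: wakeups stay local */
	};

	if (syscall(SYS_sched_setattr, 0 /* self */, &attr, 0))
		perror("sched_setattr");

	return 0;
}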

> > > > How is this affecting locality? The task will always go to the big
> > > > core, so it should be local.
> > >
> > > local with the waker.
> > > You will force the rt task to run on the big cluster although the
> > > waker, data and interrupts can be on a little one.
> > > So making big cores the default is far from always being the best choice.
> >
> > This is loaded with assumptions IMO. AFAICT we don't know what the best
> > choice is.

Agree... more on that below...

> > First, the value of uclamp.min is outside the scope of this patch. Unless
> > you're saying that when uclamp.min is 1024 we should NOT choose a big
> > cpu, there's no disagreement about what this patch does. If that's what
> > you're objecting to, please be more specific about how you see this
> > working instead.
> 
> My point is that this patch makes the big cores the default CPUs for
> RT tasks, which is far from being a minor change and far from being
> an obviously good default choice.

Some time ago we agreed that going to MAX_OPP for RT tasks was
"mandatory". That was definitely a big change, likely much more
impactful than the one proposed by this patch.

On many mobile devices we ended up pinning RT tasks on LITTLE cores
(mainly) to save quite a lot of energy by avoiding the case of big
CPUs randomly spiking to MAX_OPP just because of a small RT task
waking up on them. We also added some heuristics in schedutil as a
"band aid" for the effects of the aforementioned choice.

Could running RT on LITTLEs also bring some wakeup latency
improvement? Yes, maybe... it would be interesting to have a real
HW *and* SW use-case on hand to compare.

However, while we know that RT is all about "latency", the definition
of "latency" is a bit more fuzzy (a toy sketch contrasting the two
metrics follows the list):

 A) wakeup-latency
    From a scheduler standpoint it's quite often considered as the
    time it takes to "wakeup" a task and actually start executing its
    instructions.

 B) completion-time
    From an app standpoint, what quite often matters is the time to
    complete the task activation and go back to sleep.
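
To make the distinction concrete, here is a toy (and deliberately
non-rigorous) way to measure both, where do_work() is just a
placeholder for the task's activation:

#include <stdio.h>
#include <time.h>

/* Nanoseconds between two timespecs (a - b). */
static long ns_diff(struct timespec a, struct timespec b)
{
	return (a.tv_sec - b.tv_sec) * 1000000000L
		+ (a.tv_nsec - b.tv_nsec);
}

static void do_work(void) { /* placeholder for the activation */ }

int main(void)
{
	struct timespec target, woke, done;

	/* Absolute deadline gives us a known intended wakeup time. */
	clock_gettime(CLOCK_MONOTONIC, &target);
	target.tv_sec += 1;

	clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &target, NULL);
	clock_gettime(CLOCK_MONOTONIC, &woke);	/* A) wakeup-latency */

	do_work();
	clock_gettime(CLOCK_MONOTONIC, &done);	/* B) completion-time */

	printf("wakeup-latency:  %ld ns\n", ns_diff(woke, target));
	printf("completion-time: %ld ns\n", ns_diff(done, target));
	return 0;
}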

Running at MAX_OPP looks much more related to the need to complete
fast than to wake up fast, especially considering that that decision
was taken looking mainly (perhaps only) at SMP systems.

On heterogeneous systems, "wakeup-latency" and "completion-time" are
two metrics which *maybe* can be better served by different cores.
However, it's very difficult to argue that one metric is more
important than the other. It's even more difficult to quantify,
because of the multitude of HW and SW combinations.

Thus, what's proposed here may be far from being an "obvious good
choice", but it's also quite difficult to argue and prove that it's
definitely _not_ a good choice. It's just a default which:

 1) keeps things simple for RT tasks by using the same default
    policy for both frequency and CPU selection: we always run
    (when possible) at the highest available capacity

 2) is based on a per-task/system-wide tunable mechanism

Is that not enough to justify it as a default?

Best,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi
