 
Message-ID: <CAKfTPtA=CzkTVwdCJL6ULYB628tWdGAvpD-sHfgSfL59PyYvxA@mail.gmail.com>
Date:   Tue, 29 Oct 2019 12:17:25 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Qais Yousef <qais.yousef@....com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] sched: rt: Make RT capacity aware

On Tue, 29 Oct 2019 at 12:02, Qais Yousef <qais.yousef@....com> wrote:
>
> On 10/29/19 09:13, Vincent Guittot wrote:
> > On Wed, 9 Oct 2019 at 12:46, Qais Yousef <qais.yousef@....com> wrote:
> > >
> > > Capacity Awareness refers to the fact that on heterogeneous systems
> > > (like Arm big.LITTLE), the capacity of the CPUs is not uniform; hence,
> > > when placing tasks, we need to be aware of this difference in CPU
> > > capacities.
> > >
> > > In such scenarios we want to ensure that the selected CPU has enough
> > > capacity to meet the requirement of the running task. Enough capacity
> > > here means that capacity_orig_of(cpu) >= task.requirement.
> > >
> > > The definition of task.requirement is dependent on the scheduling class.
> > >
> > > For CFS, utilization is used to select a CPU whose capacity is >=
> > > cfs_task.util:
> > >
> > >         capacity_orig_of(cpu) >= cfs_task.util
> > >
> > > DL isn't capacity aware at the moment but could use its bandwidth
> > > reservation to implement that, in a manner similar to how CFS uses
> > > utilization. The following patchset implements that:
> > >
> > > https://lore.kernel.org/lkml/20190506044836.2914-1-luca.abeni@santannapisa.it/
> > >
> > >         capacity_orig_of(cpu)/SCHED_CAPACITY >= dl_runtime/dl_deadline
> > >
> > > For RT we don't have a per-task utilization signal, and in general we
> > > lack any information about what performance an RT task requires. But
> > > with the introduction of uclamp, RT tasks can now control that by
> > > setting uclamp_min to guarantee a minimum performance point.
> > >
> > > ATM the uclamp values are only used for frequency selection; but on
> > > heterogeneous systems this is not enough, and we need to ensure that
> > > the capacity of the CPU is >= uclamp_min, which is what is implemented
> > > here:
> > >
> > >         capacity_orig_of(cpu) >= rt_task.uclamp_min
> > >
> > > Note that by default uclamp.min is 1024, which means that RT tasks
> > > will always be biased towards the big CPUs; this makes for better,
> > > more predictable behavior in the default case.
> >
> > hmm... big cores are not always the best choice for RT tasks: they
> > generally take more time to wake up or to switch context because of
> > their deeper pipelines and larger branch predictors
>
> Can you quantify this into a number? I suspect this latency should be in the

As a general rule, we pinned IRQs to the little cores because of this
responsiveness difference. I don't have numbers in mind, as the tests
were run in the early days of big.LITTLE systems, a few years ago.
Also, if you look at some idle-state definitions in DT, you will see
that the exit latency of the cluster-down state on hikey960 is 2900us
for a big core vs 1600us for a little one.
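Those latencies live in the standard arm,idle-state devicetree binding; a
fragment of roughly this shape defines them (node labels and the entry and
residency values here are illustrative; only the exit-latency-us numbers are
the ones cited above):

```dts
CLUSTER_SLEEP_BIG: cluster-sleep-big {
	compatible = "arm,idle-state";
	local-timer-stop;
	entry-latency-us = <1000>;  /* illustrative */
	exit-latency-us = <2900>;   /* big cluster, as cited */
	min-residency-us = <5000>;  /* illustrative */
};

CLUSTER_SLEEP_LIT: cluster-sleep-little {
	compatible = "arm,idle-state";
	local-timer-stop;
	entry-latency-us = <800>;   /* illustrative */
	exit-latency-us = <1600>;   /* little cluster, as cited */
	min-residency-us = <3000>;  /* illustrative */
};
```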

> 200-300us range. And the difference between little and big should be much
> smaller than that, no? We can't give guarantees of that order in Linux in
> general, and serious real-time users have to apply extra tweaks anyway, like
> disabling power management features that introduce latency and hinder
> determinism, besides enabling PREEMPT_RT.
>
> For generic systems a few ms is the best we can promise, and we can easily
> fall outside of that without any tweaks.
>
> The choice of going to the maximum performance point in the system for RT
> tasks by default goes beyond this patch anyway. I'm just making it
> consistent here, since we have different performance levels and RT didn't
> understand this before.
>
> So what I'm doing here is just making things consistent rather than
> changing the default.
>
> What do you suggest?

Making big cores the default CPUs for all RT tasks is not a minor
change, and IMO locality should stay the default behavior when there is
no uclamp constraint.

>
> Thanks
>
> --
> Qais Yousef
