linux-kernel - Re: [SchedulerWakeupLatency] Skip energy aware task placement

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200729140121.GB1075614@google.com>
Date:   Wed, 29 Jul 2020 15:01:21 +0100
From:   Quentin Perret <qperret@...gle.com>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Patrick Bellasi <patrick.bellasi@...bug.net>,
        LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Paul Turner <pjt@...gle.com>, Ben Segall <bsegall@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Jonathan Corbet <corbet@....net>,
        Dhaval Giani <dhaval.giani@...cle.com>,
        Josef Bacik <jbacik@...com>,
        Chris Hyser <chris.hyser@...cle.com>,
        Parth Shah <parth@...ux.ibm.com>
Subject: Re: [SchedulerWakeupLatency] Skip energy aware task placement

On Thursday 23 Jul 2020 at 11:31:39 (+0200), Dietmar Eggemann wrote:
> The search for the most energy-efficient CPU over the Performance
> Domains (PDs) by consulting the Energy Model (EM), i.e. the estimation
> on how much energy a PD consumes if the task migrates to one of its
> CPUs, adds a certain amount of latency to task placement.
> 
> For some tasks this extra latency could be too high. A good example here
> are the Android display pipeline tasks, UIThread and RenderThread. They
> have to be placed on idle CPUs with a faster wakeup mechanism than the
> energy aware wakeup path (A) to guarantee the smallest amount of dropped
> or delayed frames (a.k.a. jank).
> 
> In Linux kernel mainline there is currently no mechanism for latency
> sensitive tasks to allow that the energy aware wakeup path (A) is
> skipped and the fast path (B) taken instead.

FWIW, Android has indeed been using a similar concept for many years
(the 'prefer-idle' flag you mentioned below) and that proved to be
critical to meet UI requirements on a number of devices. So +1 to have
this use case covered as part of the latency-nice work.

<...>
> > K) Task-Group tuning: which knobs are required
> 
> Currently Android uses the 'prefer idle' mechanism only on task-groups
> and not on individual tasks.

To add a little bit of context here, Android uses cgroups extensively
for task classification. The tasks of any currently used application are
placed in a cgroup based on their 'role'. For instance, all tasks that
belong to the currently visible application are classified as 'top-app',
and are grouped in a cgroup with 'prefer-idle' set (+ other attributes,
e.g. uclamp). The other tasks in the system are usually placed in other
cgroups (e.g. 'background'), with 'prefer-idle' cleared.

When the user switches from one app to the other, their roles are
swapped. At this point userspace needs to move all tasks of an app from
one cgroup to another, to reflect the change in role. But interestingly,
all tasks of the app belong to the same process, so userspace only needs
to write the PID of the parent task to the new cgroup to move all of
them at once. (Note: we're also experimenting having cgroups per-app
rather than per-role, and to change the cgroup parameters instead of
migrating between groups, but that doesn't change the conclusion below).

So, having 'only' a per-task API for latency-nice may not be sufficient
to cover the Android use-case, as that would force userspace to iterate
over all the tasks in an app when switching between roles, and that will
come at a certain cost (currently unknown). However, that will need to
be confirmed experimentally, and starting 'simple' with a per-task API
would definitely be a step in the right direction.

Thanks,
Quentin