linux-kernel - Scheduler wakeup path tuning surface: Use-Cases and Requirements

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <87imfi2qbk.derkling@matbug.net>
Date:   Tue, 23 Jun 2020 09:29:03 +0200
From:   Patrick Bellasi <patrick.bellasi@...bug.net>
To:     LKML <linux-kernel@...r.kernel.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        "Peter Zijlstra \(Intel\)" <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Paul Turner <pjt@...gle.com>, Ben Segall <bsegall@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Jonathan Corbet <corbet@....net>,
        Dhaval Giani <dhaval.giani@...cle.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Josef Bacik <jbacik@...com>,
        Chris Hyser <chris.hyser@...cle.com>,
        Parth Shah <parth@...ux.ibm.com>
Subject: Scheduler wakeup path tuning surface: Use-Cases and Requirements


Since last year's OSPM Summit we started conceiving the idea that task
wakeup path could be better tuned for certain classes of workloads
and usage scenarios. Various people showed interest for a possible
tuning interface for the scheduler wakeup path.


.:: The Problem
===============

The discussions we had so far [1] have not been effective in clearly
identifying if a common tuning surface is possible. The last discussion
at this year's OSPM Summit [2,3] was also kind of inconclusive and left
us with the message: start by collecting the requirements and then see
what interface fits them the best.

General consensus is that a unified interface can be challenging and
maybe not feasible. However, generalisation is also a value
and we should strive for it whenever it's possible.

Someone might think that we did not put enough effort in the analysis of
requirements. Perhaps what we missed so far is also a structured and
organised way to collect requirements which also can help in factoring
out the things they have in common.


.:: The Proposal
================

This thread aims at providing a guided template for the description of
different task wakeup use-cases. It does that by setting a series of
questions aimed at precisely describing what's "currently broken", what
we would like to have instead and how we could achieve it.

What we propose here is that, for each wakeup use-case, we start
by replying to this email to provide the required details/comments for
a predefined list of questions. This will generate independent
discussion threads. Each thread will be free to focus on a specific
proposal but still all the thread will be reasoning around a common set
of fundamental concepts.

The hope is that, by representing all the use-cases as sufficiently
detailed responses to a common set of questions, once the discussion
settles down, we can more easily verify if there are common things
surfacing which then can be naturally wrapped into a unified user-space
API.

A first use-case description, following the template guidelines, will
be posted shortly after this message. This also will provide an example
for how to use the template.

NOTE: Whenever required, pseudo-code or simplified C can be used.

I hope this can drive a fruitful discussion in preparation for LPC!

Best,
Patrick


---8<--- For templates submissions: reply only to the following ---8<---


.:: Scheduler Wakeup Path Requirements Collection Template
==========================================================

A) Name: unique one-liner name for the proposed use-case

B) Target behaviour: one paragraph to describe the wakeup path issue

C) Existing control paths: reference to code paths to justify B)

   Assuming v5.6 as the reference kernel, this section should provide
   links to code paths such as, e.g.

   fair.c:3917
   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/fair.c?h=v5.6#n3917

   Alternatively code snippets can be added inline, e.g.

        /*
	 * The 'current' period is already promised to the current tasks,
	 * however the extra weight of the new task will slow them down a
	 * little, place the new task so that it fits in the slot that
	 * stays open at the end.
	 */
	if (initial && sched_feat(START_DEBIT))
		vruntime += sched_vslice(cfs_rq, se);

   NOTE: if the use-case exists only outside the mainline Linux kernel
         this section can stay empty

D) Desired behaviour: one paragraph to describe the desired update

   NOTE: the current mainline expression is assumed to be correct
         for existing use-cases. Thus, here we are looking for run-time
         tuning of those existing features.

E) Existing knobs (if any): reference to whatever existing tunable

   Some features can already be tuned, but perhaps only via compile time
   knobs, SCHED_FEATs or system wide tunable.
   If that's the case, we should document them here and explain how they
   currently work and what are (if any) the implicit assumptions, e.g.
   what is the currently implemented scheduling policy/heuristic.

F) Existing knobs (if any): one paragraph description of the limitations

   If the existing knobs are not up to the job for this use-case,
   shortly explain here why. It could be because a tuning surface is
   already there but it's hardcoded (e.g. compile time knob) or too
   coarse grained (e.g. a SCHED_FEAT).

G) Proportionality Analysis: check the nature of the target behavior

   Goal here is to verify and discuss if the behaviour (B) has a
   proportional nature: different values of the control knobs (E) are
   expected to produce different effects for (B).
   
   Special care should be taken to check if the target behaviour has an
   intrinsically "binary nature", i.e. only two values make really
   sense. In this case it would be very useful to argument why a
   generalisation towards a non-binary behaviours does NOT make sense.

H) Range Analysis: identify meaningful ranges

   If (G) was successfully, i.e. there is a proportional correlation
   between (E) and (B), discuss here about a meaningful range for (E)
   and (F).

I) System-Wide tuning: which knobs are required

   If required, list new additional tunables here, how they should be
   exposed and (if required) which scheduling classes will be affected.

J) Per-Task tuning: which knobs are required

   Describe which knobs should be added and which task specific API
   (e.g. sched_setscheduler(), prctl(), ...) they should be used.

K) Task-Group tuning: which knobs are required

   If the use-case can benefit from a task-group tuning, here it should
   **briefly described** how the expected behaviour can be mapped on a
   cgroup v2 unified hierarchy.

   NOTE: implementation details are not required but we should be able
         to hint at which cgroup v2 resource distribution model [5]
         should be applied.


---8<--- For templates submissions: exclude the following ---8<---


.:: References
==============

[1] [Discussion v2] Usecases for the per-task latency-nice attribute
    Message-ID: 2bd46086-43ff-f130-8720-8eec694eb55b@...ux.ibm.com
    https://lore.kernel.org/lkml/2bd46086-43ff-f130-8720-8eec694eb55b@linux.ibm.com

[2] Latency Nice: Implementation and UseCases for Scheduler Optmizations
    https://ospm.lwn.net/playback/presentation/2.0/playback.html?meetingId=380dd88f044f67ee4c94d0a2a4fb7c3f46cb6391-1589459486615&t=42m37s

[3] LWN: The many faces of "latency nice"
    https://lwn.net/Articles/820659

[4] [PATCH v5 0/4] Introduce per-task latency_nice for scheduler hints
    Message-ID: 20200228090755.22829-1-parth@...ux.ibm.com
    https://lore.kernel.org/lkml/20200228090755.22829-1-parth@linux.ibm.com

[5] Control Group v2: Resource Distribution Models
    https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v2.rst