Date:   Tue, 28 Mar 2017 07:35:36 +0100
From:   Dietmar Eggemann <dietmar.eggemann@....com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Matt Fleming <matt@...eblueprint.co.uk>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Juri Lelli <juri.lelli@....com>,
        Patrick Bellasi <patrick.bellasi@....com>
Subject: [RFC PATCH 0/5] CFS load tracking trace events

This patch-set introduces trace events for load (and utilization)
tracking for the following three cfs scheduler bricks: cfs_rq,
sched_entity and task_group.

I've decided to send it out now because people are discussing problems
with PELT-related functionality, and to provide a base for the related
discussion next week at the OSPM-summit in Pisa.

The requirements for the design are:

 (1) Support for all combinations related to the following kernel
     configuration options: CONFIG_SMP, CONFIG_FAIR_GROUP_SCHED,
     CONFIG_SCHED_AUTOGROUP, CONFIG_DEBUG_KERNEL.

 (2) All key=value pairs have to appear in all combinations of the
     kernel configuration options mentioned under (1).

 (3) Stable interface, i.e. only use keys for key=value pairs which
     are independent of the underlying (load tracking) implementation.

 (4) Minimally invasive to the actual cfs scheduler code, i.e. the
     trace events should not have to be guarded by an if condition
     (e.g. entity_is_task(se) to distinguish between task and
     task_group) or by a kernel configuration switch.

The trace events only expose one load (and one utilization) key=value
pair besides those used to identify the cfs scheduler brick. They do
not provide any internal implementation details of the actual (PELT)
load-tracking mechanism.
This limitation might be too strict, since for a cfs_rq we can only
trace either the runnable load (cfs_rq->runnable_load_avg) or the
runnable/blocked load (cfs_rq->avg.load_avg).
Other load metrics like instantaneous load ({cfs_rq|se}->load.weight)
are not traced but could be easily added.

The following keys are used to identify the cfs scheduler brick:

 (1) CPU number the cfs scheduler brick is attached to.

 (2) Task_group path and (css) id.

 (3) Task name and pid.

In case a key does not apply, either because of an unset kernel
configuration option or because a sched_entity can represent either a
task or a task_group, its value is set to an invalid default:

 (1) Task_group path: "(null)"

 (2) Task group id:   -1

 (3) Task name:       "(null)"

 (4) Task pid:        -1

Load tracking trace events are placed into kernel/sched/fair.c:

 (1) for cfs_rq:

     - In PELT core function __update_load_avg().
    
     - During sched_entity attach/detach.

     - During sched_entity load/util propagation down the task_group
       hierarchy.

 (2) for sched_entity:

     - In PELT core function __update_load_avg().
    
     - During sched_entity load/util propagation down the task_group
       hierarchy.

 (3) for task_group:

     - In its PELT update function.

An alternative to using __update_load_avg() would be to put the trace
events into update_cfs_rq_load_avg() for cfs_rq's and into
set_task_rq_fair(), update_load_avg() and sync_entity_load_avg() for
sched_entities. This would also get rid of the if (cfs_rq)/else
condition in __update_load_avg().

These trace events still look a bit fragile. First, this patch-set has
to use cfs-specific data structures in the global task scheduler trace
file include/trace/events/sched.h. Second, the accessor functions
(rq_of(), task_of(), etc.) are private to the cfs scheduler class; if
they were public, these trace events would be easier to code. That
said, group_cfs_rq() is already made public by this patch-set.

This version is based on tip/sched/core as of yesterday (bc4278987e38).
It has been compile-tested on ~160 configurations via 0day's kbuild
test robot.

Dietmar Eggemann (5):
  sched/autogroup: Define autogroup_path() for !CONFIG_SCHED_DEBUG
  sched/events: Introduce cfs_rq load tracking trace event
  sched/fair: Export group_cfs_rq()
  sched/events: Introduce sched_entity load tracking trace event
  sched/events: Introduce task_group load tracking trace event

 include/linux/sched.h        |  10 +++
 include/trace/events/sched.h | 143 +++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/autogroup.c     |   2 -
 kernel/sched/autogroup.h     |   2 -
 kernel/sched/fair.c          |  26 ++++----
 5 files changed, 167 insertions(+), 16 deletions(-)

-- 
2.11.0
