linux-kernel - Re: [RFC/RFT][PATCH 6/7] sched: idle: Predict idle duration before stopping the tick

Message-ID: <3c051023-312b-d2bc-34b5-40e8c13ca796@tu-dresden.de>
Date:   Mon, 5 Mar 2018 16:36:20 +0100
From:   Thomas Ilsche <thomas.ilsche@...dresden.de>
To:     "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Peter Zijlstra <peterz@...radead.org>
CC:     Thomas Gleixner <tglx@...utronix.de>,
        Frederic Weisbecker <fweisbec@...il.com>,
        Paul McKenney <paulmck@...ux.vnet.ibm.com>,
        "Doug Smythies" <dsmythies@...us.net>,
        Rik van Riel <riel@...riel.com>,
        Aubrey Li <aubrey.li@...ux.intel.com>,
        Mike Galbraith <mgalbraith@...e.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux PM <linux-pm@...r.kernel.org>
Subject: Re: [RFC/RFT][PATCH 6/7] sched: idle: Predict idle duration before
 stopping the tick

On 2018-03-04 23:28, Rafael J. Wysocki wrote:
> use the expected idle period
> duration returned by cpuidle_select() to tell tick_nohz_idle_go_idle()
> whether or not to stop the tick.

I assume that at the point of going idle, the actual next scheduling
tick may happen anywhere between now and 1/HZ. If there is a mechanism
that somehow ensures that the next scheduling tick always happens 1/HZ
after going idle, then some of my arguments are invalid.

Ideally, the decision whether to disable the sched tick should
primarily depend on the order of three upcoming events: the sched
tick, the next non-sched timer, and the heuristic prediction:

   https://marc.info/?l=linux-pm&m=151384941425947&w=2
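
One possible reading of that ordering rule, as a minimal sketch. The
struct, its field names, and should_stop_tick() are hypothetical and do
not exist in the kernel; all times are assumed to be absolute
microseconds. The assumption is that stopping the tick only pays off
when the tick would be the first of the three events.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs, all absolute times in microseconds. */
struct idle_events {
	uint64_t next_tick_us;      /* when the sched tick would fire next */
	uint64_t next_timer_us;     /* earliest non-sched timer */
	uint64_t predicted_wake_us; /* wakeup time predicted by the heuristic */
};

/*
 * Assumed rule: stop the tick only if it would be the earliest event,
 * i.e. if keeping it would cause a wakeup that neither a real timer
 * nor the predicted wakeup requires.
 */
static bool should_stop_tick(const struct idle_events *ev)
{
	return ev->next_tick_us < ev->next_timer_us &&
	       ev->next_tick_us < ev->predicted_wake_us;
}
```

With this rule a tick due in 400 us is stopped when the predicted
wakeup is 3900 us away, but kept when the prediction is only 200 us
away, where it still serves as a fallback.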

If I read the code correctly, there is already logic deep within
__tick_nohz_idle_enter that prevents disabling the sched tick when
it is scheduled to happen after another timer, which is a good primary
condition for not stopping the sched tick. However, the newly added
condition also prevents stopping the sched tick in cases where
keeping it is undesirable.
Assume duration_us is slightly less than USEC_PER_SEC / HZ and the
next sched tick will happen in 0.1 * USEC_PER_SEC / HZ.
If the prediction was accurate, the cpu will be woken up way too soon
by the not-disabled sched tick.
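
To put numbers on this scenario, here is a small sketch. HZ = 250 is an
assumed value, and both helper functions are hypothetical; the first
one only assumes that the new condition amounts to comparing the
predicted duration against one tick period.

```c
#include <stdbool.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL
#define HZ           250ULL                /* assumed for illustration */
#define TICK_USEC    (USEC_PER_SEC / HZ)   /* 4000 us */

/* Assumed form of the new condition: keep the tick for short predictions. */
static bool keeps_tick_by_duration(uint64_t duration_us)
{
	return duration_us < TICK_USEC;
}

/*
 * Sleep length actually achieved if the prediction is accurate:
 * a kept tick that fires before the predicted wakeup cuts it short.
 */
static uint64_t actual_sleep_us(uint64_t duration_us, uint64_t tick_in_us,
				bool tick_kept)
{
	if (tick_kept && tick_in_us < duration_us)
		return tick_in_us;
	return duration_us;
}
```

For duration_us = 3900 and a tick due in TICK_USEC / 10 = 400 us, the
tick is kept and the cpu sleeps 400 us instead of 3900 us.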

I fear that might even create positive feedback loops on the
heuristic, which will take into account the sleep durations for
sched tick wakeups in sort of a self-fulfilling prophecy:
1) The heuristic predicts a wakeup in less than a full sched period.
2) The sched tick is kept enabled.
3) The sched tick wakes up the system in less than a full sched period.
4) Repeat
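
The loop above can be illustrated with a toy simulation. This is not
the menu governor: the predictor here is a plain running average, and
HZ = 250, the function name, and all parameters are assumptions made
for illustration only.

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL
#define HZ           250ULL                /* assumed for illustration */
#define TICK_USEC    (USEC_PER_SEC / HZ)   /* 4000 us */

/*
 * Toy model: each round the cpu goes idle, the tick is kept if the
 * prediction is below one tick period, and the predictor is updated
 * with the sleep length that was actually observed.
 *
 * true_idle_us: idle time if nothing but the real wakeup interferes
 * tick_in_us:   time until the next tick at idle entry (held constant)
 */
static uint64_t simulate_prediction(unsigned int rounds, uint64_t true_idle_us,
				    uint64_t prediction_us, uint64_t tick_in_us)
{
	for (unsigned int i = 0; i < rounds; i++) {
		uint64_t observed_us = true_idle_us;

		/* Steps 1-3: short prediction, tick kept, early wakeup. */
		if (prediction_us < TICK_USEC && tick_in_us < true_idle_us)
			observed_us = tick_in_us;

		/* Step 4: the short sleep feeds back into the predictor. */
		prediction_us = (prediction_us + observed_us) / 2;
	}
	return prediction_us;
}
```

In this model, with a true idle time of 20000 us and a tick due 2000 us
after idle entry, a starting prediction of 3000 us stays below one tick
period indefinitely, while a starting prediction of 5000 us converges
towards the true idle time.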

Even when sleeping for longer than target_residency of the deepest
sleep state, you can still improve energy consumption by sleeping
longer whenever possible.

On the opposite side - undesirable shallow sleeps - the proposed patch
will basically always keep the tick enabled if there is a higher sleep
state with a target_residency <= 1/HZ. On systems with relatively low
target_residencies, such as the ones that I am primarily
investigating, this should effectively prevent long shallow sleeps.
However, on mobile systems with C10 states > 5 ms the sched tick is
not a suitable fallback timer for preventing these issues. Well, maybe
the timer itself could be used, but with a larger expiry.

So IMHO
- the precise timer and vague heuristic should not be mixed
- decisions should preferably use actual time points rather than the
   generic tick duration and residency time.
- for some cases the sched tick as is may not be sufficient as fallback

Question: Does disabling a timer on a cpu guarantee that this cpu will
wake up, or is there a scenario where a timer is deleted or moved
externally without the cpu having a chance to change its idle state?