linux-kernel - Re: Stopping the tick on a fully loaded system

Message-ID: <ZMD5xyxPUkKCDlVQ@localhost.localdomain>
Date:   Wed, 26 Jul 2023 12:47:35 +0200
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Anna-Maria Behnsen <anna-maria@...utronix.de>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        "Gautham R. Shenoy" <gautham.shenoy@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>
Subject: Re: Stopping the tick on a fully loaded system

On Tue, Jul 25, 2023 at 03:07:05PM +0200, Anna-Maria Behnsen wrote:
> The worst-case scenario will not happen, because remote timer expiry
> only happens when a CPU is not active in the hierarchy. With your
> proposal, that only holds after tick_nohz_stop_tick().
> 
> Nevertheless, I see some problems with this, although it also depends on
> whether the current idle behavior needs to change at all. Right now,
> these are my concerns:
> 
> - The determinism of tick_nohz_next_event() will break: its return value
>   cannot take into account whether this is the last CPU going idle, which
>   then has to take care of remote timers. So the CPU's first timer
>   (whether global or local) has to be handed back even if it could be
>   handled by the hierarchy.

Bah, of course...
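A hypothetical, heavily simplified model of the first concern: when the idle path cannot yet know whether this CPU will be the last one going idle, it has to report the earliest of its local and global timers; only if another CPU were known to stay active as a migrator could it report just the local one. The struct and function names below are illustrative assumptions, not kernel code, and they do not reflect the actual tick_nohz_next_event() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU view of pending timers (not a kernel structure). */
struct cpu_timers {
    uint64_t next_local;   /* earliest timer pinned to this CPU */
    uint64_t next_global;  /* earliest timer the hierarchy could take over */
};

/* Without knowing whether this CPU goes idle last, the safe answer is the
 * earliest timer of either kind, even if the hierarchy could handle it. */
uint64_t next_event_conservative(const struct cpu_timers *t)
{
    return t->next_local < t->next_global ? t->next_local : t->next_global;
}

/* Hypothetical variant with an oracle hint: if some other CPU is known to
 * remain active (last_cpu == false), global timers can be left to it. */
uint64_t next_event_with_hint(const struct cpu_timers *t, bool last_cpu)
{
    return last_cpu ? next_event_conservative(t) : t->next_local;
}
```

The hint is exactly the information tick_nohz_next_event() does not have at the time it is called, which is why the model's conservative variant is the only one it could use.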

> 
> - When tmigr_cpu_deactivate() is moved into tick_nohz_stop_tick() and its
>   return value is earlier than ts->next_tick, the expiry has to be
>   adjusted in tick_nohz_stop_tick().
> 
> - The load is simply moved to a later point: tick_nohz_stop_tick() is
>   never called without a preceding tick_nohz_next_event() call. Yes,
>   under load tick_nohz_next_event() is called ~8% more often than
>   tick_nohz_stop_tick(), but the 'quality' of tick_nohz_next_event()'s
>   return value gets worse.
> 
> - The timer migration hierarchy is not a standalone timer infrastructure.
>   It only makes sense in combination with the existing timer wheel: when
>   the timer base is idle, the hierarchy and its migrators do the job for
>   global timers. So I'm not sure about the impact of the changed locking,
>   but I'm pretty sure changing it increases the probability of ugly races
>   hidden somewhere between the lines.

Sure thing, and this won't be pretty.
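The second concern (adjusting the expiry in tick_nohz_stop_tick() when tmigr_cpu_deactivate() hands back something earlier than ts->next_tick) can be sketched with a toy model. Everything below is hypothetical: struct tmigr_model, model_cpu_deactivate() and model_stop_tick_expiry() are invented for illustration, NO_EXPIRY stands in for "no event", and neither the real tmigr_cpu_deactivate() semantics nor its locking are modeled.

```c
#include <assert.h>
#include <stdint.h>

#define NO_EXPIRY UINT64_MAX  /* hypothetical stand-in for "no event" */

/* Hypothetical, heavily simplified state of the migration hierarchy. */
struct tmigr_model {
    unsigned int active_cpus;  /* CPUs still active in the hierarchy */
    uint64_t next_global;      /* earliest global timer queued in it */
};

/* Hypothetical analogue of tmigr_cpu_deactivate(): the last CPU to become
 * inactive must wake up for the hierarchy's first global timer, because no
 * migrator is left; otherwise an active CPU takes care of global timers. */
uint64_t model_cpu_deactivate(struct tmigr_model *m)
{
    assert(m->active_cpus > 0);
    m->active_cpus--;
    return m->active_cpus == 0 ? m->next_global : NO_EXPIRY;
}

/* Hypothetical stop-tick step: if the hierarchy hands back an expiry that
 * is earlier than next_tick, the programmed expiry has to move forward. */
uint64_t model_stop_tick_expiry(struct tmigr_model *m, uint64_t next_tick)
{
    uint64_t tmigr = model_cpu_deactivate(m);

    return tmigr < next_tick ? tmigr : next_tick;
}
```

In this model, with two active CPUs, the first one to stop its tick keeps its own next_tick, while the second (last) one has to pull its expiry forward to the hierarchy's first global timer.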

> 
> Thanks,
> 
> 	Anna-Maria