linux-kernel - Re: Stopping the tick on a fully loaded system


Message-ID: <ZL2Z8InSLmI5GU9L@localhost.localdomain>
Date:   Sun, 23 Jul 2023 23:21:52 +0200
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Anna-Maria Behnsen <anna-maria@...utronix.de>
Cc:     Vincent Guittot <vincent.guittot@...aro.org>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        "Gautham R. Shenoy" <gautham.shenoy@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        "Rafael J . Wysocki" <rafael@...nel.org>
Subject: Re: Stopping the tick on a fully loaded system

(Adding Rafael in Cc)

On Thu, Jul 20, 2023 at 03:00:37PM +0200, Anna-Maria Behnsen wrote:
> I also had a look at teo. It makes things better but does not solve the
> underlying problem that I see here - please correct me if I missed
> something or if I'm simply wrong:
> 
> Yes, the governors have to decide in the end, whether it makes sense to
> stop the tick or not. For this decision, the governors require information
> about the current state of the core and how long nothing will probably
> have to be done. At the moment the governors therefore call
> tick_nohz_get_sleep_length(). This checks first whether the tick can be
> stopped. Then it takes into account whether rcu, irq_work, arch_work needs
> the CPU or a timer softirq is pending. If none of this is true, then the
> timers are checked. So tick_nohz_get_sleep_length() isn't only based on
> timers already.

Right, but those things (RCU, irq work, etc.) act kind of like timers here
and they should be considered as exceptions.

The timer infrastructure shouldn't take into account the idle activity,
this is really a job for the cpuidle governors.
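
As an illustration, the order of checks described in the quoted paragraph
can be modeled in user space roughly as follows. This is a toy sketch, not
kernel code: struct cpu_model, its fields, model_sleep_length() and the
TICK_NS value are all hypothetical stand-ins, and the real
tick_nohz_get_sleep_length() differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TICK_NS 4000000ULL /* assumed 250 Hz tick, for illustration only */

/* Hypothetical snapshot of the state the tick code inspects. */
struct cpu_model {
    bool can_stop_tick;          /* is stopping the tick allowed at all? */
    bool rcu_needs_cpu;          /* the "exceptions" that act like timers */
    bool irq_work_pending;
    bool arch_needs_cpu;
    bool timer_softirq_pending;
    uint64_t next_timer_ns;      /* delta to the earliest queued timer */
};

/* Returns the modeled sleep length in nanoseconds. */
uint64_t model_sleep_length(const struct cpu_model *c)
{
    assert(c != NULL);

    /* 1. If the tick cannot be stopped, the next tick bounds the sleep. */
    if (!c->can_stop_tick)
        return TICK_NS;

    /* 2. Exceptions: treated as if something were due by the next tick. */
    if (c->rcu_needs_cpu || c->irq_work_pending ||
        c->arch_needs_cpu || c->timer_softirq_pending)
        return TICK_NS;

    /* 3. Only now are the timers themselves consulted. */
    return c->next_timer_ns;
}
```

Note that nothing in this model looks at how busy the CPU is; that
judgement is left to the caller, i.e. the cpuidle governor.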

> The information about the sleep length from the scheduler perspective is
> completely missing in the current existing check for the probable sleep
> length.
> 
> Sure, teo takes scheduler utilization into account directly in the
> governor. But for me it is not comprehensible, why the CPU utilization
> check is done after asking for the possible sleep length where timers are
> taken into account. If the CPU is busy anyway, the information generated by
> tick_nohz_next_event() is irrelevant. And when the CPU is not busy, then it
> makes sense to ask for the sleep length also from a timer perspective.
> 
> When this CPU utilization check is implemented directly inside the
> governor, every governor has to implement it on its own. So wouldn't it
> make sense to implement a "how utilized is the CPU from a scheduler
> perspective" in one place and use this as the first check in
> tick_nohz_get_sleep_length()/tick_nohz_next_event()?
> 

Well, beyond that, there might be other situations where the governor may
decide not to stop the tick even if tick_nohz_next_event() says it's possible
to do so. That's the purpose of having that next event as an input among many
others for the cpuidle governors.
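
To make "one input among many others" concrete, here is a minimal
user-space sketch of a governor-style decision. It is not how teo or menu
are implemented: struct governor_inputs, model_should_stop_tick(),
UTIL_BUSY_PCT and TICK_NS are hypothetical names and values chosen only
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TICK_NS       4000000ULL /* assumed 250 Hz tick */
#define UTIL_BUSY_PCT 80U        /* assumed "CPU is busy" threshold */

/* Hypothetical set of inputs a governor might weigh. */
struct governor_inputs {
    uint64_t timer_sleep_ns;     /* what the tick/timer code reports */
    uint64_t predicted_idle_ns;  /* governor's own estimate from history */
    unsigned int util_pct;       /* scheduler utilization, 0..100 */
};

/* Decide whether stopping the tick is worthwhile. */
bool model_should_stop_tick(const struct governor_inputs *in)
{
    uint64_t expected;

    assert(in != NULL);

    /* A busy CPU is expected to wake up soon, whatever the timers say. */
    if (in->util_pct >= UTIL_BUSY_PCT)
        return false;

    /* Otherwise take the more pessimistic of the two estimates. */
    expected = in->timer_sleep_ns < in->predicted_idle_ns ?
               in->timer_sleep_ns : in->predicted_idle_ns;

    /* Stopping the tick only pays off if the sleep outlasts one tick. */
    return expected > TICK_NS;
}
```

In this model the timer-based sleep length can allow a long sleep while
the governor still declines to stop the tick, which is the situation
described above.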

As such, calling tmigr_cpu_deactivate() at next tick _evaluation_ time
instead of tick _stop_ time is always going to be problematic.

Can we fix that and call tmigr_cpu_deactivate() from tick_nohz_stop_tick()
instead? This will change the locking scenario a bit because
tick_nohz_stop_tick() doesn't hold the base lock. Is it a problem though?
In the worst case a remote tick happens and handles the earliest timer
for the current CPU while it's between tick_nohz_next_event() and
tick_nohz_stop_tick(), but then the current CPU would just propagate
an earlier deadline than needed. No big deal.

Though I could of course be overlooking some race or something else that
makes this impossible...
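
The proposal and its worst case can be sketched with a toy user-space
model. None of this is kernel code: struct cpu_model and the model_*()
helpers are hypothetical, and the real tmigr_cpu_deactivate() and
tick_nohz_stop_tick() involve locking and hierarchy handling that is not
modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU state for the model. */
struct cpu_model {
    bool active_in_hierarchy;  /* still marked active in the timer hierarchy */
    bool tick_stopped;
    uint64_t next_timer_ns;    /* expiry of the earliest queued timer */
    uint64_t propagated_ns;    /* deadline handed over on deactivation */
};

/* Evaluation: a pure query with no side effect on the hierarchy. */
uint64_t model_next_event(const struct cpu_model *c)
{
    assert(c != NULL);
    return c->next_timer_ns;
}

/*
 * Stop: deactivation happens only here, once the governor has really
 * decided to stop the tick. The deadline evaluated earlier is propagated.
 */
void model_stop_tick(struct cpu_model *c, uint64_t evaluated_ns)
{
    assert(c != NULL);
    c->tick_stopped = true;
    c->active_in_hierarchy = false;
    c->propagated_ns = evaluated_ns;
}
```

If a remote CPU handles the earliest timer between the two calls,
next_timer_ns moves later while evaluated_ns stays the same, so the
propagated deadline is merely earlier than needed. In the model that
costs a spurious early wakeup, not a missed timer.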

Thanks.