[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aBoyBbDDslnly60w@localhost.localdomain>
Date: Tue, 6 May 2025 18:00:05 +0200
From: Frederic Weisbecker <frederic@...nel.org>
To: Gabriele Monaco <gmonaco@...hat.com>
Cc: linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
Waiman Long <longman@...hat.com>
Subject: Re: [PATCH v4 5/5] timers: Exclude isolated cpus from timer migation
Le Tue, May 06, 2025 at 11:15:40AM +0200, Gabriele Monaco a écrit :
> @@ -1472,15 +1473,24 @@ static int tmigr_cpu_unavailable(unsigned int cpu)
>
> static int tmigr_cpu_available(unsigned int cpu)
> {
> - struct tmigr_cpu *tmc = this_cpu_ptr(&tmigr_cpu);
> + struct tmigr_cpu *tmc = per_cpu_ptr(&tmigr_cpu, cpu);
>
> /* Check whether CPU data was successfully initialized */
> if (WARN_ON_ONCE(!tmc->tmgroup))
> return -EINVAL;
>
> + /*
> + * Isolated CPUs don't participate in timer migration.
> + * Checking here guarantees that CPUs isolated at boot (e.g. isolcpus)
> + * are not marked as available when they first become online.
> + * During runtime, any offline isolated CPU is also not incorrectly
> + * marked as available once it gets back online.
> + */
> + if (cpu_is_isolated(cpu))
I would like nohz_full to remain an exception here. It already handles
well (even better than domain isolated CPUs) global timers by behaving like idle
CPUs. Because when the tick is stopped on nohz_full, the global timers are
then handled by housekeeping. We are doing something different with domain
isolated CPUs because those must still handle their own global timers.
So please keep nohz_full CPUs inside the tree (that includes CPUs that are
_both_ nohz_full and domain isolated).
> + return 0;
> raw_spin_lock_irq(&tmc->lock);
> trace_tmigr_cpu_available(tmc);
> - tmc->idle = timer_base_is_idle();
> + tmc->idle = timer_base_remote_is_idle(cpu);
This is racy:
CPU 0 CPU 1
----- -----
tick_nohz_stop_tick()
timer_base_try_to_set_idle()
__get_next_timer_interrupt()
tmigr_cpu_deactivate()
tmigr_isolated_exclude_cpumask()
tmigr_cpu_available()
tmc->idle = timer_base_remote_is_idle(cpu);
if (!tmc->idle)
__tmigr_cpu_activate(tmc);
base_global->is_idle = true;
CPU 0 can now become the migrator even when it's idle sleeping forever.
My suggestion is to not rely remotely on is_idle. This can only
be racy. You can trigger tmigr_cpu_available() through smp_call_function_many()
instead.
Thanks.
> if (!tmc->idle)
> __tmigr_cpu_activate(tmc);
> tmc->available = true;
> @@ -1489,6 +1499,21 @@ static int tmigr_cpu_available(unsigned int cpu)
> return 0;
> }
>
> +void tmigr_isolated_exclude_cpumask(cpumask_var_t exclude_cpumask)
> +{
> + int cpu;
> +
> + lockdep_assert_cpus_held();
> +
> + for_each_cpu_and(cpu, exclude_cpumask, tmigr_available_cpumask)
> + tmigr_cpu_unavailable(cpu);
> +
> + for_each_cpu_andnot(cpu, cpu_online_mask, exclude_cpumask) {
> + if (!cpumask_test_cpu(cpu, tmigr_available_cpumask))
> + tmigr_cpu_available(cpu);
> + }
> +}
> +
> static void tmigr_init_group(struct tmigr_group *group, unsigned int lvl,
> int node)
> {
> --
> 2.49.0
>
--
Frederic Weisbecker
SUSE Labs
Powered by blists - more mailing lists