Message-ID: <CADDUTFzK0FNS_mJ=S2_FH2vS2c5a+gW_qsjf3Hb9k=zzjB4JmA@mail.gmail.com>
Date: Mon, 9 Dec 2024 09:10:35 +0200
From: Costa Shulyupin <costa.shul@...hat.com>
To: Thomas Gleixner <tglx@...utronix.de>, Waiman Long <longman@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>, Valentin Schneider <vschneid@...hat.com>,
Peter Zijlstra <peterz@...radead.org>
Cc: open list <linux-kernel@...r.kernel.org>
Subject: Interference of CPU hotplug on CPU isolation and Real-Time tasks
Hello
Simplified test:
rtla timerlat hist -c 1 -a 500 &
echo 0 > /sys/devices/system/cpu/cpu11/online
RTLA reports the stack trace of the blocking thread:
...
-> multi_cpu_stop
-> cpu_stopper_thread
-> smpboot_thread_fn
...
I've found that multi_cpu_stop() disables interrupts for EACH online
CPU because takedown_cpu() indirectly invokes take_cpu_down() through
stop_machine_cpuslocked(). I'm omitting the detailed description of
the call chain.
Using stop_one_cpu() instead of stop_machine_cpuslocked() could
potentially solve the problem:
@@ -1335,7 +1339,7 @@ static int takedown_cpu(unsigned int cpu)
/*
* So now all preempt/rcu users must observe !cpu_active().
*/
- err = stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu));
+ err = stop_one_cpu(cpu, take_cpu_down, NULL);
The original stop_machine code was introduced 20 years ago:
Author: rusty <rusty>
Date: Fri Mar 19 16:02:28 2004 +0000
[PATCH] Hotplug CPUs: cpu_down()
Implement cpu_down(): uses stop_machine to freeze the machine, then
uses (arch-specific) __cpu_disable() and migrate_all_tasks().
Whole thing under CONFIG_HOTPLUG_CPU, so doesn't break archs which
don't define that.
https://github.com/jeffmahoney/linux-pre-git/commit/864a81b15223552102124656a012ac6de6947499#diff-52e4b09f63a029f319f95a60ddc0a09c31de0e172f8a2802ce39294569e60587R122
Additionally, take_cpu_down() relies on local_irq_save() and
hard_irq_disable(). However, I am omitting the patch for that part in
order to concentrate solely on stop_one_cpu().
Questions:
1. Why is stop_machine() used during CPU hotplug?
2. Is it worth testing using stop_one_cpu(), or would that be the
wrong approach?
3. Do you have any additional recommendations?
Thanks
Costa