Message-ID: <20240515062034.61601-1-lizhe.67@bytedance.com>
Date: Wed, 15 May 2024 14:20:34 +0800
From: lizhe.67@...edance.com
To: lizhe.67@...edance.com
Cc: juri.lelli@...hat.com,
linux-kernel@...r.kernel.org,
mingo@...hat.com,
peterz@...radead.org
Subject: Re: sched/isolation: Fix CPU affinity issues for several task
On Mon, 29 Apr 2024 17:44:27 +0800, Li Zhe wrote:
>If the "nohz_full=" command-line parameter contains cpu 0, the cpu affinity
>of the kernel threads "kthreadd", "rcu_sched", "rcuos%d" and "rcuog%d" will
>always be 0x01, that is, these threads can only run on cpu 0. This is
>not in line with the original design.
>
>The root cause of this problem is that the variable 'cpu_valid_mask' in
>the function __set_cpus_allowed_ptr_locked only contains cpu 0 before smp
>initialization is completed. If we call set_cpus_allowed_ptr and pass in a
>cpumask that does not contain cpu 0, the function call will return failure.
>The threads "kthreadd" and "rcu_sched" call the function set_cpus_allowed_ptr
>early in the system startup. The threads "rcuos%d" and "rcuog%d" inherit the
>wrong cpu affinity from "kthreadd".
>
>I tried to fix this problem by adapting the function set_cpus_allowed_ptr,
>but the variable task_struct->cpus_ptr is referenced or modified in the
>scheduling process, which makes it more difficult to fix this problem that
>way. So this patch clears cpu 0 from the nohz_full range to fix this
>problem.
>
>Signed-off-by: Li Zhe <lizhe.67@...edance.com>
>---
> kernel/sched/isolation.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
>diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
>index 5891e715f00d..7b9bcfcd3c55 100644
>--- a/kernel/sched/isolation.c
>+++ b/kernel/sched/isolation.c
>@@ -152,6 +152,13 @@ static int __init housekeeping_setup(char *str, unsigned long flags)
> 	if (cpumask_empty(non_housekeeping_mask))
> 		goto free_housekeeping_staging;
>
>+	if ((flags & HK_FLAG_KTHREAD) &&
>+	    cpumask_test_cpu(smp_processor_id(), non_housekeeping_mask)) {
>+		pr_warn("Housekeeping: Clearing cpu %d from nohz_full range\n", smp_processor_id());
>+		__cpumask_set_cpu(smp_processor_id(), housekeeping_staging);
>+		__cpumask_clear_cpu(smp_processor_id(), non_housekeeping_mask);
>+	}
>+
> 	if (!housekeeping.flags) {
> 		/* First setup call ("nohz_full=" or "isolcpus=") */
> 		enum hk_type type;

Friendly ping. Could somebody give me some advice?
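
To make the failure mode in the quoted commit message easier to follow, here is a minimal userspace sketch of it. It is only a model: CPU masks are plain bitmasks, and mask_t, CPU_BIT, model_set_cpus_allowed, model_keep_boot_cpu and model_demo are hypothetical names that do not exist in the kernel. It assumes, as described above, that an affinity request fails when the requested mask shares no cpu with the valid mask, and that the valid mask holds only the boot cpu before smp initialization. The example uses "nohz_full=0-3" on an assumed 8-cpu machine.

```c
/*
 * Userspace model only. Every name below is hypothetical and this is
 * not the kernel implementation.
 */
#include <assert.h>
#include <errno.h>

typedef unsigned long mask_t;

#define CPU_BIT(n) (1UL << (n))

/*
 * Simplified stand-in for the check described in the commit message:
 * reject the request when new_mask shares no cpu with valid_mask.
 * On failure the task's current affinity is left untouched.
 */
static int model_set_cpus_allowed(mask_t valid_mask, mask_t new_mask,
				  mask_t *task_affinity)
{
	if ((valid_mask & new_mask) == 0)
		return -EINVAL;
	*task_affinity = new_mask;
	return 0;
}

/*
 * Model of the proposed change: if the boot cpu is in the
 * non-housekeeping mask, move it into the housekeeping mask.
 */
static void model_keep_boot_cpu(int boot_cpu, mask_t *housekeeping,
				mask_t *non_housekeeping)
{
	if (*non_housekeeping & CPU_BIT(boot_cpu)) {
		*housekeeping |= CPU_BIT(boot_cpu);
		*non_housekeeping &= ~CPU_BIT(boot_cpu);
	}
}

/* Walk through nohz_full=0-3 on an assumed 8-cpu machine. */
void model_demo(void)
{
	mask_t valid = CPU_BIT(0);		/* before smp init: boot cpu only */
	mask_t hk = 0xf0, non_hk = 0x0f;	/* housekeeping 4-7, nohz_full 0-3 */
	mask_t affinity = CPU_BIT(0);		/* early thread starts on cpu 0 */

	/* Without the change, the housekeeping mask excludes cpu 0. */
	assert(model_set_cpus_allowed(valid, hk, &affinity) == -EINVAL);
	assert(affinity == CPU_BIT(0));		/* stuck at 0x01 */

	/* With the change, cpu 0 is housekeeping and the request succeeds. */
	model_keep_boot_cpu(0, &hk, &non_hk);
	assert(model_set_cpus_allowed(valid, hk, &affinity) == 0);
	assert(affinity == 0xf1);
}
```

In this model the first request fails and the affinity stays at 0x01, which matches the symptom seen on "kthreadd" and "rcu_sched". After the boot cpu is moved into the housekeeping mask the same request succeeds, at the cost of cpu 0 no longer being in the nohz_full range.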