[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <0e69b462-1666-4a17-bde3-758d0d81a7d5@gmail.com>
Date: Tue, 14 Jan 2025 15:01:04 -0800
From: Doug Berger <opendmb@...il.com>
To: Juri Lelli <juri.lelli@...hat.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Florian Fainelli <florian.fainelli@...adcom.com>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC] sched/deadline: only mark active cpu as free
On 1/13/2025 2:54 AM, Juri Lelli wrote:
> Hi Doug,
>
> On 10/01/25 15:30, Doug Berger wrote:
>> There is a hazard in the deadline scheduler where an offlined CPU
>> can have its free_cpus bit left set in the def_root_domain when
>> the schedutil cpufreq governor is used. This can allow a deadline
>> thread to be pushed to the runqueue of a powered down CPU which
>> breaks scheduling.
>>
>> This commit works around the issue by only setting the free_cpus
>> bit for a CPU when it is "active". It is likely that the ordering
>> of sched_set_rq_online() and set_cpu_active() at the end of the
>> sched_cpu_deactivate() function should be revisited if this
>> approach has merit.
>>
>> Signed-off-by: Doug Berger <opendmb@...il.com>
>> ---
<snip>
>> This problem appears to have been initially introduced by commit
>> 120455c514f7 ("sched: Fix hotplug vs CPU bandwidth control") which
>> moved the set_rq_offline() handling from sched_cpu_dying() to
>> sched_cpu_deactivate(). The original sequence allowed the free_cpus
>> bit to be forcibly cleared in the def_root_domains after all of the
>> scheduler dust settled. The new location makes the
>> sched_set_rq_offline() essentially meaningless for the deadline
>> scheduler since the managing of changed scheduling domains happens
>> later.
>>
>> There are likely many different approaches to address this issue
>> and I'm hopeful that somone more familiar with the scheduler than
>> I can propose a better solution than the one suggested here.
>>
>> Thank you for reading this far. Any advice is appreciated.
>
> Thanks a lot for the detailed analysis!
Thanks for the review!
>
> I actually fear that the issue is due to the cpudl_clear_freecpu() call
> in rq_offline_dl() being racy, as we don't hold cp->lock while calling
> that. So, I think your solution below might be almost correct. I am
> thinking we should do something similar in cpudl_set() and remove cpudl_
> {set,clear}_freecpu() calls altogether.
>
> What do you think? If agree, care to update your patch please? :)
As I noted, I'm not real keen on what I proposed here. In particular I
am wary of introducing any extra processing to normal deadline scheduler
execution paths.
I am about to submit an alternative that only affects paths when the
topology is changing that I hope you will take another look at ;).
I am still very much open to someone saying something like "You know we
could probably bring the def_root_domain of the onlining CPU online
earlier so that set_rq_offline() gets called for it when attaching to
the new scheduling domain." I just don't feel comfortable enough yet
with my understanding of the scheduler to know what all of the
implications of such a change would be.
>
> Best,
> Juri
Thanks again!
Doug
>
>> kernel/sched/cpudeadline.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
>> index 95baa12a1029..6896bbe0e9ae 100644
>> --- a/kernel/sched/cpudeadline.c
>> +++ b/kernel/sched/cpudeadline.c
>> @@ -195,7 +195,8 @@ void cpudl_clear(struct cpudl *cp, int cpu)
>> cp->elements[cpu].idx = IDX_INVALID;
>> cpudl_heapify(cp, old_idx);
>>
>> - cpumask_set_cpu(cpu, cp->free_cpus);
>> + if (cpu_active(cpu))
>> + cpumask_set_cpu(cpu, cp->free_cpus);
>> }
>> raw_spin_unlock_irqrestore(&cp->lock, flags);
>> }
>> --
>> 2.34.1
>>
>
Powered by blists - more mailing lists