Message-ID: <e6153b89-1f41-3fff-241b-a767e41a1e7e@quicinc.com>
Date: Fri, 23 Sep 2022 19:50:04 +0530
From: Mukesh Ojha <quic_mojha@...cinc.com>
To: Peter Zijlstra <peterz@...radead.org>,
Waiman Long <longman@...hat.com>
CC: Tejun Heo <tj@...nel.org>,
Jing-Ting Wu <jing-ting.wu@...iatek.com>,
Valentin Schneider <vschneid@...hat.com>,
<wsd_upstream@...iatek.com>, <linux-kernel@...r.kernel.org>,
<linux-arm-kernel@...ts.infradead.org>,
<linux-mediatek@...ts.infradead.org>,
<Jonathan.JMChen@...iatek.com>,
"chris.redpath@....com" <chris.redpath@....com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Vincent Donnefort <vdonnefort@...il.com>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Christian Brauner <brauner@...nel.org>,
<cgroups@...r.kernel.org>, <lixiong.liu@...iatek.com>,
<wenju.xu@...iatek.com>
Subject: Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete
Hi Peter,
On 9/7/2022 2:20 AM, Peter Zijlstra wrote:
> On Tue, Sep 06, 2022 at 04:40:03PM -0400, Waiman Long wrote:
>
> I've not followed the earlier stuff due to being unreadable; just
> reacting to this..
We are able to reproduce the issue described at this link:
https://lore.kernel.org/lkml/88b2910181bda955ac46011b695c53f7da39ac47.camel@mediatek.com/
>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 838623b68031..5d9ea1553ec0 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2794,9 +2794,9 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
>>  	if (cpumask_equal(&p->cpus_mask, new_mask))
>>  		goto out;
>>
>> -	if (WARN_ON_ONCE(p == current &&
>> -			 is_migration_disabled(p) &&
>> -			 !cpumask_test_cpu(task_cpu(p), new_mask))) {
>> +	if (is_migration_disabled(p) &&
>> +	    !cpumask_test_cpu(task_cpu(p), new_mask)) {
>> +		WARN_ON_ONCE(p == current);
>>  		ret = -EBUSY;
>>  		goto out;
>>  	}
>> @@ -2818,7 +2818,11 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
>>  	if (flags & SCA_USER)
>>  		user_mask = clear_user_cpus_ptr(p);
>>
>> -	ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>> +	if (!is_migration_disabled(p) || (flags & SCA_MIGRATE_ENABLE)) {
>> +		ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>> +	} else {
>> +		task_rq_unlock(rq, p, rf);
>> +	}
>
> This cannot be right. There might be previous set_cpus_allowed_ptr()
> callers that are blocked and waiting for the task to land on a valid
> CPU.
>
I was wondering whether simply skipping, as below, would help here; I am not sure it does.
The idea is to leave the task on its current CPU, mark the request complete in
migration_cpu_stop() so that waiters are not left blocked, and let the affinity change
be handled later, once migration is re-enabled for the task.
------------------->O------------------------------------------
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d90d37c..7717733 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2390,8 +2390,10 @@ static int migration_cpu_stop(void *data)
 	 * we're holding p->pi_lock.
 	 */
 	if (task_rq(p) == rq) {
-		if (is_migration_disabled(p))
+		if (is_migration_disabled(p)) {
+			complete = true;
 			goto out;
+		}
 		if (pending) {
-Mukesh