Message-ID: <cdb597d4-6543-3e34-cbbd-6a776b0d6581@quicinc.com>
Date: Thu, 29 Sep 2022 20:43:43 +0530
From: Mukesh Ojha <quic_mojha@...cinc.com>
To: Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Steven Rostedt <rostedt@...dmis.org>
CC: Tejun Heo <tj@...nel.org>,
Jing-Ting Wu <jing-ting.wu@...iatek.com>,
Valentin Schneider <vschneid@...hat.com>,
<wsd_upstream@...iatek.com>, <linux-kernel@...r.kernel.org>,
<linux-arm-kernel@...ts.infradead.org>,
<linux-mediatek@...ts.infradead.org>,
<Jonathan.JMChen@...iatek.com>,
"chris.redpath@....com" <chris.redpath@....com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Vincent Donnefort <vdonnefort@...il.com>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Christian Brauner <brauner@...nel.org>,
<cgroups@...r.kernel.org>, <lixiong.liu@...iatek.com>,
<wenju.xu@...iatek.com>
Subject: Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete
Hi All,
On 9/23/2022 7:50 PM, Mukesh Ojha wrote:
> Hi Peter,
>
>
> On 9/7/2022 2:20 AM, Peter Zijlstra wrote:
>> On Tue, Sep 06, 2022 at 04:40:03PM -0400, Waiman Long wrote:
>>
>> I've not followed the earlier stuff due to being unreadable; just
>> reacting to this..
>
> We are able to reproduce the issue described at this link:
>
> https://lore.kernel.org/lkml/88b2910181bda955ac46011b695c53f7da39ac47.camel@mediatek.com/
>
>
>
>>
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index 838623b68031..5d9ea1553ec0 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -2794,9 +2794,9 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
>>> if (cpumask_equal(&p->cpus_mask, new_mask))
>>> goto out;
>>>
>>> - if (WARN_ON_ONCE(p == current &&
>>> - is_migration_disabled(p) &&
>>> - !cpumask_test_cpu(task_cpu(p), new_mask))) {
>>> + if (is_migration_disabled(p) &&
>>> + !cpumask_test_cpu(task_cpu(p), new_mask)) {
>>> + WARN_ON_ONCE(p == current);
>>> ret = -EBUSY;
>>> goto out;
>>> }
>>> @@ -2818,7 +2818,11 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
>>> if (flags & SCA_USER)
>>> user_mask = clear_user_cpus_ptr(p);
>>>
>>> - ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>>> + if (!is_migration_disabled(p) || (flags & SCA_MIGRATE_ENABLE)) {
>>> + ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>>> + } else {
>>> + task_rq_unlock(rq, p, rf);
>>> + }
>>
>> This cannot be right. There might be previous set_cpus_allowed_ptr()
>> callers that are blocked and waiting for the task to land on a valid
>> CPU.
>>
>
> I was wondering whether simply skipping, as below, would help here,
> though I am not sure.
>
> The idea is to leave the task on its current CPU and wait until migration
> is re-enabled for the task, so that it gets taken care of later.
>
> ------------------->O------------------------------------------
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index d90d37c..7717733 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2390,8 +2390,10 @@ static int migration_cpu_stop(void *data)
> * we're holding p->pi_lock.
> */
> if (task_rq(p) == rq) {
> - if (is_migration_disabled(p))
> + if (is_migration_disabled(p)) {
> + complete = true;
> goto out;
> + }
>
> if (pending) {
>
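
To make the trade-off in the hunk above easier to discuss, here is a rough
user-space sketch of only the branch it touches. It is not kernel code: the
sim_* names, the bitmask standing in for a cpumask, and the boolean standing
in for the completion are all hypothetical, and locking, the stopper thread
and the rest of migration_cpu_stop() are left out. Under those assumptions it
shows that bailing out without completing leaves the waiter asleep, while
completing early wakes the waiter even though the task is still on a CPU
outside the new mask, which seems to be the case Peter is worried about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
struct sim_task {
	int cpu;                 /* CPU the task currently sits on */
	unsigned int cpus_mask;  /* bit n set => CPU n is allowed */
	bool migration_disabled; /* models is_migration_disabled(p) */
};

struct sim_pending {
	bool done;               /* stands in for the completion a waiter sleeps on */
};

/*
 * Models only the migration-disabled branch discussed in the thread.
 * complete_when_disabled == false: bail out, the waiter is not woken.
 * complete_when_disabled == true:  wake the waiter without moving the task.
 */
static void sim_migration_cpu_stop(struct sim_task *p,
				   struct sim_pending *pending,
				   bool complete_when_disabled)
{
	bool complete = false;

	assert(p != NULL);

	if (p->migration_disabled) {
		if (complete_when_disabled)
			complete = true;
		goto out;
	}

	/* Migration allowed: move to the lowest allowed CPU if needed. */
	if (!(p->cpus_mask & (1u << p->cpu))) {
		for (int cpu = 0; cpu < 32; cpu++) {
			if (p->cpus_mask & (1u << cpu)) {
				p->cpu = cpu;
				break;
			}
		}
	}
	complete = true;
out:
	if (complete && pending)
		pending->done = true;
}

/* True if the task's current CPU is inside its allowed mask. */
static bool sim_on_allowed_cpu(const struct sim_task *p)
{
	return (p->cpus_mask & (1u << p->cpu)) != 0;
}
```

With a task on CPU 3, an allowed mask of CPUs 0-1 and migration disabled,
the first mode leaves done unset, and the second sets done while
sim_on_allowed_cpu() still returns false.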
Any suggestions on this bug?
-Mukesh