[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <53270C01.6090209@linux.vnet.ibm.com>
Date: Mon, 17 Mar 2014 10:51:45 -0400
From: "Jason J. Herne" <jjherne@...ux.vnet.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>, Tejun Heo <tj@...nel.org>
CC: Lai Jiangshan <laijs@...fujitsu.com>, linux-kernel@...r.kernel.org,
Ingo Molnar <mingo@...hat.com>
Subject: Re: Subject: Warning in workqueue.c
On 03/10/2014 10:37 AM, Jason J. Herne wrote:
> On 02/25/2014 05:37 AM, Peter Zijlstra wrote:
>> On Mon, Feb 24, 2014 at 01:35:01PM -0500, Tejun Heo wrote:
>>
>>> That's a bummer but it at least isn't a very new regression. Peter,
>>> any ideas on debugging this? I can make workqueue to play block /
>>> unblock dance to try to work around the issue but that'd be very
>>> yucky. It'd be great to root cause where the cpu selection anomaly is
>>> coming from.
>>
>> I'm assuming you're using set_cpus_allowed_ptr() to flip them between
>> CPUs; the below adds some error paths to that code. In particular we
>> propagate the __migrate_task() fail (returns the number of tasks
>> migrated) through the stop_one_cpu() into set_cpus_allowed_ptr().
>>
>> This way we can see if there was a problem with the migration.
>>
>> You should be able to now reliably use the return value of
>> set_cpus_allowed_ptr() to tell if it is now running on a CPU in its
>> allowed mask.
>>
>> I've also included an #if 0 retry loop for the fail case; but I suspect
>> that that might end up deadlocking your machine if you hit that just
>> wrong, something like the waking CPU endlessly trying to migrate the
>> task over while the wakee CPU is waiting for completion of something
>> from the waking CPU.
>>
>> But its worth a prod I suppose.
>>
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 84b23cec0aeb..4c384efac8b3 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4554,18 +4554,28 @@ int set_cpus_allowed_ptr(struct task_struct
>> *p, const struct cpumask *new_mask)
>>
>> do_set_cpus_allowed(p, new_mask);
>>
>> +again: __maybe_unused
>> +
>> /* Can the task run on the task's current CPU? If so, we're done */
>> - if (cpumask_test_cpu(task_cpu(p), new_mask))
>> + if (cpumask_test_cpu(task_cpu(p), tsk_cpus_allowed(p)))
>> goto out;
>>
>> - dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
>> + dest_cpu = cpumask_any_and(cpu_active_mask, tsk_cpus_allowed(p));
>> if (p->on_rq) {
>> struct migration_arg arg = { p, dest_cpu };
>> +
>> /* Need help from migration thread: drop lock and wait. */
>> task_rq_unlock(rq, p, &flags);
>> - stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
>> + ret = stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
>> +#if 0
>> + if (ret) {
>> + rq = task_rq_lock(p, &flags);
>> + goto again;
>> + }
>> +#endif
>> tlb_migrate_finish(p->mm);
>> - return 0;
>> +
>> + return ret;
>> }
>> out:
>> task_rq_unlock(rq, p, &flags);
>> @@ -4679,15 +4689,18 @@ void sched_setnuma(struct task_struct *p, int
>> nid)
>> static int migration_cpu_stop(void *data)
>> {
>> struct migration_arg *arg = data;
>> + int ret = 0;
>>
>> /*
>> * The original target cpu might have gone down and we might
>> * be on another cpu but it doesn't matter.
>> */
>> local_irq_disable();
>> - __migrate_task(arg->task, raw_smp_processor_id(), arg->dest_cpu);
>> + if (!__migrate_task(arg->task, raw_smp_processor_id(),
>> arg->dest_cpu))
>> + ret = -EAGAIN;
>> local_irq_enable();
>> - return 0;
>> +
>> + return ret;
>> }
>>
>> #ifdef CONFIG_HOTPLUG_CPU
>>
>
> Peter,
>
> Did you intend for me to run with this patch or was it posted for
> discussion only? If you want it run, please tell me what to look for.
> Also, if I should run this, should I include any other patches, either
> the last one you posted in this thread or any of Tejun's?
>
> Thanks.
>
Ping?
--
-- Jason J. Herne (jjherne@...ux.vnet.ibm.com)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists