Date:	Mon, 16 Jun 2014 09:30:36 +0800
From:	Lai Jiangshan <laijs@...fujitsu.com>
To:	<jjherne@...ux.vnet.ibm.com>, Peter Zijlstra <peterz@...radead.org>
CC:	Sasha Levin <sasha.levin@...cle.com>, Tejun Heo <tj@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Dave Jones <davej@...hat.com>, Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: workqueue: WARN at kernel/workqueue.c:2176

Hi, Peter

Ping...

thanks,
Lai

On 06/10/2014 09:21 AM, Lai Jiangshan wrote:
> On 06/09/2014 10:01 PM, Jason J. Herne wrote:
>> On 06/05/2014 06:54 AM, Lai Jiangshan wrote:
>>> ------------
>>>
>>> Subject: [PATCH] sched: migrate the waking tasks
>>>
>>> The current code silently skips migrating a waking task when TTWU_QUEUE is enabled.
>>>
>>> When a task is waking, it is pending on the wake_list of the rq, but
>>> it is not yet on the runqueue (task->on_rq == 0). In this case, set_cpus_allowed_ptr()
>>> and __migrate_task() will not migrate it because it is not on the runqueue.
>>>
>>> This behavior is incorrect: the task has already been woken up, so it will
>>> run on the wrong CPU, without correct placement, until the next wake-up
>>> or the next update of cpus_allowed.
>>>
>>> To fix this problem, we need to put the waking tasks on the runqueue (transfer
>>> the waking tasks to the running state) before migrating them.
>>>
>>> Signed-off-by: Lai Jiangshan <laijs@...fujitsu.com>
>>> ---
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index 268a45e..d05a5a1 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -1474,20 +1474,24 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
>>>   }
>>>
>>>   #ifdef CONFIG_SMP
>>> -static void sched_ttwu_pending(void)
>>> +static void sched_ttwu_pending_locked(struct rq *rq)
>>>   {
>>> -    struct rq *rq = this_rq();
>>>       struct llist_node *llist = llist_del_all(&rq->wake_list);
>>>       struct task_struct *p;
>>>
>>> -    raw_spin_lock(&rq->lock);
>>> -
>>>       while (llist) {
>>>           p = llist_entry(llist, struct task_struct, wake_entry);
>>>           llist = llist_next(llist);
>>>           ttwu_do_activate(rq, p, 0);
>>>       }
>>> +}
>>>
>>> +static void sched_ttwu_pending(void)
>>> +{
>>> +    struct rq *rq = this_rq();
>>> +
>>> +    raw_spin_lock(&rq->lock);
>>> +    sched_ttwu_pending_locked(rq);
>>>       raw_spin_unlock(&rq->lock);
>>>   }
>>>
>>> @@ -4530,6 +4534,11 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
>>>           goto out;
>>>
>>>       dest_cpu = cpumask_any_and(cpu_active_mask, new_mask);
>>> +
>>> +    /* Ensure it is on rq for migration if it is waking */
>>> +    if (p->state == TASK_WAKING)
>>> +        sched_ttwu_pending_locked(rq);
>>> +
>>>       if (p->on_rq) {
>>>           struct migration_arg arg = { p, dest_cpu };
>>>           /* Need help from migration thread: drop lock and wait. */
>>> @@ -4576,6 +4585,10 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
>>>       if (!cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
>>>           goto fail;
>>>
>>> +    /* Ensure it is on rq for migration if it is waking */
>>> +    if (p->state == TASK_WAKING)
>>> +        sched_ttwu_pending_locked(rq_src);
>>> +
>>>       /*
>>>        * If we're not on a rq, the next wake-up will ensure we're
>>>        * placed properly.
>>>
>>
>> FYI, this patch appears to fix the problem. I was able to run for 3 days without hitting the warning.
> 
> Thank you for testing. It proves that we found the root cause.
> Your tests matter most and the coding comes second; let's move forward step by step.
> 
> Thanks,
> Lai
> 
>>
>> I see that you guys are still discussing the details of the fix. When you decide on a final solution I'm happy to retest. Just be sure to ask :). It is hard to tell what to test with so many patches and code snippets flying around all the time.
>>
>> Happy coding.
>>
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
