lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <59D84A4B.70900@huawei.com>
Date:   Sat, 7 Oct 2017 11:30:19 +0800
From:   zhouchengming <zhouchengming1@...wei.com>
To:     Steven Rostedt <rostedt@...dmis.org>, <peterz@...radead.org>
CC:     <linux-kernel@...r.kernel.org>, <mingo@...hat.com>,
        <huawei.libin@...wei.com>
Subject: Re: [PATCH] sched/rt.c: pick and check task if double_lock_balance()
 unlock the rq

Hi Steven, Peter,

On 2017/9/26 11:18, Steven Rostedt wrote:
> On Tue, 26 Sep 2017 09:23:20 +0800
> zhouchengming<zhouchengming1@...wei.com>  wrote:
>
>> On 2017/9/26 3:40, Steven Rostedt wrote:
>>> On Mon, 11 Sep 2017 14:51:49 +0800
>>> Zhou Chengming<zhouchengming1@...wei.com>   wrote:
>>>
>>>> push_rt_task() pick the first pushable task and find an eligible
>>>> lowest_rq, then double_lock_balance(rq, lowest_rq). So if
>>>> double_lock_balance() unlock the rq (when double_lock_balance() return 1),
>>>> we have to check if this task is still on the rq.
>>>>
>>>> The problem is that the check conditions are not sufficient:
>>>>
>>>> if (unlikely(task_rq(task) != rq ||
>>>> 	     !cpumask_test_cpu(lowest_rq->cpu,&task->cpus_allowed) ||
>>>> 	     task_running(rq, task) ||
>>>> 	     !rt_task(task) ||
>>>> 	     !task_on_rq_queued(task))) {
>>>>
>>>> cpu2				cpu1			cpu0
>>>> push_rt_task(rq1)
>>>>     pick task_A on rq1
>>>>     find rq0
>>>>       double_lock_balance(rq1, rq0)
>>>>         unlock(rq1)
>>>> 				rq1 __schedule
>>>> 				  pick task_A run
>>>> 				task_A sleep (dequeued)
>>>>         lock(rq0)
>>>>         lock(rq1)
>>>>       do_above_check(task_A)
>>>>         task_rq(task_A) == rq1
>>>>         cpus_allowed unchanged
>>>>         task_running == false
>>>>         rt_task(task_A) == true
>>>> 							try_to_wake_up(task_A)
>>>> 							  select_cpu = cpu3
>>>> 							  enqueue(rq3, task_A)
>>> How can this happen? The try_to_wake_up(task_A) needs to grab the rq
>>> that task A is on, and we have that rq lock.
>>>
>>> /me confused.
>>>
>>> -- Steve
>> Thanks for the reply!
>> After the task_A sleep on cpu1, the try_to_wake_up(task_A) on cpu0 select a different cpu3,
>> so it will grab the rq3 lock, not the rq1 lock.
> Ah crap. This is caused by 7608dec2ce20 ("sched: Drop the rq argument
> to sched_class::select_task_rq()"). Because this code depends on
> try_to_wake_up() grabbing the task's rq lock. But it no longer does
> that, and it causes this race.
>
> OK, I need to look at this deeper when I'm not so jetlagged and typing
> this because I can't sleep at 5am.
>
> Thanks for pointing this out!
>
> It may be fixed by simply grabbing the run queue lock on migration, as
> that would sync things up.

Is there any new solution? I don't think grabbing the rq lock without the task->pi_lock
will fix this problem. And I think my patch is correct and the changes are small.

Thanks!

> Peter?
>
>
> -- Steve
>
>
>
> .
>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ