Message-ID: <505AC979.7000008@gmail.com>
Date: Thu, 20 Sep 2012 09:44:57 +0200
From: Sasha Levin <levinsasha928@...il.com>
To: Michael Wang <wangyun@...ux.vnet.ibm.com>
CC: paulmck@...ux.vnet.ibm.com, Dave Jones <davej@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: RCU idle CPU detection is broken in linux-next
On 09/20/2012 09:33 AM, Michael Wang wrote:
> On 09/20/2012 01:06 AM, Paul E. McKenney wrote:
>> On Wed, Sep 19, 2012 at 06:35:36PM +0200, Sasha Levin wrote:
>>> On 09/19/2012 05:39 PM, Paul E. McKenney wrote:
>>>> On Wed, Sep 12, 2012 at 07:56:48PM +0200, Sasha Levin wrote:
>>>>>> Hi Paul,
>>>>>>
>>>>>> While fuzzing using trinity inside a KVM tools guest, I've managed to trigger
>>>>>> "RCU used illegally from idle CPU!" warnings several times.
>>>>>>
>>>>>> There are a bunch of traces which seem to pop exactly at the same time and from
>>>>>> different places around the kernel. Here are several of them:
>>>> Hello, Sasha,
>>>>
>>>> OK, interesting. Could you please try reproducing with the diagnostic
>>>> patch shown below?
>>>
>>> Sure - here are the results (btw, it reproduces very easily):
>>>
>>> [ 13.525119] ================================================
>>> [ 13.527165] [ BUG: lock held when returning to user space! ]
>>> [ 13.528752] 3.6.0-rc6-next-20120918-sasha-00002-g190c311-dirty #362 Tainted: GW
>>> [ 13.531314] ------------------------------------------------
>>> [ 13.532918] init/1 is leaving the kernel with locks still held!
>>> [ 13.534574] 1 lock held by init/1:
>>> [ 13.535533] #0: (rcu_idle){.+.+..}, at: [<ffffffff811c36d0>]
>>> rcu_eqs_enter_common+0x1a0/0x9a0
>>>
>>> I'm basically seeing lots of the above, so I can't even get to the point where I
>>> get the previous lockdep warnings.
>>
>> OK, that diagnostic patch was unhelpful. Back to the drawing board...
>
> Maybe we could first make sure that cpu_idle() behaves properly?
>
> According to the log, RCU thinks the CPU is idle while the current pid
> is not 0. That could happen if something is broken in cpu_idle(), which
> is highly platform-dependent.
>
> So checking at the point where the idle thread is switched out could be
> a first step? Something like below.
>
> Regards,
> Michael Wang
>
> diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> index b6baf37..f8c7354 100644
> --- a/kernel/sched/idle_task.c
> +++ b/kernel/sched/idle_task.c
> @@ -43,6 +43,7 @@ dequeue_task_idle(struct rq *rq, struct task_struct *p, int flags)
>
> static void put_prev_task_idle(struct rq *rq, struct task_struct *prev)
> {
> + WARN_ON(rcu_is_cpu_idle());
> }
>
> static void task_tick_idle(struct rq *rq, struct task_struct *curr, int queued)
Looks like you're on to something, with the small patch above applied:
[ 23.514223] ------------[ cut here ]------------
[ 23.515496] WARNING: at kernel/sched/idle_task.c:46
put_prev_task_idle+0x1e/0x30()
[ 23.517498] Pid: 0, comm: swapper/0 Tainted: G W
3.6.0-rc6-next-20120919-sasha-00001-gb54aafe-dirty #366
[ 23.520393] Call Trace:
[ 23.521882] [<ffffffff8115167e>] ? put_prev_task_idle+0x1e/0x30
[ 23.524220] [<ffffffff81106736>] warn_slowpath_common+0x86/0xb0
[ 23.524220] [<ffffffff81106825>] warn_slowpath_null+0x15/0x20
[ 23.524220] [<ffffffff8115167e>] put_prev_task_idle+0x1e/0x30
[ 23.524220] [<ffffffff839ea61e>] __schedule+0x25e/0x8f0
[ 23.524220] [<ffffffff81175ebd>] ? tick_nohz_idle_exit+0x18d/0x1c0
[ 23.524220] [<ffffffff839ead05>] schedule+0x55/0x60
[ 23.524220] [<ffffffff81078540>] cpu_idle+0x90/0x160
[ 23.524220] [<ffffffff8383043c>] rest_init+0x130/0x144
[ 23.524220] [<ffffffff8383030c>] ? csum_partial_copy_generic+0x16c/0x16c
[ 23.524220] [<ffffffff858acc18>] start_kernel+0x38d/0x39a
[ 23.524220] [<ffffffff858ac5fe>] ? repair_env_string+0x5e/0x5e
[ 23.524220] [<ffffffff858ac326>] x86_64_start_reservations+0x101/0x105
[ 23.524220] [<ffffffff858ac472>] x86_64_start_kernel+0x148/0x157
[ 23.524220] ---[ end trace 2c3061ab727afec2 ]---