Message-ID: <1341477343.7709.4.camel@twins>
Date:	Thu, 05 Jul 2012 10:35:43 +0200
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Michael Wang <wangyun@...ux.vnet.ibm.com>
Cc:	LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>
Subject: Re: [RFC BUG] There is a potential bug in "yield_to"

On Thu, 2012-07-05 at 13:31 +0800, Michael Wang wrote:
> Hi, All
> 
> I found there may be a potential bug in "yield_to":
> 
>         local_irq_save(flags);
>         rq = this_rq();
> 
> again:	
> 
> // the task's rq may already have changed in "sched_move_task"
> 
>         p_rq = task_rq(p);
>         double_rq_lock(rq, p_rq);
>         while (task_rq(p) != p_rq) {
>                 double_rq_unlock(rq, p_rq);
>                 goto again;
>         }
> 
> I think it may happen in this scenario:
> 
> 	cpu 0				cpu 1(task a)
> 
> 					yield_to {
> 					disable_irq
> 	sched_move_task {		rq = this_rq();
> 	task_rq_lock(task a)		double_rq_lock
> 
> 	hold lock of rq 1			
> 	set_task_rq			//task rq changed
> 	release lock of rq 1
> 
> 					hold lock of rq 1
> 					but task b no longer
> 					there
> 
> 					set rq 1's current to
> 					skip which is not task a
> 
> which means we hold an rq's lock, but its current task is not the one that
> should do the yield.
> 
> Only "sched_move_task" can cause this issue, as it moves a task that is
> still running.
> 
> The bug makes the task that wants to yield fail to set the skip buddy to
> itself, and set it on an innocent task instead. This is not very harmful and
> is almost impossible to hit in normal use, but should we fix it with another
> check, "rq == this_rq()"?

Uhm, what?!

We've got interrupts disabled, this_rq() cannot ever possibly change, so
rq is always correct.

Only p_rq can change, and we have an again loop on that, so what's the
problem again?
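
For illustration only, here is a minimal userspace sketch of the
lock-then-recheck pattern described above. It is not the kernel code:
struct rq, struct task, lock_rq_and_task_rq() and migrate_task() are
hypothetical stand-ins built on pthread mutexes, and the fixed "rq"
argument plays the role of this_rq() with interrupts disabled. It also
assumes a single migrating thread. The only value that can go stale is
p_rq, so that is the only one rechecked after the locks are taken.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's runqueue and task structures. */
struct rq {
    pthread_mutex_t lock;
};

struct task {
    _Atomic(struct rq *) rq;    /* runqueue the task currently sits on */
};

static struct rq *task_rq(struct task *p)
{
    return atomic_load(&p->rq);
}

/* Take both locks in address order so two callers cannot deadlock. */
static void double_rq_lock(struct rq *a, struct rq *b)
{
    if (a == b) {
        pthread_mutex_lock(&a->lock);
    } else if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

static void double_rq_unlock(struct rq *a, struct rq *b)
{
    pthread_mutex_unlock(&a->lock);
    if (a != b)
        pthread_mutex_unlock(&b->lock);
}

/*
 * "rq" is fixed for the whole call (the analogue of this_rq() with
 * interrupts off). Only p's runqueue can change under us, so after
 * locking we recheck it and retry if the task moved in the meantime.
 * Returns p's runqueue with both locks held.
 */
static struct rq *lock_rq_and_task_rq(struct rq *rq, struct task *p)
{
    for (;;) {
        struct rq *p_rq = task_rq(p);

        double_rq_lock(rq, p_rq);
        if (task_rq(p) == p_rq)
            return p_rq;        /* still valid while we hold its lock */
        double_rq_unlock(rq, p_rq);
    }
}

/* Move p to dst while holding the lock of the runqueue it is leaving. */
static void migrate_task(struct task *p, struct rq *dst)
{
    struct rq *src = task_rq(p);

    pthread_mutex_lock(&src->lock);
    atomic_store(&p->rq, dst);
    pthread_mutex_unlock(&src->lock);
}
```

Because a migration in this sketch only happens under the source
runqueue's lock, once the recheck passes the task cannot move again
until the caller drops the locks with double_rq_unlock().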