Date:	Thu, 03 Feb 2011 18:16:51 +0100
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	frank.rowand@...sony.com
Cc:	Chris Mason <chris.mason@...cle.com>, Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	Mike Galbraith <efault@....de>,
	Oleg Nesterov <oleg@...hat.com>, Paul Turner <pjt@...gle.com>,
	Jens Axboe <axboe@...nel.dk>,
	Yong Zhang <yong.zhang0@...il.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCH 14/18] sched: Remove rq->lock from the first half
 of ttwu()

On Fri, 2011-01-28 at 17:05 -0800, Frank Rowand wrote:
> 
> The deadlock can occur if __ARCH_WANT_UNLOCKED_CTXSW and
> __ARCH_WANT_INTERRUPTS_ON_CTXSW are defined.
> 
> A task sets p->state = TASK_UNINTERRUPTIBLE, then calls schedule().
> 
> schedule()
>    prev->on_rq = 0
>    context_switch()
>       prepare_task_switch()
>          prepare_lock_switch()
>             raw_spin_unlock_irq(&rq->lock)
> 
> At this point, a pending interrupt (on this same cpu) is handled.
> The interrupt handling results in a call to try_to_wake_up() on the
> current process.  The try_to_wake_up() gets into:
> 
>    while (p->on_cpu)
>       cpu_relax();
> 
> and spins forever.  This is because "prev->on_cpu = 0" happens
> slightly after this point, at:
> 
>    finish_task_switch()
>       finish_lock_switch()
>          prev->on_cpu = 0

Right, very good spot!
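
The window described above can be modelled in user space. This is only a sketch: `struct task_model`, `enum sched_step`, `run_through()` and `irq_wakeup_spins_forever()` are hypothetical names invented for illustration, not kernel code. The model assumes the wakeup comes from an interrupt on the same cpu, so the interrupted context is the only code that could ever clear on_cpu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two task fields involved in the report. */
struct task_model {
	int on_rq;
	int on_cpu;
};

/* Steps of schedule(), in the order given in the quoted trace. */
enum sched_step {
	STEP_CLEAR_ON_RQ,   /* prev->on_rq = 0 */
	STEP_UNLOCK_IRQ,    /* raw_spin_unlock_irq(&rq->lock): irqs enabled */
	STEP_CLEAR_ON_CPU   /* finish_lock_switch(): prev->on_cpu = 0 */
};

/* Start from a running task and apply every step up to and including 'last'. */
static void run_through(struct task_model *prev, enum sched_step last)
{
	assert(prev != NULL);
	prev->on_rq = 1;
	prev->on_cpu = 1;
	if (last >= STEP_CLEAR_ON_RQ)
		prev->on_rq = 0;
	/* STEP_UNLOCK_IRQ changes no field; it only opens the irq window. */
	if (last >= STEP_CLEAR_ON_CPU)
		prev->on_cpu = 0;
}

/*
 * Models "while (p->on_cpu) cpu_relax();" running in an interrupt on the
 * cpu that is still switching away from prev.  The interrupted context
 * cannot make progress, so the loop ends only if on_cpu is already 0.
 */
static bool irq_wakeup_spins_forever(const struct task_model *prev)
{
	return prev->on_cpu != 0;
}
```

Stopping the model at STEP_UNLOCK_IRQ leaves on_rq == 0 and on_cpu == 1, which is the state in which the interrupt-time wakeup cannot terminate; running through STEP_CLEAR_ON_CPU removes the condition.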

> 
> One possible fix would be to get rid of __ARCH_WANT_INTERRUPTS_ON_CTXSW.
> I don't suspect the reaction to that suggestion will be very positive...

:-), afaik some architectures require this, i.e. removing it would
require dropping whole architectures.

> Another fix might be:
> 
>    while (p->on_cpu) {
>       if (p == current)
>          goto out_activate;
>       cpu_relax();
>    }
> 
>    Then add back in the out_activate label.
> 
> I don't know if the second fix is good -- I haven't thought out how
> it impacts the later patches in the series.

Right, I've done something similar to this: simply short-circuit the cpu
selection to force it to activate the task on the local cpu.
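
For illustration only, the short-circuit idea can be sketched as a user-space model. This is not the actual patch: `struct task_model`, `enum ttwu_path` and `ttwu_wait_on_cpu()` are hypothetical names, a C11 atomic stands in for p->on_cpu, and the return value stands in for the choice between activating locally and going through cpu selection.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the one task field the wait loop looks at. */
struct task_model {
	atomic_int on_cpu;
};

/* Which path the waker takes once the wait loop is left. */
enum ttwu_path {
	TTWU_ACTIVATE_LOCAL, /* p is the task this cpu is switching away from */
	TTWU_SELECT_CPU      /* p is off its cpu, normal cpu selection applies */
};

/*
 * Wait for p to leave its cpu, but do not wait when p is the caller's own
 * current task: in that case only the interrupted context could clear
 * on_cpu, so waiting would never end.
 */
static enum ttwu_path ttwu_wait_on_cpu(struct task_model *p,
				       const struct task_model *curr)
{
	assert(p != NULL);
	while (atomic_load_explicit(&p->on_cpu, memory_order_acquire)) {
		if (p == curr)
			return TTWU_ACTIVATE_LOCAL;
		/* the kernel loop calls cpu_relax() here */
	}
	return TTWU_SELECT_CPU;
}
```

Calling `ttwu_wait_on_cpu(&t, &t)` with t.on_cpu set returns TTWU_ACTIVATE_LOCAL at once instead of spinning; a task whose on_cpu is already clear falls through to TTWU_SELECT_CPU.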