Message-ID: <20160830121937.GQ10138@twins.programming.kicks-ass.net>
Date:   Tue, 30 Aug 2016 14:19:37 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Balbir Singh <bsingharora@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Oleg Nesterov <oleg@...hat.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Nicholas Piggin <nicholas.piggin@...il.com>
Subject: Re: [RFC][PATCH] Fix a race between rwsem and the scheduler

On Tue, Aug 30, 2016 at 06:49:37PM +1000, Balbir Singh wrote:
> 
> 
> The origin of the issue I've seen seems to be related to
> rwsem spin lock stealing. Basically I see the system deadlock'd in the
> following state

As Nick says (good to see you're back Nick!), this is unrelated to
rwsems.

This is true for pretty much every blocking wait loop out there; they
all do:

	for (;;) {
		current->state = TASK_UNINTERRUPTIBLE;
		smp_mb();	/* order the ->state store against the condition load */
		if (cond)
			break;
		schedule();
	}
	current->state = TASK_RUNNING;

Which, if the wakeup is spurious, is just the pattern you need.
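
(Purely as a minimal sketch, nothing specific to this patch: the same loop
is more commonly spelled with the set_current_state() helpers, where the
explicit smp_mb() disappears because the helper implies a full barrier.)

	for (;;) {
		/* store ->state and imply a full memory barrier */
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (cond)
			break;
		schedule();
	}
	__set_current_state(TASK_RUNNING);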

> +++ b/kernel/sched/core.c
> @@ -2016,6 +2016,17 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  	success = 1; /* we're going to change ->state */
>  	cpu = task_cpu(p);
>  
> +	/*
> +	 * Ensure we see ->on_rq and ->state consistently
> +	 *
> +	 * For example in __rwsem_down_write_failed(), we have
> +	 *    [S] ->on_rq = 1				[L] ->state
> +	 *    MB					 RMB

There isn't an MB there. The best I can do is UNLOCK+LOCK, which, thanks
to PPC, is _not_ MB. It is however sufficient for this case.

> +	 *    [S] ->state = TASK_UNINTERRUPTIBLE	[L] ->on_rq
> +	 * In the absence of the RMB p->on_rq can be observed to be 0
> +	 * and we end up spinning indefinitely in while (p->on_cpu)
> +	 */


	/*
	 * Ensure we load p->on_rq _after_ p->state, otherwise it would
	 * be possible to, falsely, observe p->on_rq == 0 and get stuck
	 * in smp_cond_load_acquire() below.
	 *
	 * sched_ttwu_pending()			try_to_wake_up()
	 *   [S] p->on_rq = 1;			[L] p->state
	 *       UNLOCK rq->lock
	 *
	 * schedule()				RMB
	 *       LOCK rq->lock
	 *       UNLOCK rq->lock
	 *
	 * [task p]
	 *   [S] p->state = UNINTERRUPTIBLE	[L] p->on_rq
	 *
	 * Pairs with the UNLOCK+LOCK on rq->lock from the
	 * last wakeup of our task and the schedule that got our task
	 * current.
	 */

> +	smp_rmb();
>  	if (p->on_rq && ttwu_remote(p, wake_flags))
>  		goto stat;
>  
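
(To make the "below" in that comment concrete, here is a rough, condensed
sketch of how the pieces sit in try_to_wake_up(); the
smp_cond_load_acquire() wait is the mainline spelling of the
while (p->on_cpu) spin from your changelog, not the exact context of
this hunk.)

	smp_rmb();	/* load ->on_rq after ->state */
	if (p->on_rq && ttwu_remote(p, wake_flags))
		goto stat;

	/*
	 * A falsely observed ->on_rq == 0 skips ttwu_remote() and gets us
	 * stuck here, waiting for ->on_cpu to clear.
	 */
	smp_cond_load_acquire(&p->on_cpu, !VAL);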


Now, this has been present for a fair while; I suspect ever since we
reworked the wakeup path to not use rq->lock twice. Curious that you're
only now hitting it.
