linux-kernel - Re: [PATCH 01/11] rcu: avoid leaking exp_deferred

Message-ID: <20191031134351.GO20975@paulmck-ThinkPad-P72>
Date:   Thu, 31 Oct 2019 06:43:51 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Lai Jiangshan <laijs@...ux.alibaba.com>
Cc:     linux-kernel@...r.kernel.org,
        Josh Triplett <josh@...htriplett.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Joel Fernandes <joel@...lfernandes.org>, rcu@...r.kernel.org
Subject: Re: [PATCH 01/11] rcu: avoid leaking exp_deferred_qs into next GP

On Thu, Oct 31, 2019 at 10:07:56AM +0000, Lai Jiangshan wrote:
> If exp_deferred_qs is incorrectly set and leaked to the next
> exp GP, it may cause the next GP to be incorrectly prematurely
> completed.

Could you please provide the sequence of events leading to such a
failure?

Also, did you provoke such a failure in testing?  If so, an upgrade
to rcutorture would be good, so please tell me what you did to make
the failure happen.

I do like the reduction in state space, but I am a bit concerned about
the potential increase in contention on rnp->lock.  Thoughts?

							Thanx, Paul

> Signed-off-by: Lai Jiangshan <laijs@...ux.alibaba.com>
> ---
>  kernel/rcu/tree_exp.h | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index a0e1e51c51c2..6dec21909b30 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -603,6 +603,18 @@ static void rcu_exp_handler(void *unused)
>  	struct rcu_node *rnp = rdp->mynode;
>  	struct task_struct *t = current;
>  
> +	/*
> +	 * Note that there is a large group of race conditions that
> +	 * can have caused this quiescent state to already have been
> +	 * reported, so we really do need to check ->expmask first.
> +	 */
> +	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> +	if (!(rnp->expmask & rdp->grpmask)) {
> +		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> +		return;
> +	}
> +	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> +
>  	/*
>  	 * First, the common case of not being in an RCU read-side
>  	 * critical section.  If also enabled or idle, immediately
> @@ -628,17 +640,10 @@ static void rcu_exp_handler(void *unused)
>  	 * a future context switch.  Either way, if the expedited
>  	 * grace period is still waiting on this CPU, set ->deferred_qs
>  	 * so that the eventual quiescent state will be reported.
> -	 * Note that there is a large group of race conditions that
> -	 * can have caused this quiescent state to already have been
> -	 * reported, so we really do need to check ->expmask.
>  	 */
>  	if (t->rcu_read_lock_nesting > 0) {
> -		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> -		if (rnp->expmask & rdp->grpmask) {
> -			rdp->exp_deferred_qs = true;
> -			t->rcu_read_unlock_special.b.exp_hint = true;
> -		}
> -		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> +		rdp->exp_deferred_qs = true;
> +		WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
>  		return;
>  	}
>  
> -- 
> 2.20.1
>
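
For readers following the rnp->lock contention question above, here is a
minimal user-space sketch of the two hunks in the quoted diff. It is not
the kernel code: the model_* structures, the boolean "lock", and the
lock_acquisitions counter are hypothetical stand-ins, and the handler's
other paths (the not-in-a-reader cases) are omitted entirely. It only
illustrates that the patched shape checks ->expmask under the lock on
every invocation, whereas the pre-patch shape takes the lock only when
the task is inside a read-side critical section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct rcu_node (only the fields used here). */
struct model_rnp {
	unsigned long expmask;           /* CPUs the expedited GP still waits on */
	bool locked;                     /* stand-in for rnp->lock */
	unsigned long lock_acquisitions; /* counts acquisitions, for comparison */
};

/* Hypothetical stand-in for struct rcu_data. */
struct model_rdp {
	unsigned long grpmask;           /* this CPU's bit in ->expmask */
	bool exp_deferred_qs;
	struct model_rnp *mynode;
};

/* Hypothetical stand-in for the task_struct fields touched by the diff. */
struct model_task {
	int rcu_read_lock_nesting;
	bool exp_hint;
};

static void model_lock(struct model_rnp *rnp)
{
	assert(!rnp->locked);
	rnp->locked = true;
	rnp->lock_acquisitions++;
}

static void model_unlock(struct model_rnp *rnp)
{
	assert(rnp->locked);
	rnp->locked = false;
}

/* Pre-patch shape: the lock is taken only inside a read-side section. */
static void model_handler_old(struct model_rdp *rdp, struct model_task *t)
{
	struct model_rnp *rnp = rdp->mynode;

	if (t->rcu_read_lock_nesting > 0) {
		model_lock(rnp);
		if (rnp->expmask & rdp->grpmask) {
			rdp->exp_deferred_qs = true;
			t->exp_hint = true;
		}
		model_unlock(rnp);
	}
}

/* Patched shape: ->expmask is checked under the lock on every call. */
static void model_handler_new(struct model_rdp *rdp, struct model_task *t)
{
	struct model_rnp *rnp = rdp->mynode;
	bool waiting;

	model_lock(rnp);
	waiting = (rnp->expmask & rdp->grpmask) != 0;
	model_unlock(rnp);
	if (!waiting)
		return; /* quiescent state already reported */

	if (t->rcu_read_lock_nesting > 0) {
		rdp->exp_deferred_qs = true;
		t->exp_hint = true;
	}
}
```

In this model, a call made outside any read-side critical section costs
zero lock acquisitions in the old shape and one in the new shape. How
much that matters on real hardware depends on how many CPUs share a leaf
rcu_node, which the sketch does not attempt to capture.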