lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 1 Nov 2019 04:54:54 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Lai Jiangshan <laijs@...ux.alibaba.com>
Cc:     linux-kernel@...r.kernel.org,
        Josh Triplett <josh@...htriplett.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Joel Fernandes <joel@...lfernandes.org>, rcu@...r.kernel.org
Subject: Re: [PATCH 05/11] rcu: clean all rcu_read_unlock_special after
 report qs

On Thu, Oct 31, 2019 at 10:08:00AM +0000, Lai Jiangshan wrote:
> rcu_preempt_deferred_qs_irqrestore() is always called when
> ->rcu_read_lock_nesting <= 0 and there is nothing to prevent it
> from reporting qs if needed which means ->rcu_read_unlock_special
> is better to be clearred after the function. But in some cases,
> it leaves exp_hint (for example, after rcu_note_context_switch()),
> which is harmess since it is just a hint, but it may also intruduce
> unneeded rcu_read_unlock_special() later.
> 
> The new code write to exp_hint without WRITE_ONCE() since the
> writing is protected by irq.
> 
> Signed-off-by: Lai Jiangshan <laijs@...ux.alibaba.com>

This does look good, thank you!

Though for a rather different reason that called out in the commit log.
Note that .exp_hint is in task_struct, and thus follows the task, and is
currently unconditionally cleared by the next rcu_read_unlock_special(),
which will be invoked because .exp_hint is non-zero.  So if
rcu_note_context_switch() leaves .exp_hint set, that is all to the good
because the next rcu_read_unlock() executed by this task once resumed,
and because that rcu_read_unlock() might be transitioning to a quiescent
state required in order for the expedited grace period to end.

Nevertheless, clearing .exp_hint in rcu_preempt_deferred_qs_irqrestore()
instead of rcu_read_unlock_special() is a good thing.  This is because
in the case where it is not possible to arrange an event immediately
following reenabling, it is good to enable any rcu_read_unlock()s that
might be executed to help us out.

Or am I missing something here?

> ---
>  kernel/rcu/tree_plugin.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 9fe8138ed3c3..bb3bcdb5c9b8 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -439,6 +439,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
>  	 * t->rcu_read_unlock_special cannot change.
>  	 */
>  	special = t->rcu_read_unlock_special;
> +	t->rcu_read_unlock_special.b.exp_hint = false;

Interrupts are disabled by this time, so yes, it is safe for this to be
a simple assignment.

>  	t->rcu_read_unlock_special.b.deferred_qs = false;
>  	if (special.b.need_qs) {
>  		rcu_qs();
> @@ -626,7 +627,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
>  		local_irq_restore(flags);
>  		return;
>  	}
> -	WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);

And reaching the above assignment is inevitable at this point,
so this assignment can go.

But can't we also get rid of the assignment under the preceding "if"
statement?  Please see below for a version of your patch that includes
this proposed change along with an updated commit log.  Please let me
know if I have messed anything up.

>  	rcu_preempt_deferred_qs_irqrestore(t, flags);
>  }

------------------------------------------------------------------------

commit 6e3f2b1d6aba24ad6ae8deb2ce98b93d527227ae
Author: Lai Jiangshan <laijs@...ux.alibaba.com>
Date:   Fri Nov 1 04:06:22 2019 -0700

    rcu: Clear .exp_hint only when deferred quiescent state has been reported
    
    Currently, the .exp_hint flag is cleared in rcu_read_unlock_special(),
    which works, but which can also prevent subsequent rcu_read_unlock() calls
    from helping expedite the quiescent state needed by an ongoing expedited
    RCU grace period.  This commit therefore defers clearing of .exp_hint
    from rcu_read_unlock_special() to rcu_preempt_deferred_qs_irqrestore(),
    thus ensuring that intervening calls to rcu_read_unlock() have a chance
    to help end the expedited grace period.
    
    Signed-off-by: Lai Jiangshan <laijs@...ux.alibaba.com>
    Signed-off-by: Paul E. McKenney <paulmck@...nel.org>

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index d4c4824..677ee1c 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -439,6 +439,7 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 	 * t->rcu_read_unlock_special cannot change.
 	 */
 	special = t->rcu_read_unlock_special;
+	t->rcu_read_unlock_special.b.exp_hint = false;
 	rdp = this_cpu_ptr(&rcu_data);
 	if (!special.s && !rdp->exp_deferred_qs) {
 		local_irq_restore(flags);
@@ -610,7 +611,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
 		struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
 		struct rcu_node *rnp = rdp->mynode;
 
-		t->rcu_read_unlock_special.b.exp_hint = false;
 		exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) ||
 		      (rdp->grpmask & rnp->expmask) ||
 		      tick_nohz_full_cpu(rdp->cpu);
@@ -640,7 +640,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
 		local_irq_restore(flags);
 		return;
 	}
-	WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
 	rcu_preempt_deferred_qs_irqrestore(t, flags);
 }
 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ