linux-kernel - Re: [PATCH] Remove GP_REPLAY state from rcu

Message-ID: <20191004145912.GA118626@google.com>
Date:   Fri, 4 Oct 2019 10:59:12 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     linux-kernel@...r.kernel.org
Cc:     bristot@...hat.com, peterz@...radead.org, oleg@...hat.com,
        paulmck@...nel.org, rcu@...r.kernel.org,
        Josh Triplett <josh@...htriplett.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        "Paul E. McKenney" <paulmck@...ux.ibm.com>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH] Remove GP_REPLAY state from rcu_sync

On Fri, Oct 04, 2019 at 10:57:41AM -0400, Joel Fernandes (Google) wrote:
> From: Joel Fernandes <joel@...lfernandes.org>
> 
> Please consider this an RFC for discussion only. I just want to discuss
> why the GP_REPLAY state is needed at all.

And I messed up the subject prefix, but this is *really* RFC and for
discussion purposes :)

thanks,

 - Joel


> Here's the intention AFAICS:
> When rcu_sync_exit() has happened, the gp_state changes to GP_EXIT while
> we wait for a grace period before transitioning to GP_IDLE. In the
> meantime, if we receive another rcu_sync_exit(), then we want to wait
> for another GP to account for that.
> 
> Drawing some timing diagrams, it looks like this:
> 
> Legend:
> rse = rcu_sync_enter
> rsx = rcu_sync_exit
> i = GP_IDLE
> x = GP_EXIT
> r = GP_REPLAY
> e = GP_ENTER
> p = GP_PASSED
> rx = GP_REPLAY changes to GP_EXIT
> 
> GP num = The GP we are on.
> 
> note: A GP passes between the states:
>   e and p
>   x and i
>   x and rx
>   rx and i
> 
> In a simple case, the timing and states look like:
> time
> ---------------------->
> GP num         1111111    2222222
> GP state  i    e     p    x     i
> CPU0 :         rse        rsx
> 
> However we can enter the replay state like this:
> time
> ---------------------->
> GP num         1111111    2222222222222222222223333333
> GP state  i    e     p    x              r     rx    i
> CPU0 :         rse        rsx
> CPU1 :                         rse     rsx
> 
> Due to the second rse + rsx, we had to wait for another GP.
> 
> I believe the rationale is, if another rsx happens, another GP has to
> happen.
> 
> But this is not always true if you consider the following events:
> 
> time
> ---------------------->
> GP num         111111     22222222222222222222222222222222233333333
> GP state  i    e     p    x                 r              rx     i
> CPU0 :         rse        rsx
> CPU1 :                         rse     rsx
> CPU2 :                                         rse     rsx
> 
> Here, we had 3 grace periods that elapsed, 1 for the rcu_sync_enter(),
> and 2 for the rcu_sync_exit(s).
> 
> However, we had 3 rcu_sync_exit()s, not 2. In other words, the
> rcu_sync_exit()s got batched.
> 
> So my point here is, rcu_sync_exit() does not always cause a new
> GP to happen: several rcu_sync_exit()s can end up batched, sharing
> the same grace period.
> 
> Then what is the point of the GP_REPLAY state at all if it does not
> always wait for a new GP?  Taking a step back, why did we intend to have
> to wait for a new GP if another rcu_sync_exit() comes while one is still
> in progress?
> 
> Cc: bristot@...hat.com
> Cc: peterz@...radead.org
> Cc: oleg@...hat.com
> Cc: paulmck@...nel.org
> Cc: rcu@...r.kernel.org
> Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> ---
>  kernel/rcu/sync.c | 14 ++------------
>  1 file changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
> index d4558ab7a07d..4f3aad67992c 100644
> --- a/kernel/rcu/sync.c
> +++ b/kernel/rcu/sync.c
> @@ -10,7 +10,7 @@
>  #include <linux/rcu_sync.h>
>  #include <linux/sched.h>
>  
> -enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };
> +enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT };
>  
>  #define	rss_lock	gp_wait.lock
>  
> @@ -85,13 +85,6 @@ static void rcu_sync_func(struct rcu_head *rhp)
>  		 */
>  		WRITE_ONCE(rsp->gp_state, GP_PASSED);
>  		wake_up_locked(&rsp->gp_wait);
> -	} else if (rsp->gp_state == GP_REPLAY) {
> -		/*
> -		 * A new rcu_sync_exit() has happened; requeue the callback to
> -		 * catch a later GP.
> -		 */
> -		WRITE_ONCE(rsp->gp_state, GP_EXIT);
> -		rcu_sync_call(rsp);
>  	} else {
>  		/*
>  		 * We're at least a GP after the last rcu_sync_exit(); eveybody
> @@ -167,16 +160,13 @@ void rcu_sync_enter(struct rcu_sync *rsp)
>   */
>  void rcu_sync_exit(struct rcu_sync *rsp)
>  {
> -	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
> -	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
> +	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) < GP_PASSED);
>  
>  	spin_lock_irq(&rsp->rss_lock);
>  	if (!--rsp->gp_count) {
>  		if (rsp->gp_state == GP_PASSED) {
>  			WRITE_ONCE(rsp->gp_state, GP_EXIT);
>  			rcu_sync_call(rsp);
> -		} else if (rsp->gp_state == GP_EXIT) {
> -			WRITE_ONCE(rsp->gp_state, GP_REPLAY);
>  		}
>  	}
>  	spin_unlock_irq(&rsp->rss_lock);
> -- 
> 2.23.0.581.g78d2f28ef7-goog
>
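
The batching in the third timing diagram can be reproduced with a small
userspace model of the pre-patch state machine. This is only a sketch for
discussion, not the kernel code: struct model, model_enter(), model_exit(),
model_func(), model_gp_elapsed() and run_scenario() are hypothetical names,
there is no locking or real RCU, the wait_event() side of rcu_sync_enter()
is left out, and the enter path and the final transition back to GP_IDLE are
assumptions, since the diff above only shows parts of rcu_sync_func() and
rcu_sync_exit().

```c
#include <assert.h>
#include <stdbool.h>

enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };

/* Hypothetical single-threaded stand-in for struct rcu_sync. */
struct model {
	int gp_state;
	int gp_count;
	bool cb_pending;	/* stands in for a queued RCU callback */
};

/* Models rcu_sync_func(): runs once a grace period has elapsed. */
static void model_func(struct model *m)
{
	if (m->gp_count) {
		m->gp_state = GP_PASSED;
	} else if (m->gp_state == GP_REPLAY) {
		/* A later exit was seen: requeue and wait for one more GP. */
		m->gp_state = GP_EXIT;
		m->cb_pending = true;
	} else {
		/* Assumed final branch: back to idle. */
		m->gp_state = GP_IDLE;
	}
}

/* Models rcu_sync_enter() (assumed behaviour, not shown in the diff). */
static void model_enter(struct model *m)
{
	if (m->gp_state == GP_IDLE) {
		m->gp_state = GP_ENTER;
		m->cb_pending = true;	/* a GP must pass before GP_PASSED */
	}
	m->gp_count++;
}

/* Models the pre-patch rcu_sync_exit(). */
static void model_exit(struct model *m)
{
	assert(m->gp_count > 0);
	if (!--m->gp_count) {
		if (m->gp_state == GP_PASSED) {
			m->gp_state = GP_EXIT;
			m->cb_pending = true;
		} else if (m->gp_state == GP_EXIT) {
			m->gp_state = GP_REPLAY;
		}
		/* Already GP_REPLAY: nothing changes, the exit is batched. */
	}
}

/* One grace period ends; run the queued callback, if any. */
static void model_gp_elapsed(struct model *m)
{
	if (m->cb_pending) {
		m->cb_pending = false;
		model_func(m);
	}
}

/*
 * One enter/exit pair on CPU0, then extra_pairs enter/exit pairs that all
 * arrive while the exit grace period is still in flight. Returns the
 * number of grace periods that elapse before the model is idle again.
 */
int run_scenario(int extra_pairs)
{
	struct model m = { GP_IDLE, 0, false };
	int gps = 0;

	model_enter(&m);
	model_gp_elapsed(&m);	/* GP 1: GP_ENTER -> GP_PASSED */
	gps++;
	model_exit(&m);		/* GP_PASSED -> GP_EXIT */

	for (int i = 0; i < extra_pairs; i++) {
		model_enter(&m);
		model_exit(&m);
	}

	while (m.gp_state != GP_IDLE) {
		model_gp_elapsed(&m);
		gps++;
	}
	return gps;
}
```

In this model run_scenario(0) takes 2 grace periods, while run_scenario(1)
and run_scenario(2) both take 3: the second extra rcu_sync_exit() finds the
state already at GP_REPLAY and does not add a grace period of its own.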