lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210521001443.GR4441@paulmck-ThinkPad-P17-Gen-1>
Date:   Thu, 20 May 2021 17:14:43 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Sergey Senozhatsky <senozhatsky@...omium.org>
Cc:     Josh Triplett <josh@...htriplett.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Suleiman Souhlal <suleiman@...gle.com>, rcu@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] rcu/tree: consider time a VM was suspended

On Fri, May 21, 2021 at 07:24:03AM +0900, Sergey Senozhatsky wrote:
> On (21/05/20 07:53), Paul E. McKenney wrote:
> > On Thu, May 20, 2021 at 03:18:07PM +0900, Sergey Senozhatsky wrote:
> > > On (21/05/18 16:15), Paul E. McKenney wrote:
> > > > 
> > > > In the shorter term...  PVCLOCK_GUEST_STOPPED is mostly for things like
> > > > guest migration and debugger breakpoints, correct?  Either way, I am
> > > > wondering if rcu_cpu_stall_reset() should take a lighter touch.  Right
> > > > now, it effectively disables all stalls for the current grace period.
> > > > Why not make it restart the stall timeout when the stoppage is detected?
> > > 
> > > rcu_cpu_stall_reset() is used in many other places, not sure if we can
> > > change its behaviour rcu_cpu_stall_reset().
> > 
> > There was some use case back in the day where they wanted an indefinite
> > suppression of RCU CPU stall warnings for the current grace period, but
> > all the current use cases look fine with restarting the stall timeout.
> > 
> > However, please see below.
> > 
> > > Maybe it'll be possible to just stop calling it from PV-clock and do
> > > something like this
> > 
> > This was in fact one of the things I was considering, at least until
> > I convinced myself that I needed to ask some questions.
> > 
> > One point of possibly unnecessary nervousness on my part is resetting
> > the start of the grace period, which might confuse diagnostics.
> > 
> > But how about something like this?
> > 
> > void rcu_cpu_stall_reset(void)
> > {
> > 	WRITE_ONCE(rcu_state.jiffies_stall,
> > 		   jiffies + rcu_jiffies_till_stall_check());
> > }
> > 
> > Would something like that work?
> 
> This should work.
> 
> On a side note.
> 
> I wish we didn't have to put kvm_check_and_clear_guest_paused() all
> over the place.
> 
> We do load jiffies at the start of check_cpu_stall(). So, in theory,
> we can just use that captured jiffies (which can become obsolete but
> so will grace period timestamps) value and never read current system
> jiffies because they can jump way forward. IOW
> 
> 	jn = j + 3 * rcu_jiffies_till_stall_check() + 3;
> 
> instead of
> 
> 	jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
> 
> Then we probably can remove kvm_check_and_clear_guest_paused().
> 
> But that "don't load current jiffies" is rather fragile.
> 
> kvm_check_and_clear_guest_paused() is not pretty, but at least it's
> explicit.

If this works for you, I am very much in favor!

							Thanx, Paul

> ---
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index 49dda86a0e84..24f749bc1f90 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -695,19 +695,11 @@ static void check_cpu_stall(struct rcu_data *rdp)
>  	    ULONG_CMP_GE(gps, js))
>  		return; /* No stall or GP completed since entering function. */
>  	rnp = rdp->mynode;
> -	jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
> +	jn = j + 3 * rcu_jiffies_till_stall_check() + 3;
>  	if (rcu_gp_in_progress() &&
>  	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
>  	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
>  
> -		/*
> -		 * If a virtual machine is stopped by the host it can look to
> -		 * the watchdog like an RCU stall. Check to see if the host
> -		 * stopped the vm.
> -		 */
> -		if (kvm_check_and_clear_guest_paused())
> -			return;
> -
>  		/* We haven't checked in, so go dump stack. */
>  		print_cpu_stall(gps);
>  		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
> @@ -717,14 +709,6 @@ static void check_cpu_stall(struct rcu_data *rdp)
>  		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
>  		   cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
>  
> -		/*
> -		 * If a virtual machine is stopped by the host it can look to
> -		 * the watchdog like an RCU stall. Check to see if the host
> -		 * stopped the vm.
> -		 */
> -		if (kvm_check_and_clear_guest_paused())
> -			return;
> -
>  		/* They had a few time units to dump stack, so complain. */
>  		print_other_cpu_stall(gs2, gps);
>  		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ