linux-kernel - Re: [PATCHv2 1/2] rcu/tree: handle VM stoppage in stall detection

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210524034645.GH4441@paulmck-ThinkPad-P17-Gen-1>
Date:   Sun, 23 May 2021 20:46:45 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Sergey Senozhatsky <senozhatsky@...omium.org>
Cc:     Josh Triplett <josh@...htriplett.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Joel Fernandes <joel@...lfernandes.org>, rcu@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCHv2 1/2] rcu/tree: handle VM stoppage in stall detection

On Mon, May 24, 2021 at 10:56:44AM +0900, Sergey Senozhatsky wrote:
> On (21/05/21 14:38), Paul E. McKenney wrote:
> > 
> > And on that otherwise inexplicable refetch of the jiffies counter within
> > check_cpu_stall(), the commit below makes it more effective.
> > 
> > If check_cpu_stall() is delayed before or while printing the stall
> > warning, we really want to wait the full time duration between the
> > end of that stall warning and the start of the next one.
> >
> 
> Nice improvement!

Thank you, glad you like it!

> > Of course, if there is some way to learn whether printk() is overloaded,
> > even more effective approaches could be taken.
> 
> There is no better to do this.

I was afraid of that.  ;-)

> > commit b9c5dc2856c1538ccf2d09246df2b58bede72cca
> > Author: Paul E. McKenney <paulmck@...nel.org>
> > Date:   Fri May 21 14:23:03 2021 -0700
> > 
> >     rcu: Start timing stall repetitions after warning complete
> >     
> >     Systems with low-bandwidth consoles can have very large printk()
> >     latencies, and on such systems it makes no sense to have the next RCU CPU
> >     stall warning message start output before the prior message completed.
> >     This commit therefore sets the time of the next stall only after the
> >     prints have completed.  While printing, the time of the next stall
> >     message is set to ULONG_MAX/2 jiffies into the future.
> >     
> >     Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> 
> FWIW,
> 
> Reviewed-by: Sergey Senozhatsky <senozhatsky@...omium.org>

Thank you again, I will apply this on my next rebase.

> > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> > index 05012a8081a1..ff239189a627 100644
> > --- a/kernel/rcu/tree_stall.h
> > +++ b/kernel/rcu/tree_stall.h
> > @@ -648,6 +648,7 @@ static void print_cpu_stall(unsigned long gps)
> >  
> >  static void check_cpu_stall(struct rcu_data *rdp)
> >  {
> > +	bool didstall = false;
> >  	unsigned long gs1;
> >  	unsigned long gs2;
> >  	unsigned long gps;
> > @@ -693,7 +694,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
> >  	    ULONG_CMP_GE(gps, js))
> >  		return; /* No stall or GP completed since entering function. */
> >  	rnp = rdp->mynode;
> > -	jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
> > +	jn = jiffies + ULONG_MAX / 2;
> >  	if (rcu_gp_in_progress() &&
> >  	    (READ_ONCE(rnp->qsmask) & rdp->grpmask) &&
> >  	    cmpxchg(&rcu_state.jiffies_stall, js, jn) == js) {
> > @@ -710,6 +711,7 @@ static void check_cpu_stall(struct rcu_data *rdp)
> >  		print_cpu_stall(gps);
> >  		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
> >  			rcu_ftrace_dump(DUMP_ALL);
> > +		didstall = true;
> >  
> >  	} else if (rcu_gp_in_progress() &&
> >  		   ULONG_CMP_GE(j, js + RCU_STALL_RAT_DELAY) &&
> > @@ -727,6 +729,11 @@ static void check_cpu_stall(struct rcu_data *rdp)
> >  		print_other_cpu_stall(gs2, gps);
> >  		if (READ_ONCE(rcu_cpu_stall_ftrace_dump))
> >  			rcu_ftrace_dump(DUMP_ALL);
> > +		didstall = true;
> > +	}
> > +	if (didstall && READ_ONCE(rcu_state.jiffies_stall) == jn) {
> 
> Can `rcu_state.jiffies_stall` change here?

In theory, yes, sort of, anyway.  In practice, highly unlikely.
The most plausible way for this to happen is for this code path to be
delayed for a long time on a 32-bit system, so that jiffies+ULONG_MAX/2
actually arrives.  But in that case, all sorts of other complaints
would happen first.

But I could make this a cmpxchg(), if that is what you are getting at.

							Thanx, Paul

> > +		jn = jiffies + 3 * rcu_jiffies_till_stall_check() + 3;
> > +		WRITE_ONCE(rcu_state.jiffies_stall, jn);
> >  	}
> >  }