Date:   Tue, 10 Dec 2019 14:55:20 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     kernel test robot <rong.a.chen@...el.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        lkp@...ts.01.org
Subject: Re: [LKP] Re: [rcu] ed93dfc6bc: stress-ng.icache.ops_per_sec -15.0%
 regression

On Tue, Dec 10, 2019 at 10:10:30PM +0800, kernel test robot wrote:
> On Mon, Dec 09, 2019 at 04:57:08PM -0800, Paul E. McKenney wrote:
> > On Mon, Dec 09, 2019 at 05:10:11PM +0800, Rong Chen wrote:
> > > 
> > > 
> > > On 12/9/19 2:13 AM, Paul E. McKenney wrote:
> > > > On Sun, Dec 08, 2019 at 11:29:07PM +0800, kernel test robot wrote:
> > > > > Greeting,
> > > > > 
> > > > > FYI, we noticed a -15.0% regression of stress-ng.icache.ops_per_sec due to commit:
> > > > > 
> > > > > 
> > > > > commit: ed93dfc6bc0084485ccad1ff6bd2ea81ab2c03cd ("rcu: Confine ->core_needs_qs accesses to the corresponding CPU")
> > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > > > 
> > > > > in testcase: stress-ng
> > > > > on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
> > > > > with following parameters:
> > > > > 
> > > > > 	nr_threads: 100%
> > > > > 	disk: 1HDD
> > > > > 	testtime: 1s
> > > > > 	class: cpu-cache
> > > > > 	cpufreq_governor: performance
> > > > > 	ucode: 0x500002c
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > If you fix the issue, kindly add the following tag:
> > > > > Reported-by: kernel test robot <rong.a.chen@...el.com>
> > > > > 
> > > > > 
> > > > > Details are as below:
> > > > > -------------------------------------------------------------------------------------------------->
> > > > > 
> > > > > 
> > > > > To reproduce:
> > > > > 
> > > > >          git clone https://github.com/intel/lkp-tests.git
> > > > >          cd lkp-tests
> > > > >          bin/lkp install job.yaml  # job file is attached in this email
> > > > >          bin/lkp run     job.yaml
> > > > > 
> > > > > =========================================================================================
> > > > > class/compiler/cpufreq_governor/disk/kconfig/nr_threads/rootfs/tbox_group/testcase/testtime/ucode:
> > > > >    cpu-cache/gcc-7/performance/1HDD/x86_64-rhel-7.6/100%/debian-x86_64-2019-11-14.cgz/lkp-csl-2sp5/stress-ng/1s/0x500002c
> > > > > 
> > > > > commit:
> > > > >    516e5ae0c9 ("rcu: Reset CPU hints when reporting a quiescent state")
> > > > >    ed93dfc6bc ("rcu: Confine ->core_needs_qs accesses to the corresponding CPU")
> > > > > 
> > > > > 516e5ae0c9401629 ed93dfc6bc0084485ccad1ff6bd
> > > > > ---------------- ---------------------------
> > > > >           %stddev     %change         %stddev
> > > > >               \          |                \
> > > > >       39049           -15.0%      33189 ± 14%  stress-ng.icache.ops_per_sec
> > > > I have to ask...
> > > > 
> > > > Given a 14% standard deviation, is this 15% change statistically
> > > > significant?
> > > > 
> > > > On the other hand, if this is due to lengthened grace periods, which
> > > > are a known side-effect of this commit, there are speedups for that
> > > > coming down the pike.
> > > 
> > > Hi Paul,
> > > 
> > > We run the test more times and stress-ng.icache.ops_per_sec is unstable:
> > > 
> > > 516e5ae0c9401629  ed93dfc6bc0084485ccad1ff6b
> > > ----------------  --------------------------
> > >          %stddev      change         %stddev
> > >                \          |                \
> > >      37409 ±  6%        -4%      35958 ± 11%  stress-ng/cpu-cache-performance-1HDD-100%-1s-ucode=0x500002c/lkp-csl-2sp5
> > > 
> > > 
> > > > 							Thanx, Paul
> > > > 
> > > > >        7784           -36.6%       4939 ±  9%  stress-ng.membarrier.ops
> > > > >        7648           -37.3%       4793 ±  9%  stress-ng.membarrier.ops_per_sec
> > > 
> > > there is still a regression of stress-ng.membarrier.ops_per_sec:
> > > 
> > > 
> > > 516e5ae0c9401629  ed93dfc6bc0084485ccad1ff6b
> > > ----------------  --------------------------
> > >          %stddev      change         %stddev
> > >                \          |                \
> > >       7522               -37%       4744 ±  9%  stress-ng/cpu-cache-performance-1HDD-100%-1s-ucode=0x500002c/lkp-csl-2sp5
> > 
> > OK, and it is much harder for me to argue that this one is statistically
> > insignificant.  From the softirq stats below, I got too aggressive about
> > removing instances of "rdp->core_needs_qs = false".  Does the (lightly
> > tested) commit below help?
> > 
> > 							Thanx, Paul
> 
> Hi Paul,
> 
> The patch can fix the regression of stress-ng.membarrier.ops_per_sec:
> 
> commit:
>   516e5ae0c9 ("rcu: Reset CPU hints when reporting a quiescent state")
>   ed93dfc6bc ("rcu: Confine ->core_needs_qs accesses to the corresponding CPU")
>   8b3a369c70 ("rcu: Clear ->core_needs_qs at GP end or self-reported QS")
> 
> 516e5ae0c9401629  ed93dfc6bc0084485ccad1ff6b  8b3a369c70986933c24efdf674  testcase/testparams/testbox
> ----------------  --------------------------  --------------------------  ---------------------------
>          %stddev      change         %stddev      change         %stddev
>              \          |                \          |                \  
>       7522             -37%       4744 ±  9%                  7392 ±  3%  stress-ng/cpu-cache-performance-1HDD-100%-1s-ucode=0x500002c/lkp-csl-2sp5
>       7522             -37%       4744                        7392        GEO-MEAN stress-ng.membarrier.ops_per_sec

Thank you for testing this!  If I am reading the numbers above correctly,
8b3a369c70 brought the performance almost all the way back to where it
was before.

If I am missing something, please let me know!

						Thanx, Paul

> Best Regards,
> Rong Chen
> 
> > 
> > ------------------------------------------------------------------------
> > 
> > commit 57412364506cf5262a7fdffa4718bf39b8891940
> > Author: Paul E. McKenney <paulmck@...nel.org>
> > Date:   Mon Dec 9 15:19:45 2019 -0800
> > 
> >     rcu: Clear ->core_needs_qs at GP end or self-reported QS
> >     
> >     The rcu_data structure's ->core_needs_qs field does not necessarily get
> >     cleared in a timely fashion after the corresponding CPU's quiescent state
> >     has been reported.  From a functional viewpoint, no harm done, but this
> >     can result in excessive invocation of RCU core processing, as witnessed
> >     by the kernel test robot, which saw greatly increased softirq overhead.
> >     
> >     This commit therefore restores the rcu_report_qs_rdp() function's
> >     clearing of this field, but only when running on the corresponding CPU.
> >     Cases where some other CPU reports the quiescent state (for example, on
> >     behalf of an idle CPU) are handled by setting this field appropriately
> >     within the __note_gp_changes() function's end-of-grace-period checks.
> >     This handling is carried out regardless of whether the end of a grace
> >     period actually happened, thus handling the case where a CPU goes non-idle
> >     after a quiescent state is reported on its behalf, but before the grace
> >     period ends.  This fix also avoids cross-CPU updates to ->core_needs_qs.
> >     
> >     While in the area, this commit changes the __note_gp_changes() need_gp
> >     variable's name to need_qs because it is a quiescent state that is needed
> >     from the CPU in question.
> >     
> >     Fixes: ed93dfc6bc00 ("rcu: Confine ->core_needs_qs accesses to the corresponding CPU")
> >     Reported-by: kernel test robot <rong.a.chen@...el.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> > 
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index f555ea9..1d0935e 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1393,7 +1393,7 @@ static void __maybe_unused rcu_advance_cbs_nowake(struct rcu_node *rnp,
> >  static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> >  {
> >  	bool ret = false;
> > -	bool need_gp;
> > +	bool need_qs;
> >  	const bool offloaded = IS_ENABLED(CONFIG_RCU_NOCB_CPU) &&
> >  			       rcu_segcblist_is_offloaded(&rdp->cblist);
> >  
> > @@ -1407,10 +1407,13 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> >  	    unlikely(READ_ONCE(rdp->gpwrap))) {
> >  		if (!offloaded)
> >  			ret = rcu_advance_cbs(rnp, rdp); /* Advance CBs. */
> > +		rdp->core_needs_qs = false;
> >  		trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuend"));
> >  	} else {
> >  		if (!offloaded)
> >  			ret = rcu_accelerate_cbs(rnp, rdp); /* Recent CBs. */
> > +		if (rdp->core_needs_qs)
> > +			rdp->core_needs_qs = !!(rnp->qsmask & rdp->grpmask);
> >  	}
> >  
> >  	/* Now handle the beginnings of any new-to-this-CPU grace periods. */
> > @@ -1422,9 +1425,9 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> >  		 * go looking for one.
> >  		 */
> >  		trace_rcu_grace_period(rcu_state.name, rnp->gp_seq, TPS("cpustart"));
> > -		need_gp = !!(rnp->qsmask & rdp->grpmask);
> > -		rdp->cpu_no_qs.b.norm = need_gp;
> > -		rdp->core_needs_qs = need_gp;
> > +		need_qs = !!(rnp->qsmask & rdp->grpmask);
> > +		rdp->cpu_no_qs.b.norm = need_qs;
> > +		rdp->core_needs_qs = need_qs;
> >  		zero_cpu_stall_ticks(rdp);
> >  	}
> >  	rdp->gp_seq = rnp->gp_seq;  /* Remember new grace-period state. */
> > @@ -2000,6 +2003,8 @@ rcu_report_qs_rdp(int cpu, struct rcu_data *rdp)
> >  		return;
> >  	}
> >  	mask = rdp->grpmask;
> > +	if (rdp->cpu == smp_processor_id())
> > +		rdp->core_needs_qs = false;
> >  	if ((rnp->qsmask & mask) == 0) {
> >  		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >  	} else {
> > _______________________________________________
> > LKP mailing list -- lkp@...ts.01.org
> > To unsubscribe send an email to lkp-leave@...ts.01.org
