linux-kernel - Re: Energy-efficiency options within RCU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <X9erIC8Sbf3ybvHC@google.com>
Date:   Mon, 14 Dec 2020 13:12:48 -0500
From:   Joel Fernandes <joel@...lfernandes.org>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     rcu@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Energy-efficiency options within RCU

On Thu, Dec 10, 2020 at 10:37:37AM -0800, Paul E. McKenney wrote:
> Hello, Joel,
> 
> In case you are -seriously- interested...  ;-)

I am always seriously interested :-). The issue becomes when life throws me a
curveball. This was the year of curveballs :-)

Thank you for your reply and I have added it to my list to investigate how we
are configuring nocb on our systems. I don't think anyone over here has given
these RCU issues a serious look over here.

thanks,

 - Joel



> 						Thanx, Paul
> 
> rcu_nocbs=
> 
> 	Adding a CPU to this list offloads RCU callback invocation from
> 	that CPU's softirq handler to a kthread.  In big.LITTLE systems,
> 	this kthread can be placed on a LITTLE CPU, which has been
> 	demonstrated to save significant energy in benchmarks.
> 	http://www.rdrop.com/users/paulmck/realtime/paper/AMPenergy.2013.04.19a.pdf
> 
> nohz_full=
> 
> 	Any CPU specified by this boot parameter is handled as if it was
> 	specified by rcu_nocbs=.
> 
> rcutree.jiffies_till_first_fqs=
> 
> 	Increasing this will decrease wakeup frequency to the grace-period
> 	kthread for the first FQS scan.  And increase grace-period
> 	latency.
> 
> rcutree.jiffies_till_next_fqs=
> 
> 	Ditto, but for the second and subsequent FQS scans.
> 
> 	My guess is that neither of these makes much difference.  But if
> 	they do, maybe some sort of backoff scheme for FQS scans?
> 
> rcutree.jiffies_till_sched_qs=
> 
> 	Increasing this will delay RCU's getting excited about CPUs and
> 	tasks not responding with quiescent states.  This excitement
> 	can cause extra overhead.
> 
> 	No idea whether adjusting this would help.  But if you increase
> 	rcutree.jiffies_till_first_fqs or rcutree.jiffies_till_next_fqs,
> 	you might need to increase this one accordingly.
> 
> rcutree.qovld=
> 
> 	Increasing this will increase the grace-period duration at which
> 	RCU starts sending IPIs, thus perhaps reducing the total number
> 	of IPIs that RCU sends.  The destination CPUs are unlikely to be
> 	idle, so it is not clear to me that this would help much.  But
> 	perhaps I am wrong about them being mostly non-idle, who knows?
> 
> rcupdate.rcu_cpu_stall_timeout=
> 
> 	If you get overly zealous about the earlier kernel boot parameters,
> 	you might need to increase this one as well.  Or instead use the
> 	rcupdate.rcu_cpu_stall_suppress= kernel boot parameter to suppress
> 	RCU CPU stall warnings entirely.
> 
> rcutree.rcu_nocb_gp_stride=
> 
> 	Increasing this might reduce grace-period work somewhat.  I don't
> 	see why a (say) 16-CPU system really needs to have more than one
> 	rcuog kthread, so if this does help it might be worthwhile setting
> 	a lower limit to this kernel parameter.
> 
> rcutree.rcu_idle_gp_delay=  (Only CONFIG_RCU_FAST_NO_HZ=y kernels.)
> 
> 	This defaults to four jiffies on the theory that grace periods
> 	tend to last about that long.  If grace periods tend to take
> 	longer, then it makes a lot of sense to increase this.	And maybe
> 	battery-powered devices would rather have it be about 2x or 3x
> 	the expected grace-period duration, who knows?
> 
> 	I would keep it to a power of two, but the code should work with
> 	other numbers.  Except that I don't know that this has ever been
> 	tested.  ;-)
> 
> srcutree.exp_holdoff=
> 
> 	Increasing this decreases the number of SRCU grace periods that
> 	are treated as expedited.  But you have to have closely-spaced
> 	SRCU grace periods for this to matter.	(These do happen at least
> 	sometimes because I added this only because someone complained
> 	about the performance regression from the earlier non-tree SRCU.)
> 
> rcupdate.rcu_task_ipi_delay=
> 
> 	This kernel parameter delays sending IPIs for RCU Tasks Trace,
> 	which is used by sleepable BPF programs.  Increasing it can
> 	reduce overhead, but can also increase the latency of removing
> 	sleepable BPF programs.
> 
> rcupdate.rcu_task_stall_timeout=
> 
> 	If you slow down RCU Tasks Trace too much, you may need this.
> 	But then again, the default 10-minute value should suffice.
> 
> CONFIG_RCU_FAST_NO_HZ=y
> 
> 	This only has effect on CPUs not specified by rcu_nocbs, and thus
> 	might be useful on systems that offload RCU callbacks only on
> 	some of the CPUs.  For example, a big.LITTLE system might offload
> 	only the big CPUs.  This Kconfig option reduces the frequency of
> 	timer interrupts (and thus of RCU-related softirq processing)
> 	on idle CPUs.  This has been shown to save significant energy
> 	in benchmarks:
> 	http://www.rdrop.com/users/paulmck/realtime/paper/AMPenergy.2013.04.19a.pdf
> 
> CONFIG_RCU_STRICT_GRACE_PERIOD=y
> 
> 	This works hard (as in burns CPU) to sharply reduce grace-period
> 	latency.  The effect is probably to greatly increase power
> 	consumption, but there might well be workloads where the shorter
> 	grace periods more than make up for the extra CPU time.  Or not.
> 
> CONFIG_HZ=
> 
> 	Reducing the scheduler-clock interrupt frequency has the opposite
> 	effect, namely of increasing RCU grace-period latency, but while
> 	also reducing RCU's CPU utilization.
> 
> CONFIG_TASKS_TRACE_RCU_READ_MB=y
> 
> 	Reduce the need to IPI RCU Tasks Trace holdout tasks, but at the
> 	expense of an increase in to/from idle overhead.  This Kconfig
> 	option also slows down the rate at which RCU Tasks Trace polls
> 	for holdout tasks.  This polling rate cannot be separately
> 	specified, but if changing the initial source-code values of
> 	either rcu_tasks_trace.gp_sleep or rcu_tasks_trace.init_fract
> 	proves useful, kernel boot parameters could be created.
> 
> 	That said, automatic initialization heuristics are more
> 	convenient.  When they work, anyway.