Message-Id: <20170719165425.GE3730@linux.vnet.ibm.com>
Date: Wed, 19 Jul 2017 09:54:25 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Christopher Lameter <cl@...ux.com>
Cc: "Li, Aubrey" <aubrey.li@...ux.intel.com>,
Thomas Gleixner <tglx@...utronix.de>,
Andi Kleen <ak@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Aubrey Li <aubrey.li@...el.com>, len.brown@...el.com,
rjw@...ysocki.net, tim.c.chen@...ux.intel.com,
arjan@...ux.intel.com, yang.zhang.wz@...il.com, x86@...nel.org,
linux-kernel@...r.kernel.org, daniel.lezcano@...aro.org
Subject: Re: [RFC PATCH v1 00/11] Create fast idle path for short idle periods
On Wed, Jul 19, 2017 at 10:03:22AM -0500, Christopher Lameter wrote:
> On Wed, 19 Jul 2017, Paul E. McKenney wrote:
>
> > > Do we have any problem if we skip RCU idle enter/exit under a fast idle scenario?
> > > My understanding is, if tick is not stopped, then we don't need inform RCU in
> > > idle path, it can be informed in irq exit.
> >
> > Indeed, the problem arises when the tick is stopped.
>
> Well is there a boundary when you would want the notification calls? I
> would think that even an idle period of a couple of seconds does not
> necessarily require a callback to rcu. Had some brokenness here where RCU
> calls did not occur for hours or so. At some point the system ran out of
> memory but that's far off.
Yeah, I should spell this out more completely, shouldn't I? And get
it into the documentation if it isn't already there... Where it is
currently at best hinted at. :-/
1. If a CPU is either idle or executing in usermode, and RCU believes
it is non-idle, the scheduling-clock tick had better be running.
Otherwise, you will get RCU CPU stall warnings. Or at best,
very long (11-second) grace periods, with a pointless IPI waking
the CPU each time.
2. If a CPU is in a portion of the kernel that executes RCU read-side
critical sections, and RCU believes this CPU to be idle, you can get
random memory corruption. DON'T DO THIS!!!
This is one reason to test with lockdep, which will complain
about this sort of thing.
3. If a CPU is in a portion of the kernel that is absolutely
positively no-joking guaranteed to never execute any RCU read-side
critical sections, and RCU believes this CPU to be idle,
no problem. This sort of thing is used by some architectures
for light-weight exception handlers, which can then avoid the
overhead of rcu_irq_enter() and rcu_irq_exit(). Some go further
and avoid the entireties of irq_enter() and irq_exit().
Just make very sure you are running some of your tests with
CONFIG_PROVE_RCU=y. (A sketch just after this list illustrates
#2 and #3.)
4. If a CPU is executing in the kernel with the scheduling-clock
interrupt disabled and RCU believes this CPU to be non-idle,
and if the CPU goes idle (from an RCU perspective) every few
jiffies, no problem. It is usually OK for there to be the
occasional gap of up to a second or so between idle periods.
If the gap grows too long, you get RCU CPU stall warnings.
5. If a CPU is either idle or executing in usermode, and RCU believes
it to be idle, of course no problem.
6. If a CPU is executing in the kernel, the kernel code
path is passing through quiescent states at a reasonable
frequency (preferably about once every few jiffies, but the
occasional excursion to a second or so is usually OK), and the
scheduling-clock interrupt is enabled, of course no problem.
If the gap between a successive pair of quiescent states grows
too long, you get RCU CPU stall warnings. (A second sketch
below illustrates #4 and #6.)
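To make #2 and #3 concrete, here is a hand-written sketch (illustration
only, not part of the patch below; all of the identifiers, for example
foo_ptr, heavy_handler(), and light_handler(), are invented) contrasting
a full-weight exception path whose entry code has already called
rcu_irq_enter() with a light-weight one that skipped it and therefore
must stay away from RCU:

#include <linux/percpu.h>
#include <linux/rcupdate.h>

/* Everything below is invented purely for illustration. */
struct foo {
	int val;
};
static struct foo __rcu *foo_ptr;
static DEFINE_PER_CPU(unsigned long, light_exc_count);

/* Full-weight path: the entry code already called rcu_irq_enter(). */
static int heavy_handler(void)
{
	struct foo *p;
	int val = -1;

	rcu_read_lock();	/* Fine: RCU knows this CPU is non-idle. */
	p = rcu_dereference(foo_ptr);
	if (p)
		val = p->val;
	rcu_read_unlock();
	return val;
}

/* Light-weight path: entry skipped rcu_irq_enter()/rcu_irq_exit(). */
static void light_handler(void)
{
	/*
	 * No RCU read-side critical sections allowed here: RCU may
	 * believe this CPU to be idle, so a grace period can complete
	 * while we are still using an RCU-protected pointer, giving
	 * the random memory corruption from #2.  With
	 * CONFIG_PROVE_RCU=y, lockdep-RCU would complain about an
	 * rcu_dereference() placed here.
	 */
	__this_cpu_inc(light_exc_count);	/* Non-RCU bookkeeping only. */
}
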
For purposes of comparison, the default time until RCU CPU stall warning
in mainline is 21 seconds. A number of distros set this to 60 seconds.
Back in the 90s, the analogous timeout for DYNIX/ptx was 1.5 seconds. :-/
Hence "a second or so" instead of your "a couple of seconds". ;-)
Please see below for the corresponding patch to RCU's requirements.
Thoughts?
Thanx, Paul
------------------------------------------------------------------------
commit 693c9bfa43f92570dd362d8834440b418bbb994a
Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
Date: Wed Jul 19 09:52:58 2017 -0700
doc: Set down RCU's scheduling-clock-interrupt needs
This commit documents the situations in which RCU needs the
scheduling-clock interrupt to be enabled, along with the consequences
of failing to meet RCU's needs in this area.
Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html
index 95b30fa25d56..7980bee5607f 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.html
+++ b/Documentation/RCU/Design/Requirements/Requirements.html
@@ -2080,6 +2080,8 @@ Some of the relevant points of interest are as follows:
<li> <a href="#Scheduler and RCU">Scheduler and RCU</a>.
<li> <a href="#Tracing and RCU">Tracing and RCU</a>.
<li> <a href="#Energy Efficiency">Energy Efficiency</a>.
+<li> <a href="#Scheduling-Clock Interrupts and RCU">
+ Scheduling-Clock Interrupts and RCU</a>.
<li> <a href="#Memory Efficiency">Memory Efficiency</a>.
<li> <a href="#Performance, Scalability, Response Time, and Reliability">
Performance, Scalability, Response Time, and Reliability</a>.
@@ -2532,6 +2534,113 @@ I learned of many of these requirements via angry phone calls:
Flaming me on the Linux-kernel mailing list was apparently not
sufficient to fully vent their ire at RCU's energy-efficiency bugs!
+<h3><a name="Scheduling-Clock Interrupts and RCU">
+Scheduling-Clock Interrupts and RCU</a></h3>
+
+<p>
+The kernel transitions between in-kernel non-idle execution, userspace
+execution, and the idle loop.
+Depending on kernel configuration, RCU handles these states differently:
+
+<table border=3>
+<tr><th><tt>HZ</tt> Kconfig</th>
+ <th>In-Kernel</th>
+ <th>Usermode</th>
+ <th>Idle</th></tr>
+<tr><th align="left"><tt>HZ_PERIODIC</tt></th>
+ <td>Can rely on scheduling-clock interrupt.</td>
+ <td>Can rely on scheduling-clock interrupt and its
+ detection of interrupt from usermode.</td>
+ <td>Can rely on RCU's dyntick-idle detection.</td></tr>
+<tr><th align="left"><tt>NO_HZ_IDLE</tt></th>
+ <td>Can rely on scheduling-clock interrupt.</td>
+ <td>Can rely on scheduling-clock interrupt and its
+ detection of interrupt from usermode.</td>
+ <td>Can rely on RCU's dyntick-idle detection.</td></tr>
+<tr><th align="left"><tt>NO_HZ_FULL</tt></th>
+ <td>Can only sometimes rely on scheduling-clock interrupt.
+ In other cases, it is necessary to bound kernel execution
+ times and/or use IPIs.</td>
+ <td>Can rely on RCU's dyntick-idle detection.</td>
+ <td>Can rely on RCU's dyntick-idle detection.</td></tr>
+</table>
+
+<table>
+<tr><th> </th></tr>
+<tr><th align="left">Quick Quiz:</th></tr>
+<tr><td>
+ Why can't <tt>NO_HZ_FULL</tt> in-kernel execution rely on the
+ scheduling-clock interrupt, just like <tt>HZ_PERIODIC</tt>
+ and <tt>NO_HZ_IDLE</tt> do?
+</td></tr>
+<tr><th align="left">Answer:</th></tr>
+<tr><td bgcolor="#ffffff"><font color="ffffff">
+ Because, as a performance optimization, <tt>NO_HZ_FULL</tt>
+ does not necessarily re-enable the scheduling-clock interrupt
+ on entry to each and every system call.
+</font></td></tr>
+<tr><td> </td></tr>
+</table>
+
+<p>
+However, RCU must be reliably informed as to whether any given
+CPU is currently in the idle loop, and, for <tt>NO_HZ_FULL</tt>,
+also whether that CPU is executing in usermode, as discussed
+<a href="#Energy Efficiency">earlier</a>.
+It also requires that the scheduling-clock interrupt be enabled when
+RCU needs it to be:
+
+<ol>
+<li> If a CPU is either idle or executing in usermode, and RCU believes
+ it is non-idle, the scheduling-clock tick had better be running.
+ Otherwise, you will get RCU CPU stall warnings. Or at best,
+ very long (11-second) grace periods, with a pointless IPI waking
+ the CPU from time to time.
+<li> If a CPU is in a portion of the kernel that executes RCU read-side
+ critical sections, and RCU believes this CPU to be idle, you will get
+ random memory corruption. <b>DON'T DO THIS!!!</b>
+
+ <br>This is one reason to test with lockdep, which will complain
+ about this sort of thing.
+<li> If a CPU is in a portion of the kernel that is absolutely
+ positively no-joking guaranteed to never execute any RCU read-side
+ critical sections, and RCU believes this CPU to be idle,
+ no problem. This sort of thing is used by some architectures
+ for light-weight exception handlers, which can then avoid the
+ overhead of <tt>rcu_irq_enter()</tt> and <tt>rcu_irq_exit()</tt>
+ at exception entry and exit, respectively.
+ Some go further and avoid the entireties of <tt>irq_enter()</tt>
+ and <tt>irq_exit()</tt>.
+
+ <br>Just make very sure you are running some of your tests with
+ <tt>CONFIG_PROVE_RCU=y</tt>, just in case one of your code paths
+ was in fact joking about not doing RCU read-side critical sections.
+<li> If a CPU is executing in the kernel with the scheduling-clock
+ interrupt disabled and RCU believes this CPU to be non-idle,
+ and if the CPU goes idle (from an RCU perspective) every few
+ jiffies, no problem. It is usually OK for there to be the
+ occasional gap of up to a second or so between idle periods.
+
+ <br>If the gap grows too long, you get RCU CPU stall warnings.
+<li> If a CPU is either idle or executing in usermode, and RCU believes
+ it to be idle, of course no problem.
+<li> If a CPU is executing in the kernel, the kernel code
+ path is passing through quiescent states at a reasonable
+ frequency (preferably about once every few jiffies, but the
+ occasional excursion to a second or so is usually OK), and the
+ scheduling-clock interrupt is enabled, of course no problem.
+
+ <br>If the gap between a successive pair of quiescent states grows
+ too long, you get RCU CPU stall warnings.
+</ol>
+
+<p>
+But as long as RCU is properly informed of kernel state transitions between
+in-kernel execution, usermode execution, and idle, and as long as the
+scheduling-clock interrupt is enabled when RCU needs it to be, you
+can rest assured that the bugs you encounter will be in some other
+part of RCU or some other part of the kernel!
+
<h3><a name="Memory Efficiency">Memory Efficiency</a></h3>
<p>