lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20180514210441.GL26088@linux.vnet.ibm.com>
Date:   Mon, 14 May 2018 14:04:42 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Byungchul Park <byungchul.park@....com>
Cc:     Joel Fernandes <joel@...lfernandes.org>, jiangshanlai@...il.com,
        josh@...htriplett.org, rostedt@...dmis.org,
        mathieu.desnoyers@...icios.com, linux-kernel@...r.kernel.org,
        kernel-team@....com, peterz@...radead.org
Subject: Re: [PATCH] rcu: Report a quiescent state when it's exactly in the
 state

On Mon, May 14, 2018 at 11:59:41AM +0900, Byungchul Park wrote:
> On 2018-05-12 오전 7:41, Joel Fernandes wrote:
> >On Fri, May 11, 2018 at 09:17:46AM -0700, Paul E. McKenney wrote:
> >>On Fri, May 11, 2018 at 09:57:54PM +0900, Byungchul Park wrote:
> >>>Hello folks,
> >>>
> >>>I think I wrote the title in a misleading way.
> >>>
> >>>Please change the title to something else such as,
> >>>"rcu: Report a quiescent state when it's in the state" or,
> >>>"rcu: Add points reporting quiescent states where proper" or so on.
> >>>
> >>>On 2018-05-11 오후 5:30, Byungchul Park wrote:
> >>>>We expect a quiescent state of TASKS_RCU when cond_resched_tasks_rcu_qs()
> >>>>is called, no matter whether it actually be scheduled or not. However,
> >>>>it currently doesn't report the quiescent state when the task enters
> >>>>into __schedule() as it's called with preempt = true. So make it report
> >>>>the quiescent state unconditionally when cond_resched_tasks_rcu_qs() is
> >>>>called.
> >>>>
> >>>>And in TINY_RCU, even though the quiescent state of rcu_bh also should
> >>>>be reported when the tick interrupt comes from user, it doesn't. So make
> >>>>it reported.
> >>>>
> >>>>Lastly in TREE_RCU, rcu_note_voluntary_context_switch() should be
> >>>>reported when the tick interrupt comes from not only user but also idle,
> >>>>as an extended quiescent state.
> >>>>
> >>>>Signed-off-by: Byungchul Park <byungchul.park@....com>
> >>>>---
> >>>>  include/linux/rcupdate.h | 4 ++--
> >>>>  kernel/rcu/tiny.c        | 6 +++---
> >>>>  kernel/rcu/tree.c        | 4 ++--
> >>>>  3 files changed, 7 insertions(+), 7 deletions(-)
> >>>>
> >>>>diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> >>>>index ee8cf5fc..7432261 100644
> >>>>--- a/include/linux/rcupdate.h
> >>>>+++ b/include/linux/rcupdate.h
> >>>>@@ -195,8 +195,8 @@ static inline void exit_tasks_rcu_finish(void) { }
> >>>>   */
> >>>>  #define cond_resched_tasks_rcu_qs() \
> >>>>  do { \
> >>>>-	if (!cond_resched()) \
> >>>>-		rcu_note_voluntary_context_switch_lite(current); \
> >>>>+	rcu_note_voluntary_context_switch_lite(current); \
> >>>>+	cond_resched(); \
> >>
> >>Ah, good point.
> >>
> >>Peter, I have to ask...  Why is "cond_resched()" considered a preemption
> >>while "schedule()" is not?
> >
> >Infact something interesting I inferred from the __schedule loop related to
> >your question:
> >
> >switch_count can either be set to prev->invcsw or prev->nvcsw. If we can
> >assume that switch_count reflects whether the context switch is involuntary
> >or voluntary,
> >task-running-state	preempt		switch_count
> >0 (running)		1		involuntary
> >0			0		involuntary
> >1			0		voluntary
> >1			1		involuntary
> >
> >According to the above table, both the task's running state and the preempt
> >parameter to __schedule should be used together to determine if the switch is
> >a voluntary one or not.
> >
> >So this code in rcu_note_context_switch should really be:
> >if (!preempt && !(current->state & TASK_RUNNING))
> >	rcu_note_voluntary_context_switch_lite(current);
> >
> >According to the above table, cond_resched always classifies as an
> >involuntary switch which makes sense to me. Even though cond_resched is
> 
> Hello guys,
> 
> The classification for nivcsw/nvcsw used in scheduler core, Joel, you
> showed us is different from that used in when we distinguish between
> non preemption/voluntary preemption/preemption/full and so on, even
> they use the same word, "voluntary" though.
> 
> The name, rcu_note_voluntary_context_switch_lite() used in RCU has
> a lot to do with the latter, the term of preemption. Furthermore, I
> think the function should be called even when calling schedule() for
> sleep as well. I think it would be better to change the function
> name to something else to prevent confusing, it's up to Paul tho. :)

Given what it currently does, the name should be rcu_tasks_qs() to go
along with rcu_bh_qs(), rcu_preempt_qs(), and rcu_sched_qs().  Much as
I would like cond_resched() to be an RCU-tasks quiescent state, it is
nothingness for PREEMPT=y kernels, and Peter has indicated a strong
interest in having it remain so.  But I did update a few comments.

I left rcu_note_voluntary_context_switch() alone because it should be
disappearing entirely Real Soon Now.

Please see patch below.

							Thanx, Paul

PS.  Oddly enough, the recent patch removing the "if" from
     cond_resched_tasks_rcu_qs() is (technically speaking) pointless.
     If the kernel contains RCU-tasks, it must be preemptible, which
     means that cond_resched() unconditionally returns false, which
     in turn means that rcu_note_voluntary_context_switch_lite() was
     unconditionally invoked.

     Simiarly, in non-preemptible kernels, where cond_resched()
     might well return true, rcu_note_voluntary_context_switch_lite()
     is a no-op.

     So that patch had no effect, but I am keeping it due to the
     improved readability.  I should probably update its commit log,
     though.  ;-)

------------------------------------------------------------------------

commit 5b39fc0d9bc6c806cb42ed546c37655689b4dbdd
Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
Date:   Mon May 14 13:52:27 2018 -0700

    rcu: Improve RCU-tasks naming and comments
    
    The naming and comments associated with some RCU-tasks code make
    the faulty assumption that context switches due to cond_resched()
    are voluntary.  As several people pointed out, this is not the case.
    This commit therefore updates function names and comments to better
    reflect current reality.
    
    Reported-by: Byungchul Park <byungchul.park@....com>
    Reported-by: Joel Fernandes <joel@...lfernandes.org>
    Reported-by: Steven Rostedt <rostedt@...dmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 028d07ce198a..713b93af26f3 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -159,11 +159,11 @@ static inline void rcu_init_nohz(void) { }
 	} while (0)
 
 /*
- * Note a voluntary context switch for RCU-tasks benefit.  This is a
- * macro rather than an inline function to avoid #include hell.
+ * Note a quasi-voluntary context switch for RCU-tasks's benefit.
+ * This is a macro rather than an inline function to avoid #include hell.
  */
 #ifdef CONFIG_TASKS_RCU
-#define rcu_note_voluntary_context_switch_lite(t) \
+#define rcu_tasks_qs(t) \
 	do { \
 		if (READ_ONCE((t)->rcu_tasks_holdout)) \
 			WRITE_ONCE((t)->rcu_tasks_holdout, false); \
@@ -171,14 +171,14 @@ static inline void rcu_init_nohz(void) { }
 #define rcu_note_voluntary_context_switch(t) \
 	do { \
 		rcu_all_qs(); \
-		rcu_note_voluntary_context_switch_lite(t); \
+		rcu_tasks_qs(t); \
 	} while (0)
 void call_rcu_tasks(struct rcu_head *head, rcu_callback_t func);
 void synchronize_rcu_tasks(void);
 void exit_tasks_rcu_start(void);
 void exit_tasks_rcu_finish(void);
 #else /* #ifdef CONFIG_TASKS_RCU */
-#define rcu_note_voluntary_context_switch_lite(t)	do { } while (0)
+#define rcu_tasks_qs(t)	do { } while (0)
 #define rcu_note_voluntary_context_switch(t)		rcu_all_qs()
 #define call_rcu_tasks call_rcu_sched
 #define synchronize_rcu_tasks synchronize_sched
@@ -195,7 +195,7 @@ static inline void exit_tasks_rcu_finish(void) { }
  */
 #define cond_resched_tasks_rcu_qs() \
 do { \
-	rcu_note_voluntary_context_switch_lite(current); \
+	rcu_tasks_qs(current); \
 	cond_resched(); \
 } while (0)
 
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index ce9beec35e34..79409dbb5478 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -93,7 +93,7 @@ static inline void kfree_call_rcu(struct rcu_head *head,
 #define rcu_note_context_switch(preempt) \
 	do { \
 		rcu_sched_qs(); \
-		rcu_note_voluntary_context_switch_lite(current); \
+		rcu_tasks_qs(current); \
 	} while (0)
 
 static inline int rcu_needs_cpu(u64 basemono, u64 *nextevt)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 3826ce90fd6e..4e96761ce367 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -456,7 +456,7 @@ void rcu_note_context_switch(bool preempt)
 		rcu_momentary_dyntick_idle();
 	this_cpu_inc(rcu_dynticks.rcu_qs_ctr);
 	if (!preempt)
-		rcu_note_voluntary_context_switch_lite(current);
+		rcu_tasks_qs(current);
 out:
 	trace_rcu_utilization(TPS("End context switch"));
 	barrier(); /* Avoid RCU read-side critical sections leaking up. */
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 4c230a60ece4..5783bdf86e5a 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -507,14 +507,15 @@ early_initcall(check_cpu_stall_init);
 #ifdef CONFIG_TASKS_RCU
 
 /*
- * Simple variant of RCU whose quiescent states are voluntary context switch,
- * user-space execution, and idle.  As such, grace periods can take one good
- * long time.  There are no read-side primitives similar to rcu_read_lock()
- * and rcu_read_unlock() because this implementation is intended to get
- * the system into a safe state for some of the manipulations involved in
- * tracing and the like.  Finally, this implementation does not support
- * high call_rcu_tasks() rates from multiple CPUs.  If this is required,
- * per-CPU callback lists will be needed.
+ * Simple variant of RCU whose quiescent states are voluntary context
+ * switch, cond_resched_rcu_qs(), user-space execution, and idle.
+ * As such, grace periods can take one good long time.  There are no
+ * read-side primitives similar to rcu_read_lock() and rcu_read_unlock()
+ * because this implementation is intended to get the system into a safe
+ * state for some of the manipulations involved in tracing and the like.
+ * Finally, this implementation does not support high call_rcu_tasks()
+ * rates from multiple CPUs.  If this is required, per-CPU callback lists
+ * will be needed.
  */
 
 /* Global list of callbacks and associated lock. */
@@ -542,11 +543,11 @@ static struct task_struct *rcu_tasks_kthread_ptr;
  * period elapses, in other words after all currently executing RCU
  * read-side critical sections have completed. call_rcu_tasks() assumes
  * that the read-side critical sections end at a voluntary context
- * switch (not a preemption!), entry into idle, or transition to usermode
- * execution.  As such, there are no read-side primitives analogous to
- * rcu_read_lock() and rcu_read_unlock() because this primitive is intended
- * to determine that all tasks have passed through a safe state, not so
- * much for data-strcuture synchronization.
+ * switch (not a preemption!), cond_resched_rcu_qs(), entry into idle,
+ * or transition to usermode execution.  As such, there are no read-side
+ * primitives analogous to rcu_read_lock() and rcu_read_unlock() because
+ * this primitive is intended to determine that all tasks have passed
+ * through a safe state, not so much for data-strcuture synchronization.
  *
  * See the description of call_rcu() for more detailed information on
  * memory ordering guarantees.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ