lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 22 May 2017 17:00:36 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org,
        Peter Zijlstra <a.p.zijlstra@...llo.nl>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: Use case for TASKS_RCU

On Fri, May 19, 2017 at 12:06:09PM -0700, Paul E. McKenney wrote:
> On Fri, May 19, 2017 at 10:23:31AM -0400, Steven Rostedt wrote:
> > On Fri, 19 May 2017 10:04:21 -0400
> > Steven Rostedt <rostedt@...dmis.org> wrote:
> > 
> > > On Fri, 19 May 2017 06:35:50 -0700
> > > "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com> wrote:
> > > 
> > > > Simpler would be better!
> > > > 
> > > > However, is it really guaranteed that one SCHED_IDLE thread cannot
> > > > preempt another?  If not, then the trampoline-freeing SCHED_IDLE thread
> > > > might preempt some other SCHED_IDLE thread in the middle of a trampoline.
> > > > I am not seeing anything that prevents such preemption, but it is rather
> > > > early local time, so I could easily be missing something.
> > > > 
> > > > However, if SCHED_IDLE threads cannot preempt other threads, even other
> > > > SCHED_IDLE threads, then your approach sounds quite promising to me.
> > > > 
> > > > Steve, Peter, thoughts?  
> > > 
> > > SCHED_IDLE is the swapper task. There's one on each CPU, and they don't
> > > migrate. And they only get called when there's no other task running.
> > 
> > Peter just "schooled" me on IRC. I stand corrected (and he may respond
> > to this email too). I guess any task can become SCHED_IDLE.
> > 
> > But that just makes this an even less likely option for
> > synchronize_rcu_tasks().
> 
> Hmmm...  The goal is to make sure that any task that was preempted or
> running at a given point in time passes through a voluntary context switch
> (or userspace execution, or, ...).
> 
> What is the simplest way to get this job done?  To Ingo's point, I bet
> that there is a simpler way than the current TASKS_RCU implementation.
> 
> Ingo, if I make it fit into 100 lines of code, would you be OK with it?
> I probably need a one-line hook at task-creation time and another
> at task-exit time, if that makes a difference.

And please see below for such a patch, which does add (just barely)
fewer than 100 lines net.

Unfortunately, it does not work, as I should have known ahead of time
from the dyntick-idle experience.  Not all context switches go through
context_switch().  :-/

I believe this is fixable, more or less like dyntick-idle's half-interrupts
were fixable, but it will likely be a few days.  Not clear whether the
result will be simpler than current TASKS_RCU, but there is only one
way to find out.  ;-)

							Thanx, Paul

------------------------------------------------------------------------

 include/linux/init_task.h |    7 +++++
 include/linux/rcupdate.h  |   12 ++++++++++
 include/linux/sched.h     |    5 ++++
 kernel/fork.c             |    4 +++
 kernel/rcu/Kconfig        |    8 +++---
 kernel/rcu/Kconfig.debug  |    2 -
 kernel/rcu/tree.c         |   15 +++++++++---
 kernel/rcu/tree_plugin.h  |   55 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/core.c       |    3 ++
 9 files changed, 101 insertions(+), 10 deletions(-)

commit 25cb9b6882542f3db0242f6556b94e139c4b29d5
Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
Date:   Mon May 22 08:55:01 2017 -0700

    rcu: Provide simplified RCU-tasks implementation
    
    This commit provides a simplified RCU-tasks implementation in terms of
    SRCU for CONFIG_PREEMPT=y kernels and in terms of RCU-sched otherwise.
    The RCU-sched approach simply maps synchronize_rcu_tasks() directly onto
    synchronize_sched().
    
    The SRCU approach provides a tasks_srcu srcu_struct structure that
    tracks usermode execution and voluntary context switches.  When each
    task is created, it invokes tasks_rcu_qs_exit() and when each task exits
    it invokes tasks_rcu_qs_enter().  On context switch, tasks_rcu_qs_exit()
    is invoked for the incoming task, but only if that task is resuming from a
    voluntary context switch.  If the current context switch is a preemption,
    tasks_rcu_qs_enter() is invoked for the outgoing tasks, and either way
    the preempt-or-not state is recorded for the task's next resumption.
    A "context switch" that keeps the same task invokes tasks_rcu_qs().
    
    If the scheduling-clock interrupt sees a task in usermode, it invokes
    tasks_rcu_qs().  It is normally unsafe to do SRCU read-side operations
    from interrupt, but in this case, usermode was interrupted, so there is
    no chance of having interrupted another SRCU read-side operation.
    
    Given all this, synchronize_srcu(&tasks_srcu) waits for all tasks to
    be seen voluntarily blocked or in user space.  This means that
    synchronize_rcu_tasks() is a one-line function.
    
    Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index bd11d33d48cc..631f3d3f2dc3 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -140,6 +140,13 @@ extern struct group_info init_groups;
 #else
 #define INIT_TASK_RCU_PREEMPT(tsk)
 #endif
+#ifdef CONFIG_TASKS_RCU
+#define INIT_TASK_RCU_TASKS(tsk)					\
+	.rcu_tasks_idx = 0,						\
+	.rcu_tasks_preempt = false,
+#else /* #ifdef CONFIG_TASKS_RCU */
+#define INIT_TASK_RCU_TASKS(tsk)
+#endif /* #else #ifdef CONFIG_TASKS_RCU */
 
 extern struct cred init_cred;
 
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 5088a44ce99c..1b2f104a467f 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -156,6 +156,18 @@ static inline void rcu_init_nohz(void) { }
 		rcu_irq_exit_irqson(); \
 	} while (0)
 
+#ifdef CONFIG_TASKS_RCU
+void tasks_rcu_qs(bool preempt);
+void tasks_rcu_qs_enter(struct task_struct *t, bool preempt);
+void tasks_rcu_qs_exit(struct task_struct *t);
+void synchronize_rcu_tasks(void);
+#else /* #ifdef CONFIG_TASKS_RCU */
+static inline void tasks_rcu_qs(bool preempt) { }
+static inline void tasks_rcu_qs_enter(struct task_struct *t, bool preempt) { }
+static inline void tasks_rcu_qs_exit(struct task_struct *t) { }
+static inline void synchronize_rcu_tasks(void) { synchronize_sched(); }
+#endif /* #else #ifdef CONFIG_TASKS_RCU */
+
 /**
  * cond_resched_rcu_qs - Report potential quiescent states to RCU
  *
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3b7d9aa80349..1e5d27d1e885 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -544,6 +544,11 @@ struct task_struct {
 	struct rcu_node			*rcu_blocked_node;
 #endif /* #ifdef CONFIG_PREEMPT_RCU */
 
+#ifdef CONFIG_TASKS_RCU
+	u8				rcu_tasks_idx;
+	u8				rcu_tasks_preempt;
+#endif /* #ifdef CONFIG_TASKS_RCU */
+
 	struct sched_info		sched_info;
 
 	struct list_head		tasks;
diff --git a/kernel/fork.c b/kernel/fork.c
index 1784f49b0cd4..8767d739eed4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1498,6 +1498,10 @@ static inline void rcu_copy_process(struct task_struct *p)
 	p->rcu_blocked_node = NULL;
 	INIT_LIST_HEAD(&p->rcu_node_entry);
 #endif /* #ifdef CONFIG_PREEMPT_RCU */
+#ifdef CONFIG_TASKS_RCU
+	p->rcu_tasks_preempt = false;
+	tasks_rcu_qs_exit(p);
+#endif /* #ifdef CONFIG_TASKS_RCU */
 }
 
 /*
diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index be90c945063f..3077568930f3 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -69,13 +69,13 @@ config TREE_SRCU
 	  This option selects the full-fledged version of SRCU.
 
 config TASKS_RCU
-	bool
-	default n
+	def_bool PREEMPT_RCU
 	select SRCU
 	help
 	  This option enables a task-based RCU implementation that uses
-	  only voluntary context switch (not preemption!), idle, and
-	  user-mode execution as quiescent states.
+	  only voluntary context switch (not preemption!) and user-mode
+	  execution as quiescent states.  For !PREEMPT_RCU kernels,
+	  synchronize_tasks_rcu() maps to synchronize_sched().
 
 config RCU_STALL_COMMON
 	def_bool ( TREE_RCU || PREEMPT_RCU )
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 0ec7d1d33a14..b6619ceedafc 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -16,7 +16,6 @@ config RCU_PERF_TEST
 	depends on DEBUG_KERNEL
 	select TORTURE_TEST
 	select SRCU
-	select TASKS_RCU
 	default n
 	help
 	  This option provides a kernel module that runs performance
@@ -33,7 +32,6 @@ config RCU_TORTURE_TEST
 	depends on DEBUG_KERNEL
 	select TORTURE_TEST
 	select SRCU
-	select TASKS_RCU
 	default n
 	help
 	  This option provides a kernel module that runs torture tests
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8008541b04f2..05e7abecde68 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -166,6 +166,7 @@ static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp);
 static void rcu_report_exp_rdp(struct rcu_state *rsp,
 			       struct rcu_data *rdp, bool wake);
 static void sync_sched_exp_online_cleanup(int cpu);
+static void rcu_all_qs_ctxt(bool ctxt);
 
 /* rcuc/rcub kthread realtime priority */
 static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
@@ -467,7 +468,7 @@ void rcu_note_context_switch(bool preempt)
 		rcu_momentary_dyntick_idle();
 	this_cpu_inc(rcu_dynticks.rcu_qs_ctr);
 	if (!preempt)
-		rcu_all_qs();
+		rcu_all_qs_ctxt(true);
 out:
 	trace_rcu_utilization(TPS("End context switch"));
 	barrier(); /* Avoid RCU read-side critical sections leaking up. */
@@ -480,14 +481,13 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
  * dyntick-idle quiescent state visible to other CPUs (but only for those
  * RCU flavors in desperate need of a quiescent state, which will normally
  * be none of them).  Either way, do a lightweight quiescent state for
- * all RCU flavors.
+ * all RCU flavors.  If ctxt is false, also to tasks-RCU quiescent state.
  *
  * The barrier() calls are redundant in the common case when this is
  * called externally, but just in case this is called from within this
  * file.
- *
  */
-void rcu_all_qs(void)
+static void rcu_all_qs_ctxt(bool ctxt)
 {
 	unsigned long flags;
 
@@ -509,9 +509,16 @@ void rcu_all_qs(void)
 	if (unlikely(raw_cpu_read(rcu_sched_data.cpu_no_qs.b.exp)))
 		rcu_sched_qs();
 	this_cpu_inc(rcu_dynticks.rcu_qs_ctr);
+	if (!ctxt)
+		tasks_rcu_qs(false);
 	barrier(); /* Avoid RCU read-side critical sections leaking up. */
 	preempt_enable();
 }
+
+void rcu_all_qs(void)
+{
+	rcu_all_qs_ctxt(false);
+}
 EXPORT_SYMBOL_GPL(rcu_all_qs);
 
 #define DEFAULT_RCU_BLIMIT 10     /* Maximum callbacks per rcu_do_batch. */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index d7325553cf82..21b254dd28b8 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -783,6 +783,7 @@ void exit_rcu(void)
 	barrier();
 	t->rcu_read_unlock_special.b.blocked = true;
 	__rcu_read_unlock();
+	tasks_rcu_qs_enter(t, false);
 }
 
 #else /* #ifdef CONFIG_PREEMPT_RCU */
@@ -2597,3 +2598,57 @@ static void rcu_bind_gp_kthread(void)
 		return;
 	housekeeping_affine(current);
 }
+
+#ifdef CONFIG_TASKS_RCU
+
+DEFINE_STATIC_SRCU(tasks_srcu);
+
+/* Record a momentary RCU-tasks quiescent state. */
+void tasks_rcu_qs(bool preempt)
+{
+	int idx;
+
+	if (preempt || is_idle_task(current))
+		return;  /* Preempted, not a quiescent state. */
+	/* Acquire new before releasing old to allow tracing SRCU. */
+	preempt_disable();
+	idx = __srcu_read_lock(&tasks_srcu);
+	__srcu_read_unlock(&tasks_srcu, current->rcu_tasks_idx);
+	current->rcu_tasks_idx = idx;
+	preempt_enable();
+}
+
+/* Record entering an extended RCU-tasks quiescent state. */
+void tasks_rcu_qs_enter(struct task_struct *t, bool preempt)
+{
+	t->rcu_tasks_preempt = preempt;
+	if (preempt || is_idle_task(t))
+		return;  /* Preempted, not a quiescent state. */
+	preempt_disable();
+	__srcu_read_unlock(&tasks_srcu, t->rcu_tasks_idx);
+	preempt_enable();
+}
+
+/* Record exiting an extended RCU-tasks quiescent state. */
+void tasks_rcu_qs_exit(struct task_struct *t)
+{
+	if (t->rcu_tasks_preempt || is_idle_task(t))
+		return;  /* Preempted, not a quiescent state. */
+	preempt_disable();
+	t->rcu_tasks_idx = __srcu_read_lock(&tasks_srcu);
+	preempt_enable();
+}
+
+/**
+ * synchronize_rcu_tasks - Wait for all tasks to voluntarily leave the kernel
+ *
+ * Waits until each task is either voluntarily blocked, running in
+ * userspace, or has offered to block (as in cond_resched_rcu_qs()).
+ * Idle tasks are ignored.
+ */
+void synchronize_rcu_tasks(void)
+{
+	synchronize_srcu(&tasks_srcu);
+}
+
+#endif /* #ifdef CONFIG_TASKS_RCU */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 75353aa53ae3..1ba79734377f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3430,6 +3430,7 @@ static void __sched notrace __schedule(bool preempt)
 	clear_preempt_need_resched();
 
 	if (likely(prev != next)) {
+		tasks_rcu_qs_exit(next);
 		rq->nr_switches++;
 		rq->curr = next;
 		++*switch_count;
@@ -3438,9 +3439,11 @@ static void __sched notrace __schedule(bool preempt)
 
 		/* Also unlocks the rq: */
 		rq = context_switch(rq, prev, next, &rf);
+		tasks_rcu_qs_enter(prev, preempt);
 	} else {
 		rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
 		rq_unlock_irq(rq, &rf);
+		tasks_rcu_qs(preempt);
 	}
 
 	balance_callback(rq);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ