linux-kernel - Re: INFO: rcu_sched detected stalls on CPUs/tasks with `kswapd` and `mem_cgroup_shrink

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20161201180953.GO3045@worktop.programming.kicks-ass.net>
Date:   Thu, 1 Dec 2016 19:09:53 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:     Michal Hocko <mhocko@...nel.org>,
        Donald Buczek <buczek@...gen.mpg.de>,
        Paul Menzel <pmenzel@...gen.mpg.de>, dvteam@...gen.mpg.de,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Josh Triplett <josh@...htriplett.org>
Subject: Re: INFO: rcu_sched detected stalls on CPUs/tasks with `kswapd` and
 `mem_cgroup_shrink_node`

On Thu, Dec 01, 2016 at 08:59:18AM -0800, Paul E. McKenney wrote:
> On Thu, Dec 01, 2016 at 05:36:14PM +0100, Peter Zijlstra wrote:
> > Well, with the above change cond_resched() is already sufficient, no?
> 
> Maybe.  Right now, cond_resched_rcu_qs() gets a quiescent state to
> the RCU core in less than one jiffy, with my other change, this becomes
> a handful of jiffies depending on HZ and NR_CPUS.  I expect this
> increase to a handful of jiffies to be a non-event.
> 
> After my upcoming patch, cond_resched() will get a quiescent state to
> the RCU core in about ten seconds.  While I am am not all that nervous
> about the increase from less than a jiffy to a handful of jiffies,
> increasing to ten seconds via cond_resched() does make me quite nervous.
> Past experience indicates that someone's kernel will likely be fatally
> inconvenienced by this magnitude of change.
> 
> Or am I misunderstanding what you are proposing?

No, that is indeed what I was proposing. Hurm.. OK let me ponder that a
bit. There might be a few games we can play with !PREEMPT to avoid IPIs.

Thing is, I'm slightly uncomfortable with de-coupling rcu-sched from
actual schedule() calls.

> > In fact, by doing the IPI thing we get the entire cond_resched*()
> > family, and we could add the should_resched() guard to
> > cond_resched_rcu().
> 
> So that cond_resched_rcu_qs() looks something like this, in order
> to avoid the function call in the case where the scheduler has nothing
> to do?

I was actually thinking of this:

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2d0c82e1d348..2dc7d8056b2a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3374,9 +3374,11 @@ static inline int signal_pending_state(long state, struct task_struct *p)
 static inline void cond_resched_rcu(void)
 {
 #if defined(CONFIG_DEBUG_ATOMIC_SLEEP) || !defined(CONFIG_PREEMPT_RCU)
-	rcu_read_unlock();
-	cond_resched();
-	rcu_read_lock();
+	if (should_resched(1)) {
+		rcu_read_unlock();
+		cond_resched();
+		rcu_read_lock();
+	}
 #endif
 }