linux-kernel - Re: INFO: rcu_sched detected stalls on CPUs/tasks with `kswapd` and `mem_cgroup_shrink


Message-Id: <20161130170249.GZ3924@linux.vnet.ibm.com>
Date:   Wed, 30 Nov 2016 09:02:49 -0800
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Michal Hocko <mhocko@...nel.org>,
        Donald Buczek <buczek@...gen.mpg.de>,
        Paul Menzel <pmenzel@...gen.mpg.de>, dvteam@...gen.mpg.de,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Josh Triplett <josh@...htriplett.org>
Subject: Re: INFO: rcu_sched detected stalls on CPUs/tasks with `kswapd` and
 `mem_cgroup_shrink_node`

On Wed, Nov 30, 2016 at 05:38:20PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 30, 2016 at 06:29:55AM -0800, Paul E. McKenney wrote:
> > We can, and you are correct that cond_resched() does not unconditionally
> > supply RCU quiescent states, and never has.  Last time I tried to add
> > cond_resched_rcu_qs() semantics to cond_resched(), I got told "no",
> > but perhaps it is time to try again.
> 
> Well, you got told: "ARRGH my benchmark goes all regress", or something
> along those lines. Didn't we recently dig out those commits for some
> reason or other?

Were "those commits" the benchmark or putting cond_resched_rcu_qs()
functionality into cond_resched()?  Either way, no idea.

> Finding out what benchmark that was and running it against this patch
> would make sense.

Agreed, especially given that I believe cond_resched_rcu_qs() is lighter
weight than it used to be.  No idea what benchmarks they were, though.

> Also, I seem to have missed, why are we going through this again?

People are running workloads that force long-running loops in the kernel,
which get them RCU CPU stall warning messages.  My reaction has been
to insert cond_resched_rcu_qs() as needed, and Michal wondered why
cond_resched() couldn't just handle both scheduling latency and RCU
quiescent states.  I remembered trying it, but not what the issue was.

So I posted the patch assuming that I would eventually either find out
what the issue was or that the issue no longer applied.  ;-)
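
For readers following the thread, the difference described above can be
sketched in user-space C.  This is only a rough illustration and not kernel
code: struct sim_cpu, cond_resched_stub(), cond_resched_rcu_qs_stub() and
scan_items() are hypothetical stand-ins, and the model assumes the simple
case where cond_resched() finds nothing else to run and therefore reports
no quiescent state.

```c
/*
 * User-space sketch only.  All names here are hypothetical stand-ins,
 * not the kernel's cond_resched() or cond_resched_rcu_qs().
 */
struct sim_cpu {
	unsigned long resched_checks;	/* times the loop offered to reschedule */
	unsigned long qs_reported;	/* quiescent states reported to "RCU" */
};

/* Models cond_resched() when no other task is runnable: a check, no QS. */
static void cond_resched_stub(struct sim_cpu *cpu)
{
	cpu->resched_checks++;
}

/* Models cond_resched_rcu_qs(): the same check, plus a quiescent state. */
static void cond_resched_rcu_qs_stub(struct sim_cpu *cpu)
{
	cond_resched_stub(cpu);
	cpu->qs_reported++;
}

/*
 * A long-running loop.  With report_qs == 0 it never reports a quiescent
 * state in this model, which is the situation that leads to stall warnings.
 */
static unsigned long scan_items(struct sim_cpu *cpu, unsigned long n,
				int report_qs)
{
	unsigned long i, work = 0;

	for (i = 0; i < n; i++) {
		work += i & 1;	/* placeholder for real per-item work */
		if (report_qs)
			cond_resched_rcu_qs_stub(cpu);
		else
			cond_resched_stub(cpu);
	}
	return work;
}
```

In this model, scan_items(&cpu, 1000, 0) leaves cpu.qs_reported at zero
however long the loop runs, while scan_items(&cpu, 1000, 1) reports one
quiescent state per iteration.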

							Thanx, Paul