linux-kernel - Re: Severe performance regression w/ 4.4+ on Android due to cgroup locking changes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160714164547.GG30154@twins.programming.kicks-ass.net>
Date:	Thu, 14 Jul 2016 18:45:47 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	Tejun Heo <tj@...nel.org>, John Stultz <john.stultz@...aro.org>,
	Ingo Molnar <mingo@...hat.com>,
	lkml <linux-kernel@...r.kernel.org>,
	Dmitry Shmidt <dimitrysh@...gle.com>,
	Rom Lemarchand <romlem@...gle.com>,
	Colin Cross <ccross@...gle.com>, Todd Kjos <tkjos@...gle.com>,
	Oleg Nesterov <oleg@...hat.com>
Subject: Re: Severe performance regression w/ 4.4+ on Android due to cgroup
 locking changes

On Thu, Jul 14, 2016 at 09:23:55AM -0700, Paul E. McKenney wrote:
> Hmmm...  How does this handle the following sequence of events for
> the case where we are not biased towards the reader?
> 
> o	The per-CPU rwsem is set up with RCU_NONE and readers_slow
> 	(as opposed to readers_block).  The rcu_sync ->gp_state is
> 	GP_PENDING, which means that rcu_sync_is_idle() will always
> 	return true.

/false/, rcu_sync_is_idle() will always be false, to force us into the
slowpath.

> o	Task A on CPU 0 runs percpu_down_read() to completion, and remains
> 	within the critical section.  CPU 0's ->refcount is therefore 1.
> 
> o	Task B on CPU 1 does percpu_down_write(), which write-acquires
> 	->rw_sem, does rcu_sync_enter() (which is a no-op due to
> 	RCU_NONE), sets ->state to readers_block, and is just now going
> 	to wait for readers, which first invokes readers_active_check(),
> 	which starts summing the ->refcount for CPUs 0, 1, and 2,
> 	obtaining the value 1 so far.
> 
> o	Task C CPU 2 enters percpu_down_read(), disables preemption,
> 	increments CPU 2's ->refcount, sees rcu_sync_is_idle() return
> 	true (so skips __percpu_down_read()), enables preemption, and
> 	enters its critical section.

false, so does __percpu_down_read()

> 
> o	Task C migrates to CPU 3 and invokes percpu_up_read(), which
> 	disables preemption, sees rcu_sync_is_idle() return true, calls
> 	__this_cpu_dec() on CPU 3's ->refcount, and enables preemption.
> 	The value of CPU 3's ->refcount is thus (unsigned int)-1.

__percpu_up_read()

> 
> o	Task B on CPU 1 continues execution in readers_active_check(), with
> 	the full sum being zero.
> 
> So it looks to me like we have Task A as a writer at the same time that
> Task A is a reader, which would not be so good.
> 
> So what am I missing here?

for RCU_NONE we init rsp->gp_state to !0, which makes:

static inline rcu_sync_is_idle()'s

	return !rsp->gp_state (aka. rsp->gp_state == 0)

return false.

> And a couple of checkpatch nits below.  Yes, I had to apply the patch to
> figure out what it was doing.  ;-)

Yah, too much churn to read :-)

In any case, rest assured you've already gone over this part of the
patch several times. I repurposed an old percpu-rwsem optimization, Oleg
recognised it.