lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 14 Jul 2016 18:32:11 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Tejun Heo <tj@...nel.org>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	John Stultz <john.stultz@...aro.org>,
	Ingo Molnar <mingo@...hat.com>,
	lkml <linux-kernel@...r.kernel.org>,
	Dmitry Shmidt <dimitrysh@...gle.com>,
	Rom Lemarchand <romlem@...gle.com>,
	Colin Cross <ccross@...gle.com>, Todd Kjos <tkjos@...gle.com>,
	Oleg Nesterov <oleg@...hat.com>
Subject: Re: Severe performance regression w/ 4.4+ on Android due to cgroup
 locking changes

On Thu, Jul 14, 2016 at 11:07:15AM -0400, Tejun Heo wrote:
> How?  While write lock is pending, no new reader is allowed.

Look at the new percpu_down_write (the old one is similar in concept):

+ void percpu_down_write(struct percpu_rw_semaphore *sem)
+ {
+ 	down_write(&sem->rw_sem);
+ 
+ 	/* Notify readers to take the slow path. */
+ 	rcu_sync_enter(&sem->rss);
+ 
+ 	/*
+ 	 * Notify new readers to block; up until now, and thus throughout the
+ 	 * longish rcu_sync_enter() above, new readers could still come in.
+ 	 */
+ 	WRITE_ONCE(sem->state, readers_block);

Note how up until this point new readers could happen? So the 'entire'
synchronize_sched() call is 'invisible' to new readers.

We need the sync_sched() to ensure all new down_read callers will go
through the 'slow' down_read path and even look at sem->state.

+ 
+ 	smp_mb(); /* D matches A */
+ 
+ 	/*
+ 	 * If they don't see our writer of readers_block to sem->state,
+ 	 * then we are guaranteed to see their sem->refcount increment, and
+ 	 * therefore will wait for them.
+ 	 */
+ 
+ 	/* Wait for all now active readers to complete. */
+ 	wait_event(sem->writer, readers_active_check(sem));
+ }

> If reader ops are high frequency, they will surely get affected. 

Before the sync_sched() a down_read (PREEMPT=n, LOCKDEP=n) is basically:

  __this_cpu_inc(*sem->refcount)
  if (sem->rss->gp_state) /* false */
	;

This is one purely local (!atomic) RmW and one load of a shared
variable. Absolute minimal overhead.

During sync_sched() gp_state is !0 and we end up doing:

  __this_cpu_inc(*sem->refcount)
  if (sem->rss->gp_state) {
	smp_mb();
	if (sem->state != readers_block) /* true */
		return;
  }

Which is 1 smp_mb() and 1 shared state load (to the same cacheline we
touched before IIRC) more. Now full barriers are really rather
expensive, and do show up on profiles.

(and this is where the old and new code differ, the old code would end
up doing: __down_read(&sem->rwsem), which is a shared atomic RmW and is
_much_ more expensive).

> It just isn't a good design to inject RCU grace period synchronously
> into a hot path.

That's just the point, the write side of a _global_ lock can never, per
definition, be a hot path.

Now, I realize not everyone wants these same tradeoffs and I did you
this patch to allow that.

But I really think that this Android usecase invalidates the premise of
cgroups using a global lock.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ