linux-kernel - Re: Severe performance regression w/ 4.4+ on Android due to cgroup locking changes


Date:	Thu, 14 Jul 2016 14:11:01 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Tejun Heo <tj@...nel.org>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	John Stultz <john.stultz@...aro.org>,
	Ingo Molnar <mingo@...hat.com>,
	lkml <linux-kernel@...r.kernel.org>,
	Dmitry Shmidt <dimitrysh@...gle.com>,
	Rom Lemarchand <romlem@...gle.com>,
	Colin Cross <ccross@...gle.com>, Todd Kjos <tkjos@...gle.com>,
	Oleg Nesterov <oleg@...hat.com>
Subject: Re: Severe performance regression w/ 4.4+ on Android due to cgroup
 locking changes

On Thu, Jul 14, 2016 at 07:20:46AM -0400, Tejun Heo wrote:
> On Thu, Jul 14, 2016 at 08:49:56AM +0200, Peter Zijlstra wrote:

> > So the immediate problem with lg style locks is that the 'local' lock
> > will not stay local: since these are preemptible locks, we can get
> > migrations etc.
> > 
> > All fixable, but still.
> 
> In this case, the locks are read-locked only across operations which
> change process hierarchy.  They'll occasionally get migrated while
> holding the lock for sure but not often enough to matter.

That means having to change the interface to pass along what 'local' is,
like srcu_read_lock().
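
For illustration only (this is not code from the thread): in the kernel,
srcu_read_lock() returns an index that the caller must hand back to
srcu_read_unlock(). A minimal user-space sketch of the same pattern for a
per-CPU read side might look like the following. Everything named here
(struct lg_sketch, lg_init(), lg_read_lock(), lg_read_unlock(), NR_SLOTS,
and the explicit cur_cpu argument) is hypothetical, and the writer side is
left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_SLOTS 4	/* hypothetical stand-in for the number of CPUs */

/* Hypothetical per-slot reader counts; not a kernel data structure. */
struct lg_sketch {
	atomic_int readers[NR_SLOTS];
};

static void lg_init(struct lg_sketch *lg)
{
	for (int i = 0; i < NR_SLOTS; i++)
		atomic_init(&lg->readers[i], 0);
}

/* Take the read side on the caller's current slot and return that slot. */
static int lg_read_lock(struct lg_sketch *lg, int cur_cpu)
{
	int slot = cur_cpu % NR_SLOTS;

	atomic_fetch_add(&lg->readers[slot], 1);
	return slot;
}

/* Release the slot that was actually taken, wherever we run now. */
static void lg_read_unlock(struct lg_sketch *lg, int slot)
{
	assert(slot >= 0 && slot < NR_SLOTS);
	atomic_fetch_sub(&lg->readers[slot], 1);
}

/* Returns the number of slots left with a nonzero reader count. */
static int lg_demo_migration(void)
{
	struct lg_sketch lg;
	int slot, busy = 0;

	lg_init(&lg);
	slot = lg_read_lock(&lg, 1);	/* reader starts on "CPU 1" */
	/* ... preempted and migrated to "CPU 3" while holding the lock ... */
	lg_read_unlock(&lg, slot);	/* still releases slot 1, not slot 3 */

	for (int i = 0; i < NR_SLOTS; i++)
		busy += atomic_load(&lg.readers[i]) != 0;
	return busy;
}
```

Because the unlock path uses the returned slot rather than whatever CPU the
task happens to be on, a migration between lock and unlock leaves the
counters balanced. The cost is that every caller has to carry the token.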

> > So the main objection I have is that this isn't a fundamental fix, this
> > only cures things because Android only runs on small machines.
> >
> > If someone with a big computer tries to do the same things we're up some
> > creek without no paddle. There's just no way we can make a global writer
> > 'fast'.
> 
> How so?  As the number of cores increases, it'll get proportionally
> more expensive as the same operation is performed on more CPUs;
> however, the latency is dependent on the slowest one, and it'll get
> higher more often with more CPUs, but not drastically.

A global lock on 4 or 8 socket machines with all 200+ cpus trying to
use it really stinks.

Remember, they switch cgroups at really rather high rates here, because
of that binder stuff. I don't see how you can defend a global lock here
:/ Global locks only work when writers are extremely rare, and clearly
that premise is false.

Also note that since these are preemptible locks, you can get unbounded
priority inversions.