Date:	Thu, 2 May 2013 12:35:08 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Alex Shi <alex.shi@...el.com>
Cc:	mingo@...hat.com, tglx@...utronix.de, akpm@...ux-foundation.org,
	arjan@...ux.intel.com, bp@...en8.de, pjt@...gle.com,
	namhyung@...nel.org, efault@....de, morten.rasmussen@....com,
	vincent.guittot@...aro.org, gregkh@...uxfoundation.org,
	preeti@...ux.vnet.ibm.com, viresh.kumar@...aro.org,
	linux-kernel@...r.kernel.org, len.brown@...el.com,
	rafael.j.wysocki@...el.com, jkosina@...e.cz,
	clark.williams@...il.com, tony.luck@...el.com,
	keescook@...omium.org, mgorman@...e.de, riel@...hat.com
Subject: Re: [PATCH v4 0/6] sched: use runnable load based balance

On Thu, May 02, 2013 at 08:38:17AM +0800, Alex Shi wrote:

> On the 3.8 kernel, the first problem commit is 5a505085f043 ("mm/rmap:
> Convert the struct anon_vma::mutex to an rwsem"). It causes a lot of
> imbalance among CPUs on the aim7 benchmark. The reason is here:
> https://lkml.org/lkml/2013/1/29/84.

Hehe, yeah, that one was going to be obvious.. rwsems were known to be
really bad performers: not only did they lack lock stealing, they also
lacked spinning.
 
> But after we use rwsem lock stealing in the 3.9 kernel (commit
> ce6711f3d196f09ca0e), aim7 wakeup no longer has a clear imbalance
> issue, and then aim7 won't need this extra burst wakeup detection.

OK, that seems like a nice fix for rwsems.. one nit:

+               raw_spin_lock_irq(&sem->wait_lock);
+               /* Try to get the writer sem, may steal from the head writer: */
+               if (flags == RWSEM_WAITING_FOR_WRITE)
+                       if (try_get_writer_sem(sem, &waiter)) {
+                               raw_spin_unlock_irq(&sem->wait_lock);
+                               return sem;
+                       }
+               raw_spin_unlock_irq(&sem->wait_lock);
                schedule();

That should probably look like:

	preempt_disable();
	raw_spin_unlock_irq();
	preempt_enable_no_resched();
	schedule();

Otherwise you might find a performance regression on PREEMPT=y kernels.
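
For concreteness, a rough sketch of how the hunk above might read with the
preempt_disable()/preempt_enable_no_resched() pair folded in; this just
combines the two fragments quoted in this mail, it is not a tested change:

	raw_spin_lock_irq(&sem->wait_lock);
	/* Try to get the writer sem, may steal from the head writer: */
	if (flags == RWSEM_WAITING_FOR_WRITE)
		if (try_get_writer_sem(sem, &waiter)) {
			raw_spin_unlock_irq(&sem->wait_lock);
			return sem;
		}
	/*
	 * Keep preemption off across the unlock so the just re-enabled
	 * IRQs cannot preempt us right before the schedule() below.
	 */
	preempt_disable();
	raw_spin_unlock_irq(&sem->wait_lock);
	preempt_enable_no_resched();
	schedule();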

> With PJT's patch, we know that in the first few seconds after wakeup a
> task's runnable load may be nearly zero. The nearly-zero runnable load
> increases the imbalance among CPUs. So there is about an extra 5%
> performance drop when using runnable load to do balancing on aim7 (in
> testing, aim7 forks 2000 threads, then waits for a trigger, then runs
> as fast as possible).
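
(As background: with per-entity load tracking, a task's load contribution
decays geometrically while it sleeps, with y^32 = 1/2, so it roughly halves
every 32 ms; a thread that has been parked waiting for the aim7 trigger for
a while therefore wakes up with a runnable load close to zero. A minimal
userspace illustration of that decay, assuming only the y^32 = 0.5 constant
from PJT's patches:)

	/* Illustration only, not kernel code: geometric decay of a
	 * per-entity runnable average while the task sleeps. */
	#include <math.h>
	#include <stdio.h>

	int main(void)
	{
		double y = pow(0.5, 1.0 / 32.0);  /* per-period (~1 ms) decay */
		double load = 1024.0;             /* full runnable contribution */

		for (int ms = 0; ms <= 256; ms += 32)
			printf("asleep %3d ms -> load ~ %6.1f\n",
			       ms, load * pow(y, ms));
		return 0;
	}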

OK, so what I was asking after is whether you changed the scheduler after
PJT's patches landed to deal with this bulk wakeup. Also, while aim7 might
no longer trigger the bad pattern, what is to say nothing ever will? In
particular, anything using pthread_cond_broadcast() is a known suspect for
bulk wakeups.
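
For reference, the aim7-style pattern in miniature; a hypothetical
userspace sketch (not taken from aim7 itself) where many threads park on
one condition variable and a single pthread_cond_broadcast() makes them
all runnable at the same instant:

	#include <pthread.h>

	#define NTHREADS 64	/* aim7 uses ~2000; kept small here */

	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t trigger = PTHREAD_COND_INITIALIZER;
	static int go;

	static void *worker(void *arg)
	{
		pthread_mutex_lock(&lock);
		while (!go)			/* all workers block here... */
			pthread_cond_wait(&trigger, &lock);
		pthread_mutex_unlock(&lock);
		/* ...then run as fast as possible, all woken at once */
		return arg;
	}

	int main(void)
	{
		pthread_t tid[NTHREADS];

		for (int i = 0; i < NTHREADS; i++)
			pthread_create(&tid[i], NULL, worker, NULL);

		pthread_mutex_lock(&lock);
		go = 1;
		pthread_cond_broadcast(&trigger);	/* the bulk wakeup */
		pthread_mutex_unlock(&lock);

		for (int i = 0; i < NTHREADS; i++)
			pthread_join(tid[i], NULL);
		return 0;
	}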

Anyway, I'll go try and make sense of some of the actual patches.. :-)
