linux-kernel - Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace signal_struct->group_rwsem with a global percpu


Date:	Tue, 15 Sep 2015 15:36:34 +0200
From:	Christian Borntraeger <borntraeger@...ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Tejun Heo <tj@...nel.org>, Ingo Molnar <mingo@...hat.com>,
	"linux-kernel@...r.kernel.org >> Linux Kernel Mailing List" 
	<linux-kernel@...r.kernel.org>, KVM list <kvm@...r.kernel.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Paul McKenney <paulmck@...ux.vnet.ibm.com>
Subject: Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace
 signal_struct->group_rwsem with a global percpu_rwsem) causes regression for
 libvirt/kvm

On 15.09.2015 at 15:05, Peter Zijlstra wrote:
> On Tue, Sep 15, 2015 at 02:05:14PM +0200, Christian Borntraeger wrote:
>> Tejun,
>>
>>
>> commit d59cfc09c32a2ae31f1c3bc2983a0cd79afb3f14 (sched, cgroup: replace
>> signal_struct->group_rwsem with a global percpu_rwsem) causes some noticeable
>> hiccups when starting several kvm guests (which libvirt will move into cgroups
>> - each vcpu thread and each i/o thread).
>> When you now start lots of guests in parallel on a bigger system (32 CPUs with
>> 2-way SMT in my case) the system is so busy that systemd runs into several timeouts
>> like "Did not receive a reply. Possible causes include: the remote application did
>> not send a reply, the message bus security policy blocked the reply, the reply
>> timeout expired, or the network connection was broken."
>>
>> The problem seems to be that the newly used percpu_rwsem does a
>> rcu_synchronize_sched_expedited for all write downs/ups.
> 
> Can you try:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev.2015.09.11ab

Yes, dev.2015.09.11a seems to help, thanks. Getting rid of the expedited hammer was
really helpful, I guess.

> 
> those include Oleg's rework of the percpu rwsem which should hopefully
> improve things somewhat.
> 
> But yes, pounding a global lock on a big machine will always suck.

By hacking out the fast path I actually degraded the percpu rwsem to a real global lock, but
things were still a lot faster.
I am wondering why the old code behaved so badly. Is there some interaction
between the writer waiting for a reschedule in synchronize_sched, some fork code
waiting to take the lock on the read side, and some rescheduling that in turn
waits for a lock that fork holds? lockdep does not give me any hints, so I have no clue :-(
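
For readers following the thread, here is a minimal single-threaded toy model of the cost pattern discussed above. It is a sketch and not the kernel implementation: readers touch only their own slot, while every write-side down and up pays one global synchronization. All names (toy_percpu_rwsem, toy_synchronize, TOY_NSLOTS) are hypothetical, and toy_synchronize() merely counts calls where the 4.2 kernel would wait in synchronize_sched_expedited().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NSLOTS 4 /* hypothetical stand-in for the number of CPUs */

/* Toy model only: not thread-safe, no sleeping, no memory barriers. */
struct toy_percpu_rwsem {
	int readers[TOY_NSLOTS];     /* per-slot reader counts */
	bool writer_active;          /* set while a writer holds or is taking the lock */
	unsigned long grace_periods; /* how often a writer paid the global sync cost */
};

/* Stand-in for the grace-period wait; here it only counts invocations. */
static void toy_synchronize(struct toy_percpu_rwsem *sem)
{
	sem->grace_periods++;
}

/* Reader fast path: touches only the caller's own slot. */
static bool toy_down_read_trylock(struct toy_percpu_rwsem *sem, size_t slot)
{
	assert(slot < TOY_NSLOTS);
	if (sem->writer_active)
		return false; /* the real code would fall back to a slow path */
	sem->readers[slot]++;
	return true;
}

static void toy_up_read(struct toy_percpu_rwsem *sem, size_t slot)
{
	assert(slot < TOY_NSLOTS && sem->readers[slot] > 0);
	sem->readers[slot]--;
}

/* Writer: announce itself, pay one sync, then inspect every slot. */
static bool toy_down_write_trylock(struct toy_percpu_rwsem *sem)
{
	size_t i;

	if (sem->writer_active)
		return false;
	sem->writer_active = true;
	toy_synchronize(sem); /* cost paid on every write-side down */
	for (i = 0; i < TOY_NSLOTS; i++) {
		if (sem->readers[i] != 0) {
			sem->writer_active = false; /* back out: readers still active */
			return false;
		}
	}
	return true;
}

static void toy_up_write(struct toy_percpu_rwsem *sem)
{
	assert(sem->writer_active);
	sem->writer_active = false;
	toy_synchronize(sem); /* cost paid again on every write-side up */
}
```

In this model each write-side acquire/release pair costs two synchronizations regardless of how many readers exist, which is the overhead that adds up when many threads are moved between cgroups in quick succession.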


Christian
