

Message-ID: <20150915135354.GA2905@mtj.duckdns.org>
Date:	Tue, 15 Sep 2015 09:53:54 -0400
From:	Tejun Heo <tj@...nel.org>
To:	Christian Borntraeger <borntraeger@...ibm.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>,
	"linux-kernel@...r.kernel.org >> Linux Kernel Mailing List" 
	<linux-kernel@...r.kernel.org>, KVM list <kvm@...r.kernel.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Paul McKenney <paulmck@...ux.vnet.ibm.com>
Subject: Re: [4.2] commit d59cfc09c32 (sched, cgroup: replace
 signal_struct->group_rwsem with a global percpu_rwsem) causes regression for
 libvirt/kvm

Hello,

On Tue, Sep 15, 2015 at 03:36:34PM +0200, Christian Borntraeger wrote:
> >> The problem seems to be that the newly used percpu_rwsem does a
> >> rcu_synchronize_sched_expedited for all write downs/ups.
> > 
> > Can you try:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git dev.2015.09.11ab
> 
> yes, dev.2015.09.11a seems to help, thanks. Getting rid of the expedited hammer was
> really helpful - I guess.

Ah, that's nice.  I mentioned this in the original patchset:
percpu_rwsem as previously implemented could be too heavy on the
writer side for this path, and I was planning to implement an
rwsem-based lglock if it blew up.  That said, if Oleg's changes make
the issue go away, all the better.

> > those include Oleg's rework of the percpu rwsem which should hopefully
> > improve things somewhat.
> > 
> > But yes, pounding a global lock on a big machine will always suck.
> 
> By hacking out the fast path I actually degraded percpu rwsem to a real global lock, but
> things were still a lot faster. 
> I am wondering why the old code behaved in such fatal ways. Is there some interaction 
> between waiting for a reschedule in the synchronize_sched writer and some fork code 
> actually waiting for the read side to get the lock together with some rescheduling going
> on waiting for a lock that fork holds? lockdep does not give me any hints so I have no clue :-(

percpu_rwsem is a global lock.  My rough suspicion is that the writer
locking path was probably taking too long (especially if the kernel
has preemption disabled), so the writers got backed up badly and
starved the readers.
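
A back-of-the-envelope model of that suspicion, as a minimal sketch
rather than anything measured: assume every write-side acquisition
pays a fixed grace-period wait before it gets the lock, writers are
fully serialized, and a reader arriving behind the queue waits until
it drains.  The function name, parameters and numbers below are all
hypothetical, chosen only for illustration.

```python
def reader_wait_us(queued_writers, grace_wait_us, hold_us):
    """Worst-case wait (microseconds) for a reader stuck behind writers.

    Toy model, not kernel behaviour: each writer pays grace_wait_us
    before acquiring the global lock, then holds it for hold_us, and
    writers run strictly one after another.
    """
    per_writer_us = grace_wait_us + hold_us
    return queued_writers * per_writer_us


# Hypothetical numbers: 100 queued writers, 100 us of real work each.
slow_path = reader_wait_us(100, grace_wait_us=10_000, hold_us=100)
fast_path = reader_wait_us(100, grace_wait_us=0, hold_us=100)
print(slow_path, fast_path)
```

With these made-up numbers, 100 queued writers paying a 10 ms wait
each keep a reader out for about a second, versus 10 ms once the wait
is gone.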

Thanks.

-- 
tejun