Date:	Tue, 30 Mar 2010 09:23:15 +1100
From:	Dave Chinner <david@...morbit.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Rik van Riel <riel@...hat.com>,
	Kent Overstreet <kent.overstreet@...il.com>,
	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>
Subject: Re: KVM bug, git bisected

On Mon, Mar 29, 2010 at 06:29:00PM +0200, Peter Zijlstra wrote:
> On Mon, 2010-03-29 at 12:11 -0400, Rik van Riel wrote:
> > On 03/27/2010 08:43 AM, Kent Overstreet wrote:
> > > commit 5beb49305251e5669852ed541e8e2f2f7696c53e
> > > Author: Rik van Riel <riel@...hat.com>
> > > Date: Fri Mar 5 13:42:07 2010 -0800
> > >
> > > mm: change anon_vma linking to fix multi-process server scalability issue
> > >
> > > I get this when starting kvm. The warning hasn't caused me problems, but
> > > I've also been getting a scheduling while atomic panic when I start kvm
> > > that I can only reproduce when I don't want to. It's definitely config
> > > dependent, I'd guess preempt might have something to do with it.
> > 
> > From your trace, it looks like mm_take_all_locks is taking close
> > to 256 locks, which is where the preempt_count could overflow into
> > the softirq count.
> > 
> > Since kvm-qemu is exec'd, I am guessing you have a very large
> > number of VMAs in your qemu process.  Is that correct?
> > 
> > Peter, would it be safe to increase PREEMPT_BITS to e.g. 10?
> 
> Possibly, but who's to say the thing won't bloat to 65k, at which point
> it'll hit the vma limit, and even that limit can be raised.
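
For reference, the overflow Rik mentions comes from the way preempt_count
packs several fields into one word. Here's a minimal userspace sketch of
that packing; the field widths are my recollection of the 2.6.33-era
<linux/hardirq.h>, so treat the exact numbers as assumptions and check
the header:

#include <stdio.h>

/* Assumed layout: 8 preempt bits at the bottom, softirq count above them. */
#define PREEMPT_BITS	8
#define SOFTIRQ_BITS	8
#define PREEMPT_SHIFT	0
#define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)

#define PREEMPT_OFFSET	(1U << PREEMPT_SHIFT)
#define SOFTIRQ_MASK	(((1U << SOFTIRQ_BITS) - 1) << SOFTIRQ_SHIFT)

int main(void)
{
	unsigned int count = 0;
	int i;

	/* Emulate nesting 256 preempt_disable() calls, one per spinlock
	 * taken by mm_take_all_locks() on a preemptible kernel. */
	for (i = 0; i < 256; i++)
		count += PREEMPT_OFFSET;

	/* The carry out of the 8-bit preempt field now reads as a softirq
	 * count of 1, i.e. it looks like softirq context when it isn't. */
	printf("softirq count = %u\n",
	       (count & SOFTIRQ_MASK) >> SOFTIRQ_SHIFT);
	return 0;
}

With an 8-bit field, roughly 256 nested locks is all it takes, which
matches the number Rik quotes above.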

This issue came up a few years ago w.r.t. the per-cpu
superblock counters in XFS, which used one spinlock per CPU, all of
which had to be held at synchronisation/rebalance time. A 256p machine
would fall over doing this, and there was great resistance to
increasing the size of the preempt count field.

Instead, I changed the spinlocks to use a bit in a flag word in the
per-cpu structure and used a test_and_set_bit() loop to emulate a
spinlock. Then, with a single external preempt_disable()/preempt_enable()
around the fast and slow paths, they ultimately behave like spinlocks
but without causing preempt count windup.
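
The shape of it is roughly the sketch below. This is not the actual XFS
code, just a from-memory illustration; the structure and function names
are invented for the example:

#include <linux/bitops.h>
#include <linux/cpumask.h>
#include <linux/percpu.h>
#include <linux/preempt.h>

/* Illustrative only: emulate a spinlock with a bit in a per-cpu flag
 * word so that locking all CPUs' counters bumps preempt_count once,
 * not once per CPU. */
struct pcpu_cntr {
	unsigned long	flags;		/* bit 0 is the lock bit */
	long		count;
};

#define PCPU_CNTR_LOCK_BIT	0

static inline void pcpu_cntr_lock(struct pcpu_cntr *p)
{
	/* test_and_set_bit() is atomic; spin until we own the bit. */
	while (test_and_set_bit(PCPU_CNTR_LOCK_BIT, &p->flags))
		cpu_relax();
}

static inline void pcpu_cntr_unlock(struct pcpu_cntr *p)
{
	clear_bit(PCPU_CNTR_LOCK_BIT, &p->flags);
}

/* Slow path: lock every CPU's counter for rebalancing.  One external
 * preempt_disable() covers the whole operation, so preempt_count goes
 * up by one instead of by the number of online CPUs. */
static void pcpu_cntr_lock_all(struct pcpu_cntr __percpu *cntrs)
{
	int cpu;

	preempt_disable();
	for_each_online_cpu(cpu)
		pcpu_cntr_lock(per_cpu_ptr(cntrs, cpu));
}

static void pcpu_cntr_unlock_all(struct pcpu_cntr __percpu *cntrs)
{
	int cpu;

	for_each_online_cpu(cpu)
		pcpu_cntr_unlock(per_cpu_ptr(cntrs, cpu));
	preempt_enable();
}

The slow path still spins on each per-cpu bit, but preempt_count only
ever goes up by one, no matter how many CPUs get locked.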

I'm not suggesting that this is the solution to the current problem
case, just indicating that we've been here before and that there are
ways of avoiding preempt count windup in the cases where lots of
critical areas need to be locked out simultaneously....

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com