Message-ID: <20120423190915.GF19117@1wt.eu>
Date: Mon, 23 Apr 2012 21:09:15 +0200
From: Willy Tarreau <w@....eu>
To: Philipp Hahn <hahn@...vention.de>
Cc: stable@...r.kernel.org, Andrea Arcangeli <aarcange@...hat.com>,
Ingo Molnar <mingo@...e.hu>,
Jeremy Fitzhardinge <jeremy@...p.org>,
Peter Zijlstra <peterz@...radead.org>,
the arch/x86 maintainers <x86@...nel.org>,
Hugh Dickins <hughd@...gle.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Jan Beulich <JBeulich@...ell.com>, Andi Kleen <ak@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <jweiner@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Larry Woodman <lwoodman@...hat.com>,
Rik van Riel <riel@...hat.com>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
669335@...s.debian.org
Subject: Re: [2.6.32.y][PATCH] fix pgd_lock deadlock

Hello Philipp,

On Mon, Apr 23, 2012 at 11:07:53AM +0200, Philipp Hahn wrote:
> Hello,
>
> On Wednesday 16 February 2011 15:49:47 Andrea Arcangeli wrote:
> > Subject: fix pgd_lock deadlock
> >
> > From: Andrea Arcangeli <aarcange@...hat.com>
> >
> > It's forbidden to take the page_table_lock with irqs disabled: if
> > there's contention, the IPIs (for tlb flushes) sent with the
> > page_table_lock held will never run, leading to a deadlock.
> >
> > Apparently nobody takes the pgd_lock from irq so the _irqsave can be
> > removed.
> >
> > Signed-off-by: Andrea Arcangeli <aarcange@...hat.com>
>
> This patch (original commit Id for 2.6.38
> a79e53d85683c6dd9f99c90511028adc2043031f) needs to be back-ported to 2.6.32.x
> as well.
> I observed a deadlock when running a PAE-enabled Debian 2.6.32.46+
> kernel with 6 VCPUs as a KVM guest on a (2.6.32, 3.2, 3.3) host kernel,
> which showed the following behaviour:
>
> 1 VCPU is stuck in
>   pgd_alloc() → pgd_prepopulate_pmd() → ... → flush_tlb_others_ipi()
>     while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
>       cpu_relax();
>   (gdb) print f->flush_cpumask
>   $5 = {1}
>
> while all other VCPUs are stuck in
>   pgd_alloc() → spin_lock_irqsave(pgd_lock)
>
> I tracked it down to the commit
> 2.6.39-rc1: 4981d01eada5354d81c8929d5b2836829ba3df7b
> 2.6.32.34: ba456fd7ec1bdc31a4ad4a6bd02802dcaa730a33
> x86: Flush TLB if PGD entry is changed in i386 PAE mode
> which when reverted made the bug disappear.
>
> Comparing 3.2 to 2.6.32.34 showed that the 'pgd-deadlock'-patch went into
> 2.6.38, that is before the 'PAE correctness'-patch, so the problem was
> probably never observed in the main development branch.
> But for 2.6.32 the 'pgd-deadlock'-patch is still missing, so the 'PAE
> correctness'-patch made the problem worse with 2.6.32.
>
> The patch was also back-ported to the OpenSUSE kernel
> <http://kernel.opensuse.org/cgit/kernel-source/commit/?id=ac27c01aa880c65d17043ab87249c613ac4c3635>.
> Since the patch didn't apply cleanly on the current Debian kernel, I had to
> backport it for us and Debian. The patch is also available from our (German)
> Bugzilla <https://forge.univention.org/bugzilla/show_bug.cgi?id=26661> or
> from the Debian BTS at
> <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=669335>.
>
> I have no easy test case, but running multiple parallel builds inside the VM
> normally triggers the bug within seconds to minutes. With the patch applied
> the VM survived a night building packages without any problem.
>
> Signed-off-by: Philipp Hahn <hahn@...vention.de>
>
> Sincerely
> Philipp

Thank you, I'm queuing it for the next 32-stable.

Regards,
Willy