Message-Id: <201204231107.59484.hahn@univention.de>
Date: Mon, 23 Apr 2012 11:07:53 +0200
From: Philipp Hahn <hahn@...vention.de>
To: stable@...r.kernel.org
Cc: Andrea Arcangeli <aarcange@...hat.com>,
Ingo Molnar <mingo@...e.hu>,
Jeremy Fitzhardinge <jeremy@...p.org>,
Peter Zijlstra <peterz@...radead.org>,
"the arch/x86 maintainers" <x86@...nel.org>,
Hugh Dickins <hughd@...gle.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Jan Beulich <JBeulich@...ell.com>, Andi Kleen <ak@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <jweiner@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Larry Woodman <lwoodman@...hat.com>,
Rik van Riel <riel@...hat.com>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
669335@...s.debian.org
Subject: [2.6.32.y][PATCH] fix pgd_lock deadlock
Hello,
On Wednesday 16 February 2011 15:49:47 Andrea Arcangeli wrote:
> Subject: fix pgd_lock deadlock
>
> From: Andrea Arcangeli <aarcange@...hat.com>
>
> It's forbidden to take the page_table_lock with the irq disabled or if
> there's contention the IPIs (for tlb flushes) sent with the page_table_lock
> held will never run leading to a deadlock.
>
> Apparently nobody takes the pgd_lock from irq so the _irqsave can be
> removed.
>
> Signed-off-by: Andrea Arcangeli <aarcange@...hat.com>
This patch (original commit ID a79e53d85683c6dd9f99c90511028adc2043031f in
2.6.38) needs to be backported to 2.6.32.x as well.
I observed a deadlock when running a PAE-enabled Debian 2.6.32.46+ kernel
with 6 VCPUs as a KVM guest on 2.6.32, 3.2 and 3.3 host kernels, with the
following behaviour:
One VCPU is stuck in
  pgd_alloc() → pgd_prepopulate_pmd() → ... → flush_tlb_others_ipi()
      while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
              cpu_relax();
  (gdb) print f->flush_cpumask
  $5 = {1}
while all other VCPUs are stuck in
  pgd_alloc() → spin_lock_irqsave(pgd_lock)
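The hang can be illustrated with a small, deterministic C model. It is only a sketch under simplifying assumptions, not kernel code: CPU 0 holds pgd_lock and busy-waits until every other CPU has acknowledged its TLB-flush IPI, while those CPUs spin on pgd_lock either with interrupts disabled (spin_lock_irqsave) or enabled (plain spin_lock). The function flush_completes() and its parameters are hypothetical names chosen for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of the hang (hypothetical, NOT kernel code). CPU 0 holds
 * pgd_lock and has sent a TLB-flush IPI to CPUs 1..ncpus-1 (ncpus <= 32).
 * It then spins until flush_mask is empty, like the loop in
 * flush_tlb_others_ipi(). Every other CPU is spinning on pgd_lock and can
 * run the IPI handler (which clears its bit) only if it spins with
 * interrupts enabled. Returns true if the flush finishes within max_polls.
 */
bool flush_completes(int ncpus, bool waiters_use_irqsave, int max_polls)
{
    uint32_t flush_mask = 0;
    for (int cpu = 1; cpu < ncpus; cpu++)
        flush_mask |= UINT32_C(1) << cpu;

    /* spin_lock_irqsave() spins with IRQs off; plain spin_lock() does not. */
    bool waiters_take_irqs = !waiters_use_irqsave;

    for (int poll = 0; poll < max_polls; poll++) {
        /* Waiters: still spinning on pgd_lock, which CPU 0 holds. */
        for (int cpu = 1; cpu < ncpus; cpu++)
            if (waiters_take_irqs)
                flush_mask &= ~(UINT32_C(1) << cpu); /* IPI handled: ack */

        /* Flusher: while (!cpumask_empty(...)) cpu_relax(); */
        if (flush_mask == 0)
            return true; /* flush done, CPU 0 can drop pgd_lock */
    }
    return false; /* mask never drains while the waiters wait for the lock */
}
```

In this model flush_completes(6, true, 1000) returns false (the mask never drains, matching the hang above), while flush_completes(6, false, 1000) returns true. That mirrors the reasoning in the quoted patch: once the waiters no longer disable interrupts while spinning on pgd_lock, they can service the flush IPI and the lock holder makes progress.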
I tracked it down to the commit
2.6.39-rc1: 4981d01eada5354d81c8929d5b2836829ba3df7b
2.6.32.34: ba456fd7ec1bdc31a4ad4a6bd02802dcaa730a33
x86: Flush TLB if PGD entry is changed in i386 PAE mode
Reverting that commit made the bug disappear.
Comparing 3.2 to 2.6.32.34 showed that the 'pgd-deadlock' patch went into
2.6.38, that is, before the 'PAE correctness' patch, so the problem was
probably never observed in the main development branch.
But 2.6.32 still lacks the 'pgd-deadlock' patch, so the 'PAE correctness'
patch made the problem worse there.
The patch was also backported to the openSUSE kernel
<http://kernel.opensuse.org/cgit/kernel-source/commit/?id=ac27c01aa880c65d17043ab87249c613ac4c3635>.
Since the patch didn't apply cleanly to the current Debian kernel, I had to
backport it for us and for Debian. The patch is also available from our (German)
Bugzilla <https://forge.univention.org/bugzilla/show_bug.cgi?id=26661> or
from the Debian BTS at
<http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=669335>.
I have no easy test case, but running multiple parallel builds inside the VM
normally triggers the bug within seconds to minutes. With the patch applied
the VM survived a night building packages without any problem.
Signed-off-by: Philipp Hahn <hahn@...vention.de>
Sincerely
Philipp
--
Philipp Hahn Open Source Software Engineer hahn@...vention.de
Univention GmbH be open. fon: +49 421 22 232- 0
Mary-Somerville-Str.1 D-28359 Bremen fax: +49 421 22 232-99
http://www.univention.de/
Attachment: x86-mm-Fix-pgd_lock-deadlock.patch (text/x-diff, 5666 bytes)
Attachment: signature.asc (application/pgp-signature, 198 bytes)