linux-kernel - Re: [Xen-devel] Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen pv guest

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Thu, 22 Sep 2011 14:32:32 -0400
From:	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
To:	"Christopher S. Aker" <caker@...shore.net>,
	Shaun R <mailinglists@...x-scripts.com>,
	Jeremy Fitzhardinge <jeremy@...p.org>
Cc:	Ian Campbell <Ian.Campbell@...rix.com>,
	"xen-devel@...ts.xensource.com" <xen-devel@...ts.xensource.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [Xen-devel] Re: kernel BUG at mm/swapfile.c:2527! [was 3.0.0 Xen
 pv guest - BUG: Unable to handle]

> >I'd bet the dereference corresponds to the "*map" in that same place but
> >Peter can you convert that address to a line of code please?
> 
> root@...ld:/build/xen/domU/i386/3.0.0-linode35-debug# gdb vmlinux
> GNU gdb (GDB) 7.1-ubuntu (...snip...)
> Reading symbols from
> /build/xen/domU/i386/3.0.0-linode35-debug/vmlinux...done.
> (gdb) list *0xc01ab854
> 0xc01ab854 is in swap_count_continued (mm/swapfile.c:2493).
> 2488
> 2489            if (count == (SWAP_MAP_MAX | COUNT_CONTINUED)) { /*
> incrementing */
> 2490                    /*
> 2491                     * Think of how you add 1 to 999
> 2492                     */
> 2493                    while (*map == (SWAP_CONT_MAX | COUNT_CONTINUED)) {
> 2494                            kunmap_atomic(map, KM_USER0);
> 2495                            page = list_entry(page->lru.next,
> struct page, lru);
> 2496                            BUG_ON(page == head);
> 2497                            map = kmap_atomic(page, KM_USER0) + offset;
> (gdb)
> 
> >map came from a kmap_atomic() not far before this point so it appears
> >that it is mapping the wrong page (so *map != 0) and/or mapping a
> >non-existent page (leading to the fault).

First of thanks to Jeremy for help on this one, and Shaun R for lending
me one of his boxes with a environment to easily test it.

The problem looks that in copy_page_range we turn lazy mode on, and then
in swap_entry_free we call swap_count_continued which ends up in:

         map = kmap_atomic(page, KM_USER0) + offset;

and then later on touching *map.

Basically we are forking a process and copying the pages that are also
"swap" pages. We don't need to access the user pages immediately, but we
do for the swap pages as we need proper reference counting.

Well, since we are running in batched mode we don't actually set up the
PTE mappings and the kmap_atomic is not done synchronously and ends up
trying to dereference a page that has not been set.

Looking at kmap_atomic_prot_pfn, it uses 'arch_flush_lazy_mmu_mode' and
sprinkling that in kmap_atomic_prot and __kunmap_atomic seems to make
the problem go away.

This is the patch that looks to be doing the trick. Please double check
if it fixes in your guys setup.


diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
index b499626..f4f29b1 100644
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -45,6 +45,7 @@ void *kmap_atomic_prot(struct page *page, pgprot_t prot)
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
 	BUG_ON(!pte_none(*(kmap_pte-idx)));
 	set_pte(kmap_pte-idx, mk_pte(page, prot));
+	arch_flush_lazy_mmu_mode();
 
 	return (void *)vaddr;
 }
@@ -88,6 +89,7 @@ void __kunmap_atomic(void *kvaddr)
 		 */
 		kpte_clear_flush(kmap_pte-idx, vaddr);
 		kmap_atomic_idx_pop();
+		arch_flush_lazy_mmu_mode();
 	}
 #ifdef CONFIG_DEBUG_HIGHMEM
 	else {



View attachment "flush.patch" of type "text/x-diff" (623 bytes)