Message-ID: <20100419114300.GT19264@csn.ul.ie>
Date:	Mon, 19 Apr 2010 12:43:00 +0100
From:	Mel Gorman <mel@....ul.ie>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	r6144 <rainy6144@...il.com>, linux-kernel@...r.kernel.org,
	Darren Hart <dvhltc@...ibm.com>, tglx <tglx@...utronix.de>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Lee Schermerhorn <lee.schermerhorn@...com>
Subject: Re: Process-shared futexes on hugepages puts the kernel in an
	infinite loop in 2.6.32.11; is this fixed now?

On Fri, Apr 16, 2010 at 10:27:48PM +0200, Peter Zijlstra wrote:
> On Fri, 2010-04-16 at 23:45 +0800, r6144 wrote:
> > Hello all,
> > 
> > I'm having an annoying kernel bug regarding huge pages in Fedora 12:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=552257
> > 
> > Basically I want to use huge pages in a multithreaded number crunching
> > program, which happens to use process-shared semaphores (because fftw
> > does it).  The futex for the semaphore ends up lying on a huge page, and
> > I then get an endless loop in get_futex_key(), apparently because the
> > anonymous huge page containing the futex does not have a page->mapping.
> > A test case is provided in the above link.
> > 
> > I reported the bug to Fedora bugzilla months ago, but haven't received
> > any feedback yet. 
> 
> No, it works much better if you simply mail LKML and CC people who work
> on the code in question ;-)
> 
> >  The Fedora kernel is based on 2.6.32.11, and a
> > cursory glance at the 2.6.34-rc3 source does not yield any relevant
> > change.
> > 
> > So, could anyone tell me if the current mainline kernel might act better
> > in this respect, before I get around to compiling it?
> 
> Right, so I had a quick chat with Mel, and it appears MAP_PRIVATE
> hugetlb pages don't have their page->mapping set.
> 
> I guess something like the below might work, but I'd really rather not
> add hugetlb knowledge to futex.c. Does anybody else have a better idea?
> Maybe create something similar to an anon_vma for hugetlb pages?
> 

An anon_vma for hugetlb pages sounds like overkill; what would it gain?
In this context, futex only appears to need to distinguish whether a
reference is private or shared.
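
For reference, the retry loop in kernel/futex.c:get_futex_key() looks
roughly like this in 2.6.32 (paraphrased from memory, not verbatim, with
error handling and key refcounting trimmed):

again:
	err = get_user_pages_fast(address, 1, 1, &page);
	if (err < 0)
		return err;

	lock_page(page);
	if (!page->mapping) {
		/* NULL for an anonymous hugetlb page */
		unlock_page(page);
		put_page(page);
		goto again;
	}

	if (PageAnon(page)) {	/* PAGE_MAPPING_ANON set in ->mapping */
		key->both.offset |= FUT_OFF_MMSHARED;
		key->private.mm = mm;
		key->private.address = address;
	} else {
		key->both.offset |= FUT_OFF_INODE;
		key->shared.inode = page->mapping->host;
		key->shared.pgoff = page->index;
	}

Since an anonymous hugetlb page never gets a non-NULL ->mapping, the
goto above spins forever, which matches the report.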

Looking at the hugetlbfs code, I can't see anywhere that it actually cares
about page->mapping as such. Ordinarily the mapping is used to find shared
pages in the page cache (though not on the LRU) that are backed by the
hugetlbfs file. For hugetlbfs, though, the mapping is mostly kept in
page->private for reservation-accounting purposes.
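
(The allocation side stashes it roughly like this, going from memory of
alloc_huge_page():

	struct address_space *mapping = vma->vm_file->f_mapping;
	...
	set_page_private(page, (unsigned long) mapping);

and free_huge_page() reads it back, as the first hunk below shows.)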

I can't think of any other part of the VM that touches page->mapping when
the page is managed by hugetlbfs, so the following patch should also work,
but without futex needing any hugetlbfs-awareness. What do you think?
Maybe, for safety, it would be better to make the mapping some obvious
poison bytes or'd with PAGE_MAPPING_ANON, so that an oops would be more
obvious?
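
Something like this, say (hypothetical name and value, untested):

	/* obviously-bogus pointer that still passes PageAnon() */
	#define HUGETLB_POISON \
		((void *)((0x00300300 + POISON_POINTER_DELTA) | PAGE_MAPPING_ANON))

	page->mapping = HUGETLB_POISON;

That way, any core VM path that dereferences ->mapping on one of these
pages would oops at a recognisable address rather than at NULL.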

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6034dc9..57a5faa 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -546,6 +546,7 @@ static void free_huge_page(struct page *page)
 
 	mapping = (struct address_space *) page_private(page);
 	set_page_private(page, 0);
+	page->mapping = NULL;
 	BUG_ON(page_count(page));
 	INIT_LIST_HEAD(&page->lru);
 
@@ -2447,8 +2448,10 @@ retry:
 			spin_lock(&inode->i_lock);
 			inode->i_blocks += blocks_per_huge_page(h);
 			spin_unlock(&inode->i_lock);
-		} else
+		} else {
 			lock_page(page);
+			page->mapping = (struct address_space *)PAGE_MAPPING_ANON;
+		}
 	}
 
 	/*
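
As an aside, a minimal reproducer along the lines of the bugzilla test
case might look something like this (untested sketch; assumes 2MB huge
pages with some reserved via /proc/sys/vm/nr_hugepages, build with
gcc -pthread):

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB	0x40000		/* x86 value, new in 2.6.32 */
#endif

#define HUGE_SIZE	(2UL * 1024 * 1024)	/* assumes 2MB huge pages */

static sem_t *sem;

static void *waiter(void *arg)
{
	sem_wait(sem);		/* shared futex op on the huge page */
	return NULL;
}

int main(void)
{
	pthread_t t;
	void *p;

	/* MAP_PRIVATE anonymous huge page: its page->mapping stays NULL */
	p = mmap(NULL, HUGE_SIZE, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	sem = p;
	sem_init(sem, 1, 0);	/* pshared=1 => non-private futex ops */

	pthread_create(&t, NULL, waiter, NULL);
	sleep(1);
	sem_post(sem);
	pthread_join(t, NULL);
	puts("ok");
	return 0;
}

On an unpatched 2.6.32 kernel the waiter gets stuck looping in
get_futex_key(); with the change above it should print "ok".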
