Message-ID: <20100819075543.GA4125@spritzera.linux.bs1.fc.nec.co.jp>
Date:	Thu, 19 Aug 2010 16:55:43 +0900
From:	Naoya Horiguchi <n-horiguchi@...jp.nec.com>
To:	Wu Fengguang <fengguang.wu@...el.com>
Cc:	Andi Kleen <andi@...stfloor.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Lameter <cl@...ux-foundation.org>,
	Mel Gorman <mel@....ul.ie>,
	"Jun'ichi Nomura" <j-nomura@...jp.nec.com>,
	linux-mm <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/9] HWPOISON, hugetlb: move PG_HWPoison bit check

On Wed, Aug 18, 2010 at 08:18:42AM +0800, Wu Fengguang wrote:
> On Tue, Aug 10, 2010 at 05:27:36PM +0800, Naoya Horiguchi wrote:
> > In order to handle metadata correctly, we should check whether the hugepage
> > we are going to access is HWPOISONed *before* incrementing mapcount,
> > adding the hugepage into pagecache or constructing anon_vma.
> > This patch also adds retry code when there is a race between
> > alloc_huge_page() and memory failure.
> 
> This duplicates the PageHWPoison() test into 3 places without really
> addressing any problem. For example, there are still _unavoidable_ races
> between PageHWPoison() and add_to_page_cache().
> 
> What's the problem you are trying to resolve here? If there is
> data structure corruption, we may need to do it in some other ways.

The problem I tried to resolve in this patch is the corruption of
data structures when a memory failure occurs between alloc_huge_page()
and lock_page().
The corruption occurs because the page fault can fail while leaving
metadata changes behind (refcount, mapcount, etc.).
Since the PageHWPoison() check exists to keep a hwpoisoned page that
remains in the pagecache from being mapped into the process, it should
be done in the "found in pagecache" branch, not in the common path.
This patch moves the check to the "found in pagecache" branch.
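A simplified userspace model of the ordering described above may help. It is a sketch under stated assumptions, not the actual hugetlb fault code: `struct model_page`, `model_fault()` and the `FAULT_*` values are hypothetical names, and only refcount, mapcount and a poison flag are modeled. The poison test sits in the "found in pagecache" branch and runs before any metadata is changed, so a failed fault leaves refcount and mapcount as they were.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a hugepage's metadata. */
struct model_page {
    int refcount;
    int mapcount;
    bool hwpoison;   /* models the PG_HWPoison flag */
};

enum { FAULT_OK = 0, FAULT_HWPOISON = 1 };

/*
 * Hypothetical fault handler (not the kernel's hugetlb_no_page()).
 * 'cached' is the page found in the pagecache, or NULL when a new
 * page ('fresh') has to be used instead.
 */
static int model_fault(struct model_page *cached, struct model_page *fresh,
                       struct model_page **mapped)
{
    struct model_page *page;

    if (cached) {
        /* "found in pagecache" branch: refuse a poisoned page
         * before touching refcount or mapcount. */
        if (cached->hwpoison)
            return FAULT_HWPOISON;
        page = cached;
    } else {
        /* "new allocation" branch: no poison check in this model. */
        assert(fresh != NULL);
        page = fresh;
    }

    /* Metadata changes happen only after the last failure point. */
    page->refcount++;
    page->mapcount++;
    *mapped = page;
    return FAULT_OK;
}
```

For example, calling `model_fault()` on a cached page with `hwpoison` set returns `FAULT_HWPOISON` and leaves its counters untouched, while a healthy cached page ends up with both counters incremented by one.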

In addition, I added two PageHWPoison() checks in the "new allocation"
branches to improve the chance of recovering from memory failures on pages
under allocation. But that is a separate issue from the original one, so I
will drop these retry checks.
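For comparison, the retry checks being dropped from the "new allocation" branches can be pictured roughly as follows. This is again a hypothetical userspace model, not kernel code: `struct model_pool`, `model_alloc()` and `model_alloc_unpoisoned()` are invented names, and the pool merely stands in for whatever the real allocator draws from. A page found to be poisoned right after allocation is skipped before any reference is taken, and allocation is retried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a hugepage's metadata. */
struct model_page {
    int refcount;
    int mapcount;
    bool hwpoison;   /* models the PG_HWPoison flag */
};

/* Hypothetical pool standing in for the hugepage free list. */
struct model_pool {
    struct model_page *pages;
    size_t count;
    size_t next;
};

/* Hypothetical allocator: hands out the next page, or NULL when empty. */
static struct model_page *model_alloc(struct model_pool *pool)
{
    assert(pool != NULL);
    if (pool->next >= pool->count)
        return NULL;
    return &pool->pages[pool->next++];
}

/*
 * "new allocation" branch with a retry: if the page just obtained was
 * poisoned between allocation and use, skip it without touching its
 * metadata and try the next one.
 */
static struct model_page *model_alloc_unpoisoned(struct model_pool *pool)
{
    struct model_page *page;

    while ((page = model_alloc(pool)) != NULL) {
        if (!page->hwpoison) {
            page->refcount++;   /* take a reference only on a good page */
            return page;
        }
        /* Poisoned page: skipped here; its handling is not modeled. */
    }
    return NULL;   /* pool exhausted */
}
```

Skipping the poisoned page without touching its counters keeps the model simple; a real implementation would also have to decide what to do with that page, which this sketch does not attempt.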

Thanks,
Naoya Horiguchi