lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100824030133.GB12507@spritzera.linux.bs1.fc.nec.co.jp>
Date:	Tue, 24 Aug 2010 12:01:33 +0900
From:	Naoya Horiguchi <n-horiguchi@...jp.nec.com>
To:	Wu Fengguang <fengguang.wu@...el.com>
Cc:	Andi Kleen <andi@...stfloor.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Lameter <cl@...ux-foundation.org>,
	Mel Gorman <mel@....ul.ie>,
	"Jun'ichi Nomura" <j-nomura@...jp.nec.com>,
	linux-mm <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 9/9] hugetlb: add corrupted hugepage counter

On Thu, Aug 19, 2010 at 09:57:52AM +0800, Wu Fengguang wrote:
> > +void increment_corrupted_huge_page(struct page *page);
> > +void decrement_corrupted_huge_page(struct page *page);
>
> nitpick: increment/decrement are not verbs.

OK, increase/decrease are correct.


> > +void increment_corrupted_huge_page(struct page *hpage)
> > +{
> > +   struct hstate *h = page_hstate(hpage);
> > +   spin_lock(&hugetlb_lock);
> > +   h->corrupted_huge_pages++;
> > +   spin_unlock(&hugetlb_lock);
> > +}
> > +
> > +void decrement_corrupted_huge_page(struct page *hpage)
> > +{
> > +   struct hstate *h = page_hstate(hpage);
> > +   spin_lock(&hugetlb_lock);
> > +   BUG_ON(!h->corrupted_huge_pages);
>
> There is no point to have BUG_ON() here:
>
> /*
>  * Don't use BUG() or BUG_ON() unless there's really no way out; one
>  * example might be detecting data structure corruption in the middle
>  * of an operation that can't be backed out of.  If the (sub)system
>  * can somehow continue operating, perhaps with reduced functionality,
>  * it's probably not BUG-worthy.
>  *
>  * If you're tempted to BUG(), think again:  is completely giving up
>  * really the *only* solution?  There are usually better options, where
>  * users don't need to reboot ASAP and can mostly shut down cleanly.
>  */

OK. I understand.
BUG_ON() is too severe for just a counter.

>
> And there is a race case that (corrupted_huge_pages==0)!
> Suppose the user space calls unpoison_memory() on a good pfn, and the page
> happen to be hwpoisoned between lock_page() and TestClearPageHWPoison(),
> corrupted_huge_pages will go negative.

I see.
When this race happens, unpoison runs and decreases HugePages_Crpt,
but racing memory failure returns without increasing it.
Yes, this is a problem we need to fix.

Moreover for hugepage we should pay attention to the possiblity of
mce_bad_pages mismatch which can occur by race between unpoison and
multiple memory failures, where each failure increases mce_bad_pages
by the number of pages in a hugepage.

I think counting corrupted hugepages is not directly related to
hugepage migration, and this problem only affects the counter,
not other behaviors, so I'll separate hugepage counter fix patch
from this patch set and post as another patch series. Is this OK?

Thanks,
Naoya Horiguchi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ