lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220408015610.GA3061012@hori.linux.bs1.fc.nec.co.jp>
Date:   Fri, 8 Apr 2022 01:56:12 +0000
From:   HORIGUCHI NAOYA(堀口 直也) 
        <naoya.horiguchi@....com>
To:     Miaohe Lin <linmiaohe@...wei.com>
CC:     Naoya Horiguchi <naoya.horiguchi@...ux.dev>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Yang Shi <shy828301@...il.com>,
        Dan Carpenter <dan.carpenter@...cle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>
Subject: Re: [PATCH v7] mm/hwpoison: fix race between hugetlb free/demotion
 and memory_failure_hugetlb()

On Thu, Apr 07, 2022 at 09:38:26PM +0800, Miaohe Lin wrote:
> On 2022/4/7 19:29, Naoya Horiguchi wrote:
...
> > +int __get_huge_page_for_hwpoison(unsigned long pfn, int flags)
> > +{
> > +	struct page *page = pfn_to_page(pfn);
> > +	struct page *head = compound_head(page);
> > +	int ret = 2;	/* fallback to normal page handling */
> > +	bool count_increased = false;
> > +
> > +	if (!PageHeadHuge(head))
> > +		goto out;
> > +
> > +	if (flags & MF_COUNT_INCREASED) {
> > +		ret = 1;
> > +		count_increased = true;
> > +	} else if (HPageFreed(head) || HPageMigratable(head)) {
> > +		ret = get_page_unless_zero(head);
> > +		if (ret)
> > +			count_increased = true;
> > +	} else {
> > +		ret = -EBUSY;
> > +		goto out;
> > +	}
> > +
> > +	if (hwpoison_filter(page)) {
> > +		ret = -EOPNOTSUPP;
> > +		goto out;
> > +	}
> 
> Now hwpoison_filter is done without lock_page + unlock_page. Is this ok or
> lock_page + unlock_page pair is indeed required?

Hmm, we had better call hwpoison_filter in page lock for hugepages.
I'll move this too, thank you.

> > +
> > +	if (TestSetPageHWPoison(head)) {
> > +		ret = -EHWPOISON;
> > +		goto out;
> > +	}
> 
> Without this patch, page refcnt is not decremented if MF_COUNT_INCREASED is set in flags
> when PageHWPoison is already set. So I think this patch also fixes that issue. Thanks!

Good point, I even didn't notice that. And the issue still seems to exist
for normal page's cases.  Maybe encountering "already hwpoisoned" case from
madvise_inject_error() is rare but could happen when the first call failed
to contain the error (which is still accessible from the calling process).

> 
> > +
> > +	return ret;
> > +out:
> > +	if (count_increased)
> > +		put_page(head);
> > +	return ret;
> > +}
> > +
> > +#ifdef CONFIG_HUGETLB_PAGE
> > +/*
> > + * Taking refcount of hugetlb pages needs extra care about race conditions
> > + * with basic operations like hugepage allocation/free/demotion.
> > + * So all necessary prechecks for hwpoison (like pinning, testing/setting
> > + * PageHWPoison, and hwpoison_filter) are done in single hugetlb_lock range.
> > + */
> > +static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb)
> >  {
> > -	struct page *p = pfn_to_page(pfn);
> > -	struct page *head = compound_head(p);
> >  	int res;
> > +	struct page *p = pfn_to_page(pfn);
> > +	struct page *head;
> >  	unsigned long page_flags;
> > +	bool retry = true;
> >  
> > -	if (TestSetPageHWPoison(head)) {
> > -		pr_err("Memory failure: %#lx: already hardware poisoned\n",
> > -		       pfn);
> > -		res = -EHWPOISON;
> > -		if (flags & MF_ACTION_REQUIRED)
> > +	*hugetlb = 1;
> > +retry:
> > +	res = get_huge_page_for_hwpoison(pfn, flags);
> > +	if (res == 2) { /* fallback to normal page handling */
> > +		*hugetlb = 0;
> > +		return 0;
> > +	} else if (res == -EOPNOTSUPP) {
> > +		return res;
> > +	} else if (res == -EHWPOISON) {
> > +		pr_err("Memory failure: %#lx: already hardware poisoned\n", pfn);
> > +		if (flags & MF_ACTION_REQUIRED) {
> > +			head = compound_head(p);
> >  			res = kill_accessing_process(current, page_to_pfn(head), flags);
> > +		}
> > +		return res;
> > +	} else if (res == -EBUSY) {
> > +		if (retry) {
> > +			retry = false;
> > +			goto retry;
> > +		}
> > +		action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
> >  		return res;
> >  	}
> >  
> >  	num_poisoned_pages_inc();
> >  
> > -	if (!(flags & MF_COUNT_INCREASED)) {
> > -		res = get_hwpoison_page(p, flags);
> > -		if (!res) {
> > -			lock_page(head);
> > -			if (hwpoison_filter(p)) {
> > -				if (TestClearPageHWPoison(head))
> > -					num_poisoned_pages_dec();
> > -				unlock_page(head);
> > -				return -EOPNOTSUPP;
> > -			}
> > -			unlock_page(head);
> > -			res = MF_FAILED;
> > -			if (__page_handle_poison(p)) {
> > -				page_ref_inc(p);
> > -				res = MF_RECOVERED;
> > -			}
> > -			action_result(pfn, MF_MSG_FREE_HUGE, res);
> > -			return res == MF_RECOVERED ? 0 : -EBUSY;
> > -		} else if (res < 0) {
> > -			action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
> > -			return -EBUSY;
> > +	/*
> > +	 * Handling free hugepage.  The possible race with hugepage allocation
> > +	 * or demotion can be prevented by PageHWPoison flag.
> > +	 */
> > +	if (res == 0) {
> > +		res = MF_FAILED;
> > +		if (__page_handle_poison(p)) {
> > +			page_ref_inc(p);
> > +			res = MF_RECOVERED;
> >  		}
> > +		action_result(pfn, MF_MSG_FREE_HUGE, res);
> > +		return res == MF_RECOVERED ? 0 : -EBUSY;
> >  	}
> >  
> > +	head = compound_head(p);
> >  	lock_page(head);
> >  
> >  	/*
> 
> IMHO, the below code could be removed now as we fetch the refcnt under the hugetlb_lock:
> 
> 	/*
> 	 * The page could have changed compound pages due to race window.
> 	 * If this happens just bail out.
> 	 */
> 	if (!PageHuge(p) || compound_head(p) != head) {
> 		action_result(pfn, MF_MSG_DIFFERENT_PAGE_SIZE, MF_IGNORED);
> 		res = -EBUSY;
> 		goto out;
> 	}
> 
> But this might be another patch.

I'll do this.

Thank you for the review and suggestions,

- Naoya Horiguchi

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ