lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 13 Aug 2015 10:04:07 +0000
From:	Naoya Horiguchi <n-horiguchi@...jp.nec.com>
To:	Wanpeng Li <wanpeng.li@...mail.com>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm/hwpoison: fix race between soft_offline_page and
 unpoison_memory

On Thu, Aug 13, 2015 at 05:18:56PM +0800, Wanpeng Li wrote:
> On 8/13/15 4:53 PM, Naoya Horiguchi wrote:
...
> >
> > I think that unpoison is used only in testing so this race never affects
> > our end-users/customers, so going back to this migratetype change stuff
> > looks unworthy to me.
>
> Migratetype stuff is just removed by you two months ago, then this bug
> occurs recently since the more and more patches which you fix some races.

Yes, this race (which existed before my recent changes) became more visible
with that changes. But I don't think that simply reverting them is a right solution.

> >
> > If I read correctly, the old migratetype approach has a few problems:
> >   1) it doesn't fix the problem completely, because
> >      set_migratetype_isolate() can fail to set MIGRATE_ISOLATE to the
> >      target page if the pageblock of the page contains one or more
> >      unmovable pages (i.e. has_unmovable_pages() returns true).
> >   2) the original code changes migratetype to MIGRATE_ISOLATE forcibly,
> >      and sets it to MIGRATE_MOVABLE forcibly after soft offline, regardless
> >      of the original migratetype state, which could impact other subsystems
> >      like memory hotplug or compaction.
>
> Maybe we can add a "FIXME" comment on the Migratetype stuff, since the
> current linus tree calltrace and it should be fixed immediately, and I
> don't see obvious bugs appear on migratetype stuffs at least currently,
> so "FIXME" is enough. :-)

Sorry if confusing, but my intention in saying about "FIXME" comment was
that we can find another solution for this race rather than just reverting,
so adding comment about the reported bug in current code (keeping code from
4491f712606) is OK for very short term.
I understand that leaving a race window of BUG_ON is not the best thing, but
as I said, this race shouldn't affect end-users, so this is not an urgent bug.
# It's enough if testers know this.

> >
> > So in my opinion, the best option is to fix this within unpoison code,
> > but it might be hard because we don't know whether the target page is
>
> Exactly.
>
> > under migration. The second option is to add a race check in the if (reason
> > == MR_MEMORY_FAILURE) branch in unmap_and_move(), although this looks ad-hoc
>
> You have already add MR_MEMORY_FAILURE, however, that is not enough to
> fix unpoison race.

Right, some additional code is necessary. I'll try the second approach.

Thanks,
Naoya Horiguchi

> > and need testing. The third option is to leave it with some "FIXME" comment.
>
> I *prefer* add "FIXME" to migratetype stuffs.
>
> Regards,
> Wanpeng Li
>
> >
> > Thanks,
> > Naoya Horiguchi
> >
> >> ---
> >>  include/linux/page-isolation.h |    5 +++++
> >>  mm/memory-failure.c            |   16 ++++++++++++----
> >>  mm/migrate.c                   |    3 +--
> >>  mm/page_isolation.c            |    4 ++--
> >>  4 files changed, 20 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> >> index 047d647..ff5751e 100644
> >> --- a/include/linux/page-isolation.h
> >> +++ b/include/linux/page-isolation.h
> >> @@ -65,6 +65,11 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
> >>  int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
> >>  			bool skip_hwpoisoned_pages);
> >>
> >> +/*
> >> + *  Internal functions. Changes pageblock's migrate type.
> >> + */
> >> +int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages);
> >> +void unset_migratetype_isolate(struct page *page, unsigned migratetype);
> >>  struct page *alloc_migrate_target(struct page *page, unsigned long private,
> >>  				int **resultp);
> >>
> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> >> index eca613e..0ed3814 100644
> >> --- a/mm/memory-failure.c
> >> +++ b/mm/memory-failure.c
> >> @@ -1647,8 +1647,6 @@ static int __soft_offline_page(struct page *page, int flags)
> >>  		inc_zone_page_state(page, NR_ISOLATED_ANON +
> >>  					page_is_file_cache(page));
> >>  		list_add(&page->lru, &pagelist);
> >> -		if (!TestSetPageHWPoison(page))
> >> -			atomic_long_inc(&num_poisoned_pages);
> >>  		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
> >>  					MIGRATE_SYNC, MR_MEMORY_FAILURE);
> >>  		if (ret) {
> >> @@ -1663,8 +1661,9 @@ static int __soft_offline_page(struct page *page, int flags)
> >>  				pfn, ret, page->flags);
> >>  			if (ret > 0)
> >>  				ret = -EIO;
> >> -			if (TestClearPageHWPoison(page))
> >> -				atomic_long_dec(&num_poisoned_pages);
> >> +		} else {
> >> +			if (!TestSetPageHWPoison(page))
> >> +				atomic_long_inc(&num_poisoned_pages);
> >>  		}
> >>  	} else {
> >>  		pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n",
> >> @@ -1715,6 +1714,14 @@ int soft_offline_page(struct page *page, int flags)
> >>
> >>  	get_online_mems();
> >>
> >> +	/*
> >> +	 * Isolate the page, so that it doesn't get reallocated if it
> >> +	 * was free. This flag should be kept set until the source page
> >> +	 * is freed and PG_hwpoison on it is set.
> >> +	 */
> >> +	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
> >> +		set_migratetype_isolate(page, false);
> >> +
> >>  	ret = get_any_page(page, pfn, flags);
> >>  	put_online_mems();
> >>  	if (ret > 0) { /* for in-use pages */
> >> @@ -1733,5 +1740,6 @@ int soft_offline_page(struct page *page, int flags)
> >>  				atomic_long_inc(&num_poisoned_pages);
> >>  		}
> >>  	}
> >> +	unset_migratetype_isolate(page, MIGRATE_MOVABLE);
> >>  	return ret;
> >>  }
> >> diff --git a/mm/migrate.c b/mm/migrate.c
> >> index 1f8369d..472baf5 100644
> >> --- a/mm/migrate.c
> >> +++ b/mm/migrate.c
> >> @@ -880,8 +880,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
> >>  	/* Establish migration ptes or remove ptes */
> >>  	if (page_mapped(page)) {
> >>  		try_to_unmap(page,
> >> -			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS|
> >> -			TTU_IGNORE_HWPOISON);
> >> +			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
> >>  		page_was_mapped = 1;
> >>  	}
> >>
> >> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> >> index 4568fd5..654662a 100644
> >> --- a/mm/page_isolation.c
> >> +++ b/mm/page_isolation.c
> >> @@ -9,7 +9,7 @@
> >>  #include <linux/hugetlb.h>
> >>  #include "internal.h"
> >>
> >> -static int set_migratetype_isolate(struct page *page,
> >> +int set_migratetype_isolate(struct page *page,
> >>  				bool skip_hwpoisoned_pages)
> >>  {
> >>  	struct zone *zone;
> >> @@ -73,7 +73,7 @@ out:
> >>  	return ret;
> >>  }
> >>
> >> -static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
> >> +void unset_migratetype_isolate(struct page *page, unsigned migratetype)
> >>  {
> >>  	struct zone *zone;
> >>  	unsigned long flags, nr_pages;
> >> --
> >> 1.7.1
>--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ