Message-ID: <20140702000245.GM5714@two.firstfloor.org>
Date: Wed, 2 Jul 2014 02:02:45 +0200
From: Andi Kleen <andi@...stfloor.org>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Andi Kleen <andi@...stfloor.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>,
Naoya Horiguchi <n-horiguchi@...jp.nec.com>
Subject: Re: [PATCH] hwpoison: Fix race with changing page during offlining v2
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1168,6 +1168,16 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
> > lock_page(hpage);
> >
> > /*
> > + * The page could have changed compound pages during the locking.
> > + * If this happens just bail out.
> > + */
> > + if (compound_head(p) != hpage) {
>
> How can a 4k page change compound pages? The original compound page
> was torn down and then this 4k page became part of a differently-sized
> compound page?
Yes, or it was torn down and the 4k page is now a standalone page.
>
> > + action_result(pfn, "different compound page after locking", IGNORED);
> > + res = -EBUSY;
> > + goto out;
> > + }
> > +
> > + /*
>
> I don't get it. We just go and fail the poisoning attempt? Shouldn't
> we go back, grab the new hpage and try again?
It should be quite rare, so I thought bailing out was the safest option.
A retry loop would be more difficult to test and may have more side effects.
The hwpoison code by design only tries to handle cases that are
reasonably common in workloads, as visible in page-flags.
I'm not really that concerned about handling this (likely rare) case,
just about not crashing on it.
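
For illustration only, here is a minimal userspace sketch of the
check-after-lock pattern the patch uses: look up the head page, lock it,
then verify that the head is still the same and bail out with -EBUSY if
it is not. This is not the kernel code. struct toy_page, the toy_*
helpers, and the race callback are all hypothetical names made up for
this sketch, and the "lock" is a plain flag, so it only models the
control flow in a single thread.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only what this sketch needs. */
struct toy_page {
	struct toy_page *head;	/* compound head; points to itself if standalone */
	bool locked;		/* toy stand-in for the page lock, not a real lock */
};

static struct toy_page *toy_compound_head(struct toy_page *p)
{
	return p->head;
}

static void toy_lock_page(struct toy_page *p)
{
	p->locked = true;
}

static void toy_unlock_page(struct toy_page *p)
{
	p->locked = false;
}

/* Simulates the compound page being torn down: p becomes its own page. */
static void toy_split(struct toy_page *p)
{
	p->head = p;
}

/*
 * Sketch of the bail-out approach. The optional race callback is an
 * assumption of this example; it runs in the window between looking up
 * the head page and locking it, so the race can be reproduced on demand.
 */
static int toy_handle_failure(struct toy_page *p,
			      void (*race)(struct toy_page *))
{
	struct toy_page *hpage = toy_compound_head(p);
	int res = 0;

	if (race)
		race(p);

	toy_lock_page(hpage);

	/* The page could have changed compound pages before we got the lock. */
	if (toy_compound_head(p) != hpage) {
		res = -EBUSY;	/* give up rather than retry */
		goto out;
	}

	/* ... the actual handling would go here, with hpage locked ... */
out:
	toy_unlock_page(hpage);
	return res;
}
```

Calling toy_handle_failure(&tail, NULL) on a tail page whose head is
unchanged returns 0. Passing toy_split as the callback makes the head
change inside the window, and the function returns -EBUSY instead of
operating on a stale head page.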
-Andi