Message-Id: <75179fb1-eb83-15b8-b7ba-d405745e1566@linux.vnet.ibm.com>
Date: Tue, 23 Jan 2018 19:15:35 +0100
From: Laurent Dufour <ldufour@...ux.vnet.ibm.com>
To: Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...nel.org>
Cc: Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
Balbir Singh <bsingharora@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
Wen Congyang <wency@...fujitsu.com>
Subject: Re: [PATCH v2 2/2] mm: skip HWPoisoned pages when onlining pages

Hi Andrew,

On 18/01/2018 00:03, Andrew Morton wrote:
> On Fri, 28 Apr 2017 08:30:48 +0200 Michal Hocko <mhocko@...nel.org> wrote:
>
>> On Wed 26-04-17 03:13:04, Naoya Horiguchi wrote:
>>> On Wed, Apr 26, 2017 at 12:10:15PM +1000, Balbir Singh wrote:
>>>> On Tue, 2017-04-25 at 16:27 +0200, Laurent Dufour wrote:
>>>>> The commit b023f46813cd ("memory-hotplug: skip HWPoisoned page when
>>>>> offlining pages") skip the HWPoisoned pages when offlining pages, but
>>>>> this should be skipped when onlining the pages too.
>>>>>
>>>>> Signed-off-by: Laurent Dufour <ldufour@...ux.vnet.ibm.com>
>>>>> ---
>>>>> mm/memory_hotplug.c | 4 ++++
>>>>> 1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>>> index 6fa7208bcd56..741ddb50e7d2 100644
>>>>> --- a/mm/memory_hotplug.c
>>>>> +++ b/mm/memory_hotplug.c
>>>>> @@ -942,6 +942,10 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
>>>>> if (PageReserved(pfn_to_page(start_pfn)))
>>>>> for (i = 0; i < nr_pages; i++) {
>>>>> page = pfn_to_page(start_pfn + i);
>>>>> + if (PageHWPoison(page)) {
>>>>> + ClearPageReserved(page);
>>>>
>>>> Why do we clear page reserved? Also if the page is marked PageHWPoison, it
>>>> was never offlined to begin with? Or do you expect this to be set on newly
>>>> hotplugged memory? Also don't we need to skip the entire pageblock?
>>>
>>> If I read correctly, to "skip HWPoisoned page" in commit b023f46813cd means
>>> that we skip the page status check for hwpoisoned pages *not* to prevent
>>> memory offlining for memblocks with hwpoisoned pages. That means that
>>> hwpoisoned pages can be offlined.
>>
>> Is this patch actually correct? I am trying to wrap my head around it
>> but it smells like it tries to avoid the problem rather than fix it
>> properly. I might be wrong here of course but to me it sounds like
>> poisoned page should simply be offlined and keep its poison state all
>> the time. If the memory is hot-removed and added again we have lost the
>> struct page along with the state which is the expected behavior. If it
>> is still broken we will re-poison it.
>>
>> Anyway a patch to skip over poisoned pages during online makes perfect
>> sense to me. The PageReserved fiddling around much less so.
>>
>> Or am I missing something. Let's CC Wen Congyang for the clarification
>> here.
>
> Wen Congyang appears to have disappeared and this fix isn't yet
> finalized. Can we all please revisit it and have a think about
> Michal's questions?

I tried to recreate the original issue, but a lot has changed in this
area since last April.

I was not able to offline a poisoned page because isolate_movable_page()
fails. I'll investigate that further...

Cheers,
Laurent.

> Thanks.
>
>
> From: Laurent Dufour <ldufour@...ux.vnet.ibm.com>
> Subject: mm: skip HWPoisoned pages when onlining pages
>
> b023f46813cd ("memory-hotplug: skip HWPoisoned page when offlining pages")
> skipped the HWPoisoned pages when offlining pages, but this should be
> skipped when onlining the pages too.
>
> n-horiguchi@...jp.nec.com said:
>
> : If I read correctly, to "skip HWPoisoned page" in commit b023f46813cd
> : means that we skip the page status check for hwpoisoned pages *not* to
> : prevent memory offlining for memblocks with hwpoisoned pages. That
> : means that hwpoisoned pages can be offlined.
> :
> : And another reason to clear PageReserved is that we could reuse the
> : hwpoisoned page after onlining back with replacing the broken DIMM. In
> : this usecase, we first do unpoisoning to clear PageHWPoison, but it
> : doesn't work if PageReserved is set. My simple testing shows the BUG
> : below in unpoisoning (without the ClearPageReserved):
> :
> : Unpoison: Software-unpoisoned page 0x18000
> : BUG: Bad page state in process page-types pfn:18000
> : page:ffffda5440600000 count:0 mapcount:0 mapping: (null) index:0x70006b599
> : flags: 0x1fffc00004081a(error|uptodate|dirty|reserved|swapbacked)
> : raw: 001fffc00004081a 0000000000000000 000000070006b599 00000000ffffffff
> : raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
> : page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> : bad because of flags: 0x800(reserved)
>
> Link: http://lkml.kernel.org/r/1493130472-22843-3-git-send-email-ldufour@linux.vnet.ibm.com
> Signed-off-by: Laurent Dufour <ldufour@...ux.vnet.ibm.com>
> Cc: Naoya Horiguchi <n-horiguchi@...jp.nec.com>
> Cc: Andrey Vagin <avagin@...nvz.org>
> Cc: Glauber Costa <glommer@...nvz.org>
> Cc: Vladimir Davydov <vdavydov.dev@...il.com>
> Cc: Balbir Singh <bsingharora@...il.com>
> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> ---
>
> mm/memory_hotplug.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff -puN mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages mm/memory_hotplug.c
> --- a/mm/memory_hotplug.c~mm-skip-hwpoisoned-pages-when-onlining-pages
> +++ a/mm/memory_hotplug.c
> @@ -696,6 +696,10 @@ static int online_pages_range(unsigned l
> if (PageReserved(pfn_to_page(start_pfn)))
> for (i = 0; i < nr_pages; i++) {
> page = pfn_to_page(start_pfn + i);
> + if (PageHWPoison(page)) {
> + ClearPageReserved(page);
> + continue;
> + }
> (*online_page_callback)(page);
> onlined_pages++;
> }
> _
>
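
For anyone following the thread, here is a rough user-space sketch of what
the quoted hunk does and why Naoya wanted the ClearPageReserved() call. It
is only a toy model, not kernel code: struct toy_page, the TOY_PG_* bits and
every toy_*() function are made up for illustration, and the stand-in for
the online callback is simply assumed to drop the reserved bit. The last
helper mimics the "reserved flag set at free time" check reported in the
BUG output above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel page flags; names and values are illustrative. */
enum {
    TOY_PG_RESERVED = 1u << 0,
    TOY_PG_HWPOISON = 1u << 1,
    TOY_PG_ONLINE   = 1u << 2, /* hypothetical marker set by the toy callback */
};

struct toy_page {
    unsigned int flags;
};

static bool toy_page_reserved(const struct toy_page *p)
{
    return (p->flags & TOY_PG_RESERVED) != 0;
}

static bool toy_page_hwpoison(const struct toy_page *p)
{
    return (p->flags & TOY_PG_HWPOISON) != 0;
}

/* Stand-in for (*online_page_callback)(page); assumed to drop "reserved". */
static void toy_online_page(struct toy_page *p)
{
    p->flags &= ~(unsigned int)TOY_PG_RESERVED;
    p->flags |= TOY_PG_ONLINE;
}

/*
 * Model of the patched loop: poisoned pages are not handed to the online
 * callback, but their reserved bit is still cleared. Returns the number of
 * pages that were actually onlined.
 */
unsigned long toy_online_pages_range(struct toy_page *pages,
                                     unsigned long nr_pages)
{
    unsigned long i, onlined = 0;

    assert(pages != NULL && nr_pages > 0);

    /* Same guard as the quoted code: only act if the first page is reserved. */
    if (!toy_page_reserved(&pages[0]))
        return 0;

    for (i = 0; i < nr_pages; i++) {
        struct toy_page *page = &pages[i];

        if (toy_page_hwpoison(page)) {
            page->flags &= ~(unsigned int)TOY_PG_RESERVED;
            continue; /* skipped: stays poisoned, never onlined */
        }
        toy_online_page(page);
        onlined++;
    }
    return onlined;
}

/*
 * Model of the free-time flag check behind the reported BUG: a page that is
 * unpoisoned and then freed must not still carry the reserved bit.
 */
bool toy_bad_page_at_free(const struct toy_page *p)
{
    return toy_page_reserved(p);
}
```

With four reserved pages of which one is poisoned, the sketch onlines three
pages and leaves the poisoned one with only its poison bit set, so a later
unpoison-and-free would not trip the reserved-flag check. Whether the real
code should touch PageReserved at all is exactly the open question above.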