Message-ID: <584eedd3-9369-9df1-39e2-62e331abdcc0@bytedance.com>
Date: Sun, 5 Jun 2022 12:24:24 +0800
From: zhenwei pi <pizhenwei@...edance.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: naoya.horiguchi@....com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Tony Luck <tony.luck@...el.com>,
Wu Fengguang <fengguang.wu@...el.com>
Subject: Re: Re: [PATCH] mm/memory-failure: don't allow to unpoison hw
corrupted page
On 6/5/22 02:56, Andrew Morton wrote:
> On Sat, 4 Jun 2022 18:32:29 +0800 zhenwei pi <pizhenwei@...edance.com> wrote:
>
>> Currently unpoison_memory(unsigned long pfn) is designed for soft
>> poison(hwpoison-inject) only. Unpoisoning a hardware corrupted page
>> puts page back buddy only, this leads BUG during accessing on the
>> corrupted KPTE.
>>
>> Do not allow to unpoison hardware corrupted page in unpoison_memory()
>> to avoid BUG like this:
>>
>> Unpoison: Software-unpoisoned page 0x61234
>> BUG: unable to handle page fault for address: ffff888061234000
>
> Thanks.
>
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2090,6 +2090,7 @@ int unpoison_memory(unsigned long pfn)
>> {
>> struct page *page;
>> struct page *p;
>> + pte_t *kpte;
>> int ret = -EBUSY;
>> int freeit = 0;
>> static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
>> @@ -2101,6 +2102,13 @@ int unpoison_memory(unsigned long pfn)
>> p = pfn_to_page(pfn);
>> page = compound_head(p);
>>
>> + kpte = virt_to_kpte((unsigned long)page_to_virt(p));
>> + if (kpte && !pte_present(*kpte)) {
>> + unpoison_pr_info("Unpoison: Page was hardware poisoned %#lx\n",
>> + pfn, &unpoison_rs);
>> + return -EPERM;
>> + }
>> +
>> mutex_lock(&mf_mutex);
>>
>> if (!PageHWPoison(p)) {
>
> I guess we don't want to let fault injection crash the kernel, so a
> cc:stable seems appropriate here.
>
> Can we think up a suitable Fixes: commit? I'm suspecting this bug has
> been there for a long time?
>
Sure!
On 2009-Dec-16, hwpoison_unpoison() was introduced into Linux by commit
847ce401df392 ("HWPOISON: Add unpoisoning support"):
...
There is no hardware level unpoisioning, so this cannot be used for real
memory errors, only for software injected errors.
...
Both the commit log and the comment in the source code say that this
function should be used for software-level unpoisoning only.
Unfortunately, there is no such check in hwpoison_unpoison().
On 2020-May-20, commit 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire
page if the whole page is affected and poisoned") started clearing the
KPTE. This leads to the BUG described in this patch when a hardware
corrupted page is unpoisoned.
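To make the ordering of the proposed check concrete, here is a minimal
user-space sketch in C. It is not kernel code: model_pages,
model_soft_inject(), model_hw_failure() and model_unpoison() are
hypothetical names invented for illustration, and the kpte_present flag
only stands in for the pte_present(*kpte) test in the patch. It assumes
that a software-injected poison leaves the kernel mapping in place while
a real hardware failure clears it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
#define MODEL_NR_PAGES 8UL

struct model_page {
	bool hwpoison;      /* stands in for PageHWPoison(p) */
	bool kpte_present;  /* stands in for pte_present(*kpte) */
};

struct model_page model_pages[MODEL_NR_PAGES];

/* Put every page back into the healthy, mapped state. */
void model_reset(void)
{
	for (unsigned long pfn = 0; pfn < MODEL_NR_PAGES; pfn++) {
		model_pages[pfn].hwpoison = false;
		model_pages[pfn].kpte_present = true;
	}
}

/* Software injection: mark the page poisoned, keep the mapping. */
void model_soft_inject(unsigned long pfn)
{
	assert(pfn < MODEL_NR_PAGES);
	model_pages[pfn].hwpoison = true;
}

/* Hardware failure: mark the page poisoned and drop the mapping. */
void model_hw_failure(unsigned long pfn)
{
	assert(pfn < MODEL_NR_PAGES);
	model_pages[pfn].hwpoison = true;
	model_pages[pfn].kpte_present = false;
}

/*
 * Mirrors the order of checks in the patch: refuse with -EPERM when
 * the mapping is gone, before looking at the poison flag at all.
 */
int model_unpoison(unsigned long pfn)
{
	if (pfn >= MODEL_NR_PAGES)
		return -ENXIO;

	struct model_page *p = &model_pages[pfn];

	if (!p->kpte_present)
		return -EPERM;   /* hardware poisoned: do not unpoison */
	if (!p->hwpoison)
		return -EBUSY;   /* nothing to unpoison */

	p->hwpoison = false;
	return 0;
}
```

In this model a soft-injected page can be unpoisoned, while a page that
went through model_hw_failure() stays poisoned and the caller gets
-EPERM instead of a later fault on the unmapped address.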
Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support")
Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
Cc: Wu Fengguang <fengguang.wu@...el.com>
Cc: Tony Luck <tony.luck@...el.com>
--
zhenwei pi