Message-ID: <271d0f41-0599-9d5d-0555-47189f476243@sangfor.com.cn>
Date: Wed, 28 Jul 2021 18:59:37 +0800
From: Ding Hui <dinghui@...gfor.com.cn>
To: Naoya Horiguchi <nao.horiguchi@...il.com>, mike.kravetz@...cle.com
Cc: Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>,
Oscar Salvador <osalvador@...e.de>,
Michal Hocko <mhocko@...e.com>,
Tony Luck <tony.luck@...el.com>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
Naoya Horiguchi <naoya.horiguchi@....com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
huangcun@...gfor.com.cn
Subject: Re: [PATCH v1 5/6] mm/hwpoison: make some kernel pages handlable
On 2021/6/14 10:12, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@....com>
>
> HWPoisonHandlable() introduced by patch "mm,hwpoison: fix race with hugetlb
> page allocation" filters error events by page type, and only limited events
> reach get_page_unless_zero() to avoid race.
>
I want to report a bug related to "mm,hwpoison: fix race with hugetlb
page allocation", hugetlb pmd sharing, and this patch.
Recently, while testing hugetlb and soft offline, I encountered a crash
like this:
[449901.638605] huge_test[16596]: segfault at 8 ip 00007f5f64c39a12 sp
00007fff2105c020 error 4 in ld-2.23.so[7f5f64c2a000+26000]
[449901.638612] Code: 48 8d 35 2c 03 01 00 48 8d 3d 31 03 01 00 ba b5 00
00 00 e8 f0 a5 00 00 53 49 89 fa 89 f6 48 8d 14 76 48 83 ec 10 48 8b 47
68 <48> 8b 78 08 49 8b 82 f8 00 00 00 48 8b 40 08 4c 8d 04 d0 49 8b 42
[449901.638885] BUG: Bad rss-counter state mm:00000000a1ce68ac idx:0 val:358
[449901.638894] ------------[ cut here ]------------
[449901.638962] BUG: Bad rss-counter state mm:00000000a1ce68ac idx:1 val:26
[449901.638966] BUG: non-zero pgtables_bytes on freeing mm: 28672
[449901.639045] kernel BUG at fs/hugetlbfs/inode.c:443!
[449901.639193] invalid opcode: 0000 [#1] SMP NOPTI
After a few days of digging and reproducing, it turns out that there is
a mechanism conflict between get_hwpoison_page() and hugetlb pmd
sharing: in huge_pmd_unshare(), page_count is used to determine whether
the pmd page table page is shared, and that is not safe.

In my case, the page's refcount was increased by get_hwpoison_page()
shortly before the "if (page_count(virt_to_page(ptep)) == 1)" check in
huge_pmd_unshare(), so huge_pmd_unshare() took the wrong branch.
> Actually this is too restrictive because get_hwpoison_page always fails
> to take refcount for any types of kernel page, leading to
> MF_MSG_KERNEL_HIGH_ORDER. This is not critical (no panic), but less
> informative than MF_MSG_SLAB or MF_MSG_PAGETABLE, so extend
> HWPoisonHandlable() to some basic types of kernel pages (slab, pgtable,
> and reserved pages).
>
After "mm,hwpoison: fix race with hugetlb page allocation",the
PageTable(page) is blocked to get_page_unless_zero() due to
"restictive", this bug is just fixed by side effect.
> The "handling" for these types are still primitive (just taking refcount
> and setting PG_hwpoison) and some more aggressive actions for memory
> error containment are possible and wanted. But compared to the older code,
> these cases never enter the code block of page locks (note that
> page locks is not well-defined on these pages), so it's a little safer
> for functions intended for user pages not to be called for kernel pages.
>
But the root cause still exists, and the bug may come back unnoticed at
any time. As with this patch: if PageTable(page) pages are allowed to
reach get_page_unless_zero(), the risk comes back.

I'm not sure whether there is another way to determine that the pmd
page table page is shared, so I'm adding Mike Kravetz here and
reporting the risk to you.
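To make the window explicit, the interleaving I reproduced looks like
this (hand-drawn timeline, not code; "p" is the pmd page table page
that soft offline happened to target):

	CPU A (soft offline)         CPU B (unmap by one sharing process)
	--------------------         ------------------------------------
	get_hwpoison_page(p)
	  get_page_unless_zero(p)
	    /* refcount(p): 1 -> 2 */
	                             huge_pmd_unshare()
	                               /* page_count() == 2, so the pmd
	                                  page is wrongly treated as
	                                  shared; the huge ptes under it
	                                  are never torn down */
	put_page(p)
	                             /* at mm teardown: Bad rss-counter,
	                                non-zero pgtables_bytes, BUG */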
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@....com>
> ---
> mm/memory-failure.c | 28 ++++++++++++++++++++--------
> 1 file changed, 20 insertions(+), 8 deletions(-)
>
> diff --git v5.13-rc5/mm/memory-failure.c v5.13-rc5_patched/mm/memory-failure.c
> index b986936e50eb..0d51067f0129 100644
> --- v5.13-rc5/mm/memory-failure.c
> +++ v5.13-rc5_patched/mm/memory-failure.c
> @@ -1113,7 +1113,8 @@ static int page_action(struct page_state *ps, struct page *p,
> */
> static inline bool HWPoisonHandlable(struct page *page)
> {
> - return PageLRU(page) || __PageMovable(page);
> + return PageLRU(page) || __PageMovable(page) ||
> + PageSlab(page) || PageTable(page) || PageReserved(page);
> }
>
> static int __get_hwpoison_page(struct page *page)
> @@ -1260,12 +1261,6 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
> struct page *hpage = *hpagep;
> bool mlocked = PageMlocked(hpage);
>
> - /*
> - * Here we are interested only in user-mapped pages, so skip any
> - * other types of pages.
> - */
> - if (PageReserved(p) || PageSlab(p))
> - return true;
> if (!(PageLRU(hpage) || PageHuge(p)))
> return true;
>
> @@ -1670,7 +1665,10 @@ int memory_failure(unsigned long pfn, int flags)
> action_result(pfn, MF_MSG_BUDDY, res);
> res = res == MF_RECOVERED ? 0 : -EBUSY;
> } else {
> - action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
> + if (PageCompound(p))
> + action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
> + else
> + action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
> res = -EBUSY;
> }
> goto unlock_mutex;
> @@ -1681,6 +1679,20 @@ int memory_failure(unsigned long pfn, int flags)
> }
> }
>
> + if (PageSlab(p)) {
> + action_result(pfn, MF_MSG_SLAB, MF_IGNORED);
> + res = -EBUSY;
> + goto unlock_mutex;
> + } else if (PageTable(p)) {
> + action_result(pfn, MF_MSG_PAGETABLE, MF_IGNORED);
> + res = -EBUSY;
> + goto unlock_mutex;
> + } else if (PageReserved(p)) {
> + action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
> + res = -EBUSY;
> + goto unlock_mutex;
> + }
> +
> if (PageTransHuge(hpage)) {
> if (try_to_split_thp_page(p, "Memory Failure") < 0) {
> action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
>
--
Thanks,
- Ding Hui