Message-ID: <271d0f41-0599-9d5d-0555-47189f476243@sangfor.com.cn>
Date:   Wed, 28 Jul 2021 18:59:37 +0800
From:   Ding Hui <dinghui@...gfor.com.cn>
To:     Naoya Horiguchi <nao.horiguchi@...il.com>, mike.kravetz@...cle.com
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>,
        Oscar Salvador <osalvador@...e.de>,
        Michal Hocko <mhocko@...e.com>,
        Tony Luck <tony.luck@...el.com>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Naoya Horiguchi <naoya.horiguchi@....com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        huangcun@...gfor.com.cn
Subject: Re: [PATCH v1 5/6] mm/hwpoison: make some kernel pages handlable

On 2021/6/14 10:12, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@....com>
> 
> HWPoisonHandlable() introduced by patch "mm,hwpoison: fix race with hugetlb
> page allocation" filters error events by page type, and only limited events
> reach get_page_unless_zero() to avoid race
> 

I want to report a bug that is related to "mm,hwpoison: fix race with
hugetlb page allocation", to hugetlb PMD sharing, and also to this patch.

Recently, while testing hugetlb together with soft offline, I encountered
a crash like this:
[449901.638605] huge_test[16596]: segfault at 8 ip 00007f5f64c39a12 sp 00007fff2105c020 error 4 in ld-2.23.so[7f5f64c2a000+26000]
[449901.638612] Code: 48 8d 35 2c 03 01 00 48 8d 3d 31 03 01 00 ba b5 00 00 00 e8 f0 a5 00 00 53 49 89 fa 89 f6 48 8d 14 76 48 83 ec 10 48 8b 47 68 <48> 8b 78 08 49 8b 82 f8 00 00 00 48 8b 40 08 4c 8d 04 d0 49 8b 42
[449901.638885] BUG: Bad rss-counter state mm:00000000a1ce68ac idx:0 val:358
[449901.638894] ------------[ cut here ]------------
[449901.638962] BUG: Bad rss-counter state mm:00000000a1ce68ac idx:1 val:26
[449901.638966] BUG: non-zero pgtables_bytes on freeing mm: 28672
[449901.639045] kernel BUG at fs/hugetlbfs/inode.c:443!
[449901.639193] invalid opcode: 0000 [#1] SMP NOPTI

After a few days of digging and reproducing, it turns out that there is a
conflict between the mechanisms of get_hwpoison_page() and hugetlb PMD
sharing:

In huge_pmd_unshare(), the page count is used to determine whether the PMD
page is shared, and that is not safe.

In my case, the same page's refcount was increased by get_hwpoison_page()
shortly before the "if (page_count(virt_to_page(ptep)) == 1)" check in
huge_pmd_unshare(), so huge_pmd_unshare() took the wrong branch.
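
To make the window concrete, here is a rough sketch of the check and of how
a transient reference flips it (paraphrased from the v5.13 code with context
trimmed, so the details are only indicative):

  int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
                       unsigned long *addr, pte_t *ptep)
  {
          ...
          /*
           * The refcount of the PMD page doubles as the number of
           * sharers, so a count of 1 is read as "not shared".
           */
          if (page_count(virt_to_page(ptep)) == 1)
                  return 0;       /* not shared: caller tears down the PTEs */

          /*
           * Otherwise the PMD is assumed to be shared: detach it, drop
           * one reference, and the caller skips the per-PTE teardown.
           * A transient reference taken by get_hwpoison_page() is enough
           * to push the count above 1 and land here by mistake.
           */
          pud_clear(pud);
          put_page(virt_to_page(ptep));
          mm_dec_nr_pmds(mm);
          ...
          return 1;
  }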

> Actually this is too restictive because get_hwpoison_page always fails
> to take refcount for any types of kernel page, leading to
> MF_MSG_KERNEL_HIGH_ORDER.  This is not critical (no panic), but less
> informative than MF_MSG_SLAB or MF_MSG_PAGETABLE, so extend
> HWPoisonHandlable() to some basic types of kernel pages (slab, pgtable,
> and reserved pages).
> 

After "mm,hwpoison: fix race with hugetlb page allocation", PageTable(page)
pages are blocked from reaching get_page_unless_zero() by that "restrictive"
check, so this bug happens to be fixed only as a side effect.

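For reference, the gate sits in front of the refcount grab, roughly like
this (simplified, not the exact source):

  static int __get_hwpoison_page(struct page *page)
  {
          struct page *head = compound_head(page);
          ...
          /*
           * With the old HWPoisonHandlable() only PageLRU/__PageMovable
           * pages pass here, so a page table page never gets its
           * refcount bumped behind huge_pmd_unshare()'s back.
           */
          if (!HWPoisonHandlable(head))
                  return 0;
          ...
          if (get_page_unless_zero(head))
                  return 1;
          ...
          return 0;
  }
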
> The "handling" for these types are still primitive (just taking refcount
> and setting PG_hwpoison) and some more aggressive actions for memory
> error containment are possible and wanted.  But compared to the older code,
> these cases never enter the code block of page locks (note that
> page locks is not well-defined on these pages), so it's a little safer
> for functions intended for user pages not to be called for kernel pages.
> 

But the root cause still exists, and the bug can come back at any time
without anyone noticing. For example, with this patch, if PageTable(page)
pages are allowed to reach get_page_unless_zero(), the risk returns.

I'm not sure whether there is another way to determine that the PMD page is
shared, so I'm adding Mike Kravetz here and reporting the risk to you.
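
To spell out the risk, with this patch the interleaving I hit becomes
possible again for page table pages (hand-drawn timeline, only to
illustrate the ordering):

  /*
   *   CPU A (soft offline)                CPU B (munmap/exit)
   *   get_hwpoison_page(pmd page)
   *     HWPoisonHandlable(): PageTable() now passes
   *     get_page_unless_zero(): refcount 1 -> 2
   *                                       huge_pmd_unshare()
   *                                         page_count() == 2, so an
   *                                         unshared PMD looks shared:
   *                                         pud_clear() + put_page(),
   *                                         PTE teardown is skipped
   *   put_page(): refcount 2 -> 1 (too late)
   *
   * The skipped teardown is what leaves the bad rss counters and the
   * non-zero pgtables_bytes in the splat above.
   */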

> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@....com>
> ---
>   mm/memory-failure.c | 28 ++++++++++++++++++++--------
>   1 file changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git v5.13-rc5/mm/memory-failure.c v5.13-rc5_patched/mm/memory-failure.c
> index b986936e50eb..0d51067f0129 100644
> --- v5.13-rc5/mm/memory-failure.c
> +++ v5.13-rc5_patched/mm/memory-failure.c
> @@ -1113,7 +1113,8 @@ static int page_action(struct page_state *ps, struct page *p,
>    */
>   static inline bool HWPoisonHandlable(struct page *page)
>   {
> -	return PageLRU(page) || __PageMovable(page);
> +	return PageLRU(page) || __PageMovable(page) ||
> +		PageSlab(page) || PageTable(page) || PageReserved(page);
>   }
>   
>   static int __get_hwpoison_page(struct page *page)
> @@ -1260,12 +1261,6 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>   	struct page *hpage = *hpagep;
>   	bool mlocked = PageMlocked(hpage);
>   
> -	/*
> -	 * Here we are interested only in user-mapped pages, so skip any
> -	 * other types of pages.
> -	 */
> -	if (PageReserved(p) || PageSlab(p))
> -		return true;
>   	if (!(PageLRU(hpage) || PageHuge(p)))
>   		return true;
>   
> @@ -1670,7 +1665,10 @@ int memory_failure(unsigned long pfn, int flags)
>   				action_result(pfn, MF_MSG_BUDDY, res);
>   				res = res == MF_RECOVERED ? 0 : -EBUSY;
>   			} else {
> -				action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
> +				if (PageCompound(p))
> +					action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
> +				else
> +					action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
>   				res = -EBUSY;
>   			}
>   			goto unlock_mutex;
> @@ -1681,6 +1679,20 @@ int memory_failure(unsigned long pfn, int flags)
>   		}
>   	}
>   
> +	if (PageSlab(p)) {
> +		action_result(pfn, MF_MSG_SLAB, MF_IGNORED);
> +		res = -EBUSY;
> +		goto unlock_mutex;
> +	} else if (PageTable(p)) {
> +		action_result(pfn, MF_MSG_PAGETABLE, MF_IGNORED);
> +		res = -EBUSY;
> +		goto unlock_mutex;
> +	} else if (PageReserved(p)) {
> +		action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
> +		res = -EBUSY;
> +		goto unlock_mutex;
> +	}
> +
>   	if (PageTransHuge(hpage)) {
>   		if (try_to_split_thp_page(p, "Memory Failure") < 0) {
>   			action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
> 


-- 
Thanks,
- Ding Hui
