Message-ID: <20210322193318.377c9ce9@alex-virtual-machine>
Date:   Mon, 22 Mar 2021 19:33:18 +0800
From:   Aili Yao <yaoaili@...gsoft.com>
To:     Matthew Wilcox <willy@...radead.org>,
        David Hildenbrand <david@...hat.com>,
        <akpm@...ux-foundation.org>, <naoya.horiguchi@....com>
CC:     <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
        <yangfeng1@...gsoft.com>, <sunhao2@...gsoft.com>,
        Oscar Salvador <osalvador@...e.de>,
        Mike Kravetz <mike.kravetz@...cle.com>, <yaoaili@...gsoft.com>
Subject: [PATCH v5] mm/gup: check page hwpoison status for coredump

A coredump may be triggered by a SIGBUS signal with the BUS_MCEERR_AR or
BUS_MCEERR_AO code, which means the signal resulted from an ECC memory
failure such as SRAR or SRAO. We expect memory recovery to have finished
correctly by then, in which case get_dump_page() will not return the
failed page, because memory_failure() has invalidated the process's pte
for it.

But memory_failure() may fail, leaving the process's pte for the page
still valid. With the current code, get_dump_page() then returns the
poisoned page, the coredump reads it, and the system panics because the
access happens in kernel code.

So check the hwpoison status in get_dump_page() and return NULL if the
page is poisoned.

There may be other scenarios where it is also better to check the
hwpoison status than to panic, so add a wrapper for this check. Thanks to
David Hildenbrand <david@...hat.com> for the suggestion.

Link: https://lkml.kernel.org/r/20210319104437.6f30e80d@alex-virtual-machine
Signed-off-by: Aili Yao <yaoaili@...gsoft.com>
Cc: David Hildenbrand <david@...hat.com>
Cc: Matthew Wilcox <willy@...radead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@....com>
Cc: Oscar Salvador <osalvador@...e.de>
Cc: Mike Kravetz <mike.kravetz@...cle.com>
Cc: Aili Yao <yaoaili@...gsoft.com>
Cc: stable@...r.kernel.org
Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
---
 mm/gup.c      |  4 ++++
 mm/internal.h | 20 ++++++++++++++++++++
 2 files changed, 24 insertions(+)
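
For readers who want to see the control flow in isolation, here is a minimal
userspace sketch of the check this patch adds. It is not kernel code: struct
model_page, model_is_page_hwpoison() and model_get_dump_page() are hypothetical
stand-ins invented for illustration, and the three struct fields only
approximate what PageHWPoison(), PageHuge() and compound_head() report in the
kernel.

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <stdbool.h>
#include <stddef.h>

struct model_page {
	bool hwpoison;            /* stands in for PageHWPoison(page) */
	bool huge;                /* stands in for PageHuge(page) */
	struct model_page *head;  /* stands in for compound_head(page);
	                           * points to itself for a head page or a
	                           * non-compound page */
};

/* Mirrors the wrapper: poisoned itself, or a huge page with a poisoned head. */
static bool model_is_page_hwpoison(const struct model_page *page)
{
	if (page->hwpoison)
		return true;
	return page->huge && page->head->hwpoison;
}

/*
 * Mirrors the tail of get_dump_page(): ret models the lookup result,
 * where 1 means exactly one page was found. A poisoned page is reported
 * as "no page" so that the caller never reads it.
 */
static struct model_page *model_get_dump_page(struct model_page *page, int ret)
{
	if (ret == 1 && model_is_page_hwpoison(page))
		return NULL;

	return (ret == 1) ? page : NULL;
}
```

In this model a tail page of a huge page whose head is poisoned is treated as
poisoned too, which is the case the second branch of the wrapper covers.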

diff --git a/mm/gup.c b/mm/gup.c
index e4c224c..6f7e1aa 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1536,6 +1536,10 @@ struct page *get_dump_page(unsigned long addr)
 				      FOLL_FORCE | FOLL_DUMP | FOLL_GET);
 	if (locked)
 		mmap_read_unlock(mm);
+
+	if (ret == 1 && is_page_hwpoison(page))
+		return NULL;
+
 	return (ret == 1) ? page : NULL;
 }
 #endif /* CONFIG_ELF_CORE */
diff --git a/mm/internal.h b/mm/internal.h
index 25d2b2439..b751cef 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -97,6 +97,26 @@ static inline void set_page_refcounted(struct page *page)
 	set_page_count(page, 1);
 }
 
+/*
+ * When the kernel touches a user page, that page may already be marked
+ * hwpoison while still being mapped in user space. If the kernel can
+ * guarantee data integrity and the success of the operation without
+ * this page, it is better to check the poison status and avoid touching
+ * the page than to panic. The coredump for a fatal process signal is
+ * one such case. If the kernel cannot guarantee data integrity without
+ * the page, do not call this function; let the kernel touch the
+ * poisoned page and panic.
+ */
+static inline bool is_page_hwpoison(struct page *page)
+{
+	if (PageHWPoison(page))
+		return true;
+	else if (PageHuge(page) && PageHWPoison(compound_head(page)))
+		return true;
+
+	return false;
+}
+
 extern unsigned long highest_memmap_pfn;
 
 /*
-- 
1.8.3.1
