lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 16 Mar 2021 06:42:16 +0000
From:   HORIGUCHI NAOYA(堀口 直也) 
        <naoya.horiguchi@....com>
To:     "Luck, Tony" <tony.luck@...el.com>
CC:     Aili Yao <yaoaili@...gsoft.com>,
        Oscar Salvador <osalvador@...e.de>,
        "david@...hat.com" <david@...hat.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "bp@...en8.de" <bp@...en8.de>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "hpa@...or.com" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
        "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "yangfeng1@...gsoft.com" <yangfeng1@...gsoft.com>,
        "sunhao2@...gsoft.com" <sunhao2@...gsoft.com>
Subject: Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned

On Fri, Mar 12, 2021 at 11:48:31PM +0000, Luck, Tony wrote:
> >> will memory_failure() find it and unmap it? if succeed, then the current will be
> >> signaled with correct vaddr and shift?
> >
> > That's a very good question.  I didn't see a SIGBUS when I first wrote this code,
> > hence all the p->mce_vaddr.  But now I'm
> > a) not sure why there wasn't a signal

We have a recent change around this behavior, which might be an answer for you.
See commit 30c9cf492704 ("mm,hwpoison: send SIGBUS to PF_MCE_EARLY processes on action required events").

> > b) if we are to fix the problems noted by AndyL, need to make sure that there isn't a SIGBUS
> 
> Tests on upstream kernel today show that memory_failure() is both unmapping the page
> and sending a SIGBUS.

Sorry if I misunderstood the exact problem, but if the point is to allow
user processes to find poisoned virtual address without SIGBUS, one possible
way is to expose hwpoison entry via /proc/pid/pagemap (attached a draft patch below).
With this patch, processes easily find hwpoison entries without any new interface.

> 
> My biggest issue with the KERNEL_COPYIN recovery path is that we don't have code
> to mark the page not present while we are still in do_machine_check(). 

It sounds to me that even if we have such code, it just narrows down the race window
between multiple MCEs on different CPUs. Or does it completely prevent the race?
(Another thread could touch the poisoned page just before the first thread marks
the page non-present?)

> That's resulted
> in recovery working for simple cases where there is a single get_user() call followed by
> an error return if that failed. But more complex cases require more machine checks and
> a touching faith that the kernel will eventually give up trying (spoiler: it sometimes doesn't).
> 
> Thanks to the decode of the instruction we do have the virtual address. So we just need
> a safe walk of pgd->p4d->pud->pmd->pte (truncated if we hit a huge page) with a write
> of a "not-present" value. Maybe a different poison type from the one we get from
> memory_failure() so that the #PF code can recognize this as a special case and do any
> other work that we avoided because we were in #MC context.

As you answered in another email, p->mce_vaddr is set only on KERNEL_COPYIN case,
then if p->mce_vaddr is available also for generic SRAR MCE (I'm assuming that we are
talking about issues on race between generic SRAR MCE not only for KERNEL_COPYIN case),
that might be helpful, although it might be hard to implement.
And I'm afraid that walking page table could find the wrong virtual address if a process
have multiple entries to the single page. We could send multiple SIGBUSs for such case,
but maybe that's not an optimal solution.

Thanks,
Naoya Horiguchi

----
>From 147449a97e2ea3420ac3523f13523f5d30a13614 Mon Sep 17 00:00:00 2001
From: Naoya Horiguchi <naoya.horiguchi@....com>
Date: Tue, 16 Mar 2021 14:22:21 +0900
Subject: [PATCH] pagemap: expose hwpoison entry

not-signed-off-by-yet: Naoya Horiguchi <naoya.horiguchi@....com>
---
 fs/proc/task_mmu.c      |  6 ++++++
 include/linux/swapops.h | 12 ++++++++++++
 tools/vm/page-types.c   |  5 ++++-
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 602e3a52884d..08cea209bae7 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1300,6 +1300,7 @@ struct pagemapread {
 #define PM_PFRAME_MASK		GENMASK_ULL(PM_PFRAME_BITS - 1, 0)
 #define PM_SOFT_DIRTY		BIT_ULL(55)
 #define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
+#define PM_HWPOISON	       	BIT_ULL(60)
 #define PM_FILE			BIT_ULL(61)
 #define PM_SWAP			BIT_ULL(62)
 #define PM_PRESENT		BIT_ULL(63)
@@ -1385,6 +1386,11 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 		if (is_migration_entry(entry))
 			page = migration_entry_to_page(entry);
 
+		if (is_hwpoison_entry(entry)) {
+			page = hwpoison_entry_to_page(entry);
+			flags |= PM_HWPOISON;
+		}
+
 		if (is_device_private_entry(entry))
 			page = device_private_entry_to_page(entry);
 	}
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index d9b7c9132c2f..1b9dedbd06ab 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -323,6 +323,13 @@ static inline int is_hwpoison_entry(swp_entry_t entry)
 	return swp_type(entry) == SWP_HWPOISON;
 }
 
+static inline struct page *hwpoison_entry_to_page(swp_entry_t entry)
+{
+	struct page *p = pfn_to_page(swp_offset(entry));
+	WARN_ON(!PageHWPoison(p));
+	return p;
+}
+
 static inline void num_poisoned_pages_inc(void)
 {
 	atomic_long_inc(&num_poisoned_pages);
@@ -345,6 +352,11 @@ static inline int is_hwpoison_entry(swp_entry_t swp)
 	return 0;
 }
 
+static inline struct page *hwpoison_entry_to_page(swp_entry_t entry)
+{
+	return NULL;
+}
+
 static inline void num_poisoned_pages_inc(void)
 {
 }
diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
index 0517c744b04e..1160d5a14955 100644
--- a/tools/vm/page-types.c
+++ b/tools/vm/page-types.c
@@ -53,6 +53,7 @@
 #define PM_SWAP_OFFSET(x)	(((x) & PM_PFRAME_MASK) >> MAX_SWAPFILES_SHIFT)
 #define PM_SOFT_DIRTY		(1ULL << 55)
 #define PM_MMAP_EXCLUSIVE	(1ULL << 56)
+#define PM_HWPOISON		(1ULL << 60)
 #define PM_FILE			(1ULL << 61)
 #define PM_SWAP			(1ULL << 62)
 #define PM_PRESENT		(1ULL << 63)
@@ -311,6 +312,8 @@ static unsigned long pagemap_pfn(uint64_t val)
 
 	if (val & PM_PRESENT)
 		pfn = PM_PFRAME(val);
+	else if (val & PM_HWPOISON)
+		pfn = PM_SWAP_OFFSET(val);
 	else
 		pfn = 0;
 
@@ -742,7 +745,7 @@ static void walk_vma(unsigned long index, unsigned long count)
 			pfn = pagemap_pfn(buf[i]);
 			if (pfn)
 				walk_pfn(index + i, pfn, 1, buf[i]);
-			if (buf[i] & PM_SWAP)
+			else if (buf[i] & PM_SWAP)
 				walk_swap(index + i, buf[i]);
 		}
 
-- 
2.25.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ