lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180119114917.rvghcgexgbm73xkq@node.shutemov.name>
Date:   Fri, 19 Jan 2018 14:49:17 +0300
From:   "Kirill A. Shutemov" <kirill@...temov.name>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Dave Hansen <dave.hansen@...ux.intel.com>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        torvalds@...ux-foundation.org, kirill.shutemov@...ux.intel.com,
        akpm@...ux-foundation.org, hannes@...xchg.org,
        iamjoonsoo.kim@....com, mgorman@...hsingularity.net,
        tony.luck@...el.com, vbabka@...e.cz, aarcange@...hat.com,
        hillf.zj@...baba-inc.com, hughd@...gle.com, oleg@...hat.com,
        peterz@...radead.org, riel@...hat.com, srikar@...ux.vnet.ibm.com,
        vdavydov.dev@...il.com, mingo@...nel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org
Subject: Re: [mm 4.15-rc8] Random oopses under memory pressure.

On Fri, Jan 19, 2018 at 11:33:42AM +0100, Michal Hocko wrote:
> On Fri 19-01-18 13:02:59, Kirill A. Shutemov wrote:
> > On Thu, Jan 18, 2018 at 06:22:13PM +0100, Michal Hocko wrote:
> > > On Thu 18-01-18 18:40:26, Kirill A. Shutemov wrote:
> > > [...]
> > > > +	/*
> > > > +	 * Make sure that pages are in the same section before doing pointer
> > > > +	 * arithmetics.
> > > > +	 */
> > > > +	if (page_to_section(pvmw->page) != page_to_section(page))
> > > > +		return false;
> > > 
> > > OK, THPs shouldn't cross memory sections AFAIK. My brain is meltdown
> > > these days so this might be a completely stupid question. But why don't
> > > you simply compare pfns? This would be just simpler, no?
> > 
> > In original code, we already had pvmw->page around and I thought it would
> > be easier to get page for the pte intead of looking for pfn for both
> > sides.
> > 
> > We these changes it's no longer the case.
> > 
> > Do you care enough to send a patch? :)
> 
> Well, memory sections are sparsemem concept IIRC. Unless I've missed
> something page_to_section is quarded by SECTION_IN_PAGE_FLAGS and that
> is conditional to CONFIG_SPARSEMEM. THP is a generic code so using it
> there is wrong unless I miss some subtle detail here.
> 
> Comparing pfn should be generic enough.

Good point.

What about something like this?

>From 861f68c555b87fd6c0ccc3428ace91b7e185b73a Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Date: Thu, 18 Jan 2018 18:24:07 +0300
Subject: [PATCH] mm, page_vma_mapped: Drop faulty pointer arithmetics in
 check_pte()

Tetsuo reported random crashes under memory pressure on 32-bit x86
system and tracked down to change that introduced
page_vma_mapped_walk().

The root cause of the issue is the faulty pointer math in check_pte().
As ->pte may point to an arbitrary page we have to check that they are
belong to the section before doing math. Otherwise it may lead to weird
results.

It wasn't noticed until now as mem_map[] is virtually contiguous on flatmem or
vmemmap sparsemem. Pointer arithmetic just works against all 'struct page'
pointers. But with classic sparsemem, it doesn't.

Let's restructure code a bit and replace pointer arithmetic with
operations on pfns.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
Reported-by: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Cc: stable@...r.kernel.org
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
---
 include/linux/swapops.h | 21 ++++++++++++++++++
 mm/page_vma_mapped.c    | 59 +++++++++++++++++++++++++++++++------------------
 2 files changed, 59 insertions(+), 21 deletions(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 9c5a2628d6ce..1d3877c39a00 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -124,6 +124,11 @@ static inline bool is_write_device_private_entry(swp_entry_t entry)
 	return unlikely(swp_type(entry) == SWP_DEVICE_WRITE);
 }
 
+static inline unsigned long device_private_entry_to_pfn(swp_entry_t entry)
+{
+	return swp_offset(entry);
+}
+
 static inline struct page *device_private_entry_to_page(swp_entry_t entry)
 {
 	return pfn_to_page(swp_offset(entry));
@@ -154,6 +159,11 @@ static inline bool is_write_device_private_entry(swp_entry_t entry)
 	return false;
 }
 
+static inline unsigned long device_private_entry_to_pfn(swp_entry_t entry)
+{
+	return 0;
+}
+
 static inline struct page *device_private_entry_to_page(swp_entry_t entry)
 {
 	return NULL;
@@ -189,6 +199,11 @@ static inline int is_write_migration_entry(swp_entry_t entry)
 	return unlikely(swp_type(entry) == SWP_MIGRATION_WRITE);
 }
 
+static inline unsigned long migration_entry_to_pfn(swp_entry_t entry)
+{
+	return swp_offset(entry);
+}
+
 static inline struct page *migration_entry_to_page(swp_entry_t entry)
 {
 	struct page *p = pfn_to_page(swp_offset(entry));
@@ -218,6 +233,12 @@ static inline int is_migration_entry(swp_entry_t swp)
 {
 	return 0;
 }
+
+static inline unsigned long migration_entry_to_pfn(swp_entry_t entry)
+{
+	return 0;
+}
+
 static inline struct page *migration_entry_to_page(swp_entry_t entry)
 {
 	return NULL;
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index d22b84310f6d..072ae9bc5fee 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -30,8 +30,28 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
 	return true;
 }
 
+/**
+ * check_pte - check if @pvmw->page is mapped at the @pvmw->pte
+ *
+ * page_vma_mapped_walk() found a place where @pvmw->page is *potentially*
+ * mapped. check_pte() has to validate this.
+ *
+ * @pvmw->pte may point to empty PTE, swap PTE or PTE pointing to arbitrary
+ * page.
+ *
+ * If PVMW_MIGRATION flag is set, returns true if @pvmw->pte contains migration
+ * entry that points to @pvmw->page or any subpage in case of THP.
+ *
+ * If PVMW_MIGRATION flag is not set, returns true if @pvmw->pte points to
+ * @pvmw->page or any subpage in case of THP.
+ *
+ * Otherwise, return false.
+ *
+ */
 static bool check_pte(struct page_vma_mapped_walk *pvmw)
 {
+	unsigned long pfn;
+
 	if (pvmw->flags & PVMW_MIGRATION) {
 #ifdef CONFIG_MIGRATION
 		swp_entry_t entry;
@@ -41,37 +61,34 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw)
 
 		if (!is_migration_entry(entry))
 			return false;
-		if (migration_entry_to_page(entry) - pvmw->page >=
-				hpage_nr_pages(pvmw->page)) {
-			return false;
-		}
-		if (migration_entry_to_page(entry) < pvmw->page)
-			return false;
+
+		pfn = migration_entry_to_pfn(entry);
 #else
 		WARN_ON_ONCE(1);
 #endif
-	} else {
-		if (is_swap_pte(*pvmw->pte)) {
-			swp_entry_t entry;
+	} else if (is_swap_pte(*pvmw->pte)) {
+		swp_entry_t entry;
 
-			entry = pte_to_swp_entry(*pvmw->pte);
-			if (is_device_private_entry(entry) &&
-			    device_private_entry_to_page(entry) == pvmw->page)
-				return true;
-		}
+		/* Handle un-addressable ZONE_DEVICE memory */
+		entry = pte_to_swp_entry(*pvmw->pte);
+		if (!is_device_private_entry(entry))
+			return false;
 
+		pfn = device_private_entry_to_pfn(entry);
+	} else {
 		if (!pte_present(*pvmw->pte))
 			return false;
 
-		/* THP can be referenced by any subpage */
-		if (pte_page(*pvmw->pte) - pvmw->page >=
-				hpage_nr_pages(pvmw->page)) {
-			return false;
-		}
-		if (pte_page(*pvmw->pte) < pvmw->page)
-			return false;
+		pfn = pte_pfn(*pvmw->pte);
 	}
 
+	if (pfn < page_to_pfn(pvmw->page))
+		return false;
+
+	/* THP can be referenced by any subpage */
+	if (pfn - page_to_pfn(pvmw->page) >= hpage_nr_pages(pvmw->page))
+		return false;
+
 	return true;
 }
 
-- 
 Kirill A. Shutemov

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ