linux-kernel - Re: [PATCH 0/6] mm: make movable onlining suck less


Message-ID: <20170405135248.GQ6035@dhcp22.suse.cz>
Date:   Wed, 5 Apr 2017 15:52:49 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Reza Arbab <arbab@...ux.vnet.ibm.com>
Cc:     Mel Gorman <mgorman@...e.de>, linux-mm@...ck.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Yasuaki Ishimatsu <yasu.isimatu@...il.com>,
        Tang Chen <tangchen@...fujitsu.com>, qiuxishi@...wei.com,
        Kani Toshimitsu <toshi.kani@....com>, slaoub@...il.com,
        Joonsoo Kim <js1304@...il.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Zhang Zhen <zhenzhang.zhang@...wei.com>,
        David Rientjes <rientjes@...gle.com>,
        Daniel Kiper <daniel.kiper@...cle.com>,
        Igor Mammedov <imammedo@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Chris Metcalf <cmetcalf@...lanox.com>,
        Dan Williams <dan.j.williams@...il.com>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        Lai Jiangshan <laijs@...fujitsu.com>,
        Martin Schwidefsky <schwidefsky@...ibm.com>
Subject: Re: [PATCH 0/6] mm: make movable onlining suck less

On Tue 04-04-17 16:43:39, Reza Arbab wrote:
> On Tue, Apr 04, 2017 at 09:41:22PM +0200, Michal Hocko wrote:
> >On Tue 04-04-17 13:30:13, Reza Arbab wrote:
> >>I think I found another edge case.  You
> >>get an oops when removing all of a node's memory:
> >>
> >>__nr_to_section
> >>__pfn_to_section
> >>find_biggest_section_pfn
> >>shrink_pgdat_span
> >>__remove_zone
> >>__remove_section
> >>__remove_pages
> >>arch_remove_memory
> >>remove_memory
> >
> >Is this something new or an old issue? I believe the state after the
> >online should be the same as before. So if you onlined the full node
> >then there shouldn't be any difference. Let me have a look...
> 
> It's new. Without this patchset, I can repeatedly
> add_memory()->online_movable->offline->remove_memory() all of a node's
> memory.

OK, I know what is going on here.
shrink_pgdat_span: start_pfn=0x1ff00, end_pfn=0x20000, pgdat_start_pfn=0x0, pgdat_end_pfn=0x20000
[...]
find_biggest_section_pfn loop: pfn=0xff, sec_nr = 0x0

so the node starts at pfn 0 while we are trying to remove range starting
from pfn=255 (1MB). Rather than going with find_smallest_section_pfn we
go with the other branch and that underflows as already mentioned. I
seriously doubt that the node really starts at pfn 0. I am not sure
which arch you are testing on but I believe we reserve the lowest
address pfn range on all arches. The previous code presumably handled
that properly because the original node/zone has started at the lowest
possible address and the zone shifting then preserves that.
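
To make the underflow concrete, here is a small userspace sketch. It is
only an illustration: it assumes the backward scan steps an unsigned long
pfn down by one section per iteration and keeps going while
pfn >= start_pfn. The helper names are made up for this example, and
PAGES_PER_SECTION = 0x100 is a guess taken from the pfn=0xff, sec_nr=0x0
line in the log above, not from any particular config.

```c
/* Assumed for illustration only: 0x100 pages per memory section. */
#define PAGES_PER_SECTION 0x100UL

/*
 * Hypothetical helper: the pfn the backward scan would visit after 'pfn',
 * i.e. the effect of "pfn -= PAGES_PER_SECTION" on an unsigned long.
 * If pfn < PAGES_PER_SECTION the subtraction wraps around to a huge value.
 */
static unsigned long next_scan_pfn(unsigned long pfn)
{
	return pfn - PAGES_PER_SECTION;
}

/*
 * Hypothetical helper: the assumed loop condition. With start_pfn == 0
 * this is always true for an unsigned pfn, so the scan never terminates
 * by itself.
 */
static int scan_continues(unsigned long pfn, unsigned long start_pfn)
{
	return pfn >= start_pfn;
}
```

With start_pfn=0x0 and pfn=0xff, next_scan_pfn() wraps to the largest
unsigned long and scan_continues() still says "go on", so the next section
lookup is done with a bogus pfn. With a node that starts above the first
section (e.g. start_pfn=0x1ff00, pfn=0x1ffff) the same step lands below
start_pfn and the scan stops as intended.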

My code doesn't do that though. So I guess I have to sanitize. Does this
help? Please drop the "mm, memory_hotplug: get rid of zone/node
shrinking" patch.
---
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index acf2b5eb5ecb..2c5613d19eb6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -750,6 +750,15 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages, int online_typ
 	int ret;
 	struct memory_notify arg;
 
+	do {
+		if (pfn_valid(pfn))
+			break;
+		pfn++;
+	} while (--nr_pages > 0);
+
+	if (!nr_pages)
+		return -EINVAL;
+
 	nid = pfn_to_nid(pfn);
 	if (!allow_online_pfn_range(nid, pfn, nr_pages, online_type))
 		return -EINVAL;
-- 
Michal Hocko
SUSE Labs