Message-ID: <e7aed7c0-b7b1-4a94-f323-0bcde2f879d2@redhat.com>
Date: Tue, 2 Nov 2021 13:39:06 +0100
From: David Hildenbrand <david@...hat.com>
To: Michal Hocko <mhocko@...e.com>
Cc: Alexey Makhalov <amakhalov@...are.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
Andrew Morton <akpm@...ux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"stable@...r.kernel.org" <stable@...r.kernel.org>,
Oscar Salvador <OSalvador@...e.com>
Subject: Re: [PATCH] mm: fix panic in __alloc_pages
>> Yes, but a zonelist cannot be correct for an offline node, where we might
>> not even have an allocated pgdat yet. No pgdat, no zonelist. So as soon as
>> we allocate the pgdat and set the node online (->hotadd_new_pgdat()), the
>> zonelists have to be correct. And I can spot a build_all_zonelists() in
>> hotadd_new_pgdat().
>
> Yes, that is what I had in mind. We are talking about two things here.
> Memoryless nodes and offline nodes. The latter sounds like a bug to me.
Agreed. Memoryless nodes should just have proper zonelists -- which
seems to be the case.
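For reference, the ordering I'm relying on (heavily condensed sketch, not
the literal code, and details differ between versions):

    /* mm/memory_hotplug.c, condensed */
    static pg_data_t *hotadd_new_pgdat(int nid)
    {
            pg_data_t *pgdat = arch_alloc_nodedata(nid); /* fresh pgdat */

            /* ... pgdat/zone initialization ... */
            build_all_zonelists(pgdat); /* zonelists are valid from here on */
            return pgdat;
    }

So once a node is online, allocating from it is fine even if it has no
memory; the problem is only the offline (no pgdat) case.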
>> Maybe __alloc_pages_bulk() and alloc_pages_node() should bail out directly
>> (VM_BUG()) in case we're providing an offline node, with possibly no/stale
>> pgdat, as the preferred nid.
>
> Historically, those allocation interfaces were not trying to be robust
> against wrong inputs because that adds cpu cycles for everybody for
> "what if buggy" code. This has worked (surprisingly) well. Memory less
> nodes have brought in some confusion but this is still something that we
> can address on a higher level. Nobody give arbitrary nodes as an input.
> cpu_to_node might be tricky because it can point to a memoryless node
> which, along with __GFP_THISNODE, is very likely not something anybody
> wants. Hence cpu_to_mem should be used for allocations. I hate that we have
> two very similar APIs...
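(Untested, just to spell out the difference: for a CPU on a memoryless
node, cpu_to_mem() is the variant that can actually satisfy a
__GFP_THISNODE allocation.)

    /* hypothetical caller */
    int nid = cpu_to_mem(cpu);      /* nearest node that has memory */
    struct page *page;

    /* cpu_to_node(cpu) could name a memoryless node, and then
     * __GFP_THISNODE could never be satisfied */
    page = __alloc_pages_node(nid, GFP_KERNEL | __GFP_THISNODE, 0);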
To be precise, I'm wondering if we should do:
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 55b2ec1f965a..8c49b88336ee 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -565,7 +565,7 @@ static inline struct page *
__alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
{
VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
- VM_WARN_ON((gfp_mask & __GFP_THISNODE) && !node_online(nid));
+ VM_WARN_ON(!node_online(nid));
return __alloc_pages(gfp_mask, order, nid, NULL);
}
(Or maybe VM_BUG_ON.)
Because allocating from an offline node cannot possibly work, and we'll
dereference a NULL pgdat later.
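The dereference I mean is the one behind node_zonelist() (quoting from
memory, so the exact code may differ):

    /* include/linux/gfp.h */
    static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
    {
            return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
    }

For an offline node NODE_DATA(nid) can still be NULL, which is exactly the
panic from $SUBJECT.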
>
> But something seems wrong in this case. cpu_to_node shouldn't return
> offline nodes. That is just a land mine. It is not clear to me how the
> cpu has been brought up so that the numa node allocation was left
> behind. As pointed out in another email, add_cpu resp. cpu_up is not it.
> Is it possible that the cpu bringup was only halfway done?
I tried to follow the code (what sets a CPU present, what sets a CPU
online, when we update the cpu_to_node() mapping) and IMHO it's all a big
mess. Maybe it's clearer to people familiar with that code, but CPU
hotplug in general seems to be a confusing piece of (arch-specific) code.
Also, I have no clue if the cpu_to_node() mapping will get invalidated
after unplugging that CPU, or if the mapping will simply stay around for all
eternity ...
--
Thanks,
David / dhildenb