Message-ID: <YbBtKsTK0oWgV5B7@dhcp22.suse.cz>
Date: Wed, 8 Dec 2021 09:30:34 +0100
From: Michal Hocko <mhocko@...e.com>
To: Alexey Makhalov <amakhalov@...are.com>
Cc: David Hildenbrand <david@...hat.com>,
Dennis Zhou <dennis@...nel.org>,
Eric Dumazet <eric.dumazet@...il.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Oscar Salvador <osalvador@...e.de>, Tejun Heo <tj@...nel.org>,
Christoph Lameter <cl@...ux.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH v3] mm: fix panic in __alloc_pages

On Wed 08-12-21 08:19:16, Alexey Makhalov wrote:
> Hi Michal,
>
> > On Dec 8, 2021, at 12:04 AM, Michal Hocko <mhocko@...e.com> wrote:
> >
> > On Tue 07-12-21 17:17:27, Alexey Makhalov wrote:
> >>
> >>
> >>> On Dec 7, 2021, at 9:13 AM, David Hildenbrand <david@...hat.com> wrote:
> >>>
> >>> On 07.12.21 18:02, Alexey Makhalov wrote:
> >>>>
> >>>>
> >>>>> On Dec 7, 2021, at 8:36 AM, Michal Hocko <mhocko@...e.com> wrote:
> >>>>>
> >>>>> On Tue 07-12-21 17:27:29, Michal Hocko wrote:
> >>>>> [...]
> >>>>>> So your proposal is to drop set_node_online from the patch and add it as
> >>>>>> a separate one which handles
> >>>>>> 	- the sysfs part (i.e. do not register a node which doesn't span a
> >>>>>> 	  physical address space)
> >>>>>> 	- the hotplug side (drop the pgdat allocation, register the node lazily
> >>>>>> 	  when the first memblocks are registered)
> >>>>>
> >>>>> In other words, the first stage
> >>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>>>> index c5952749ad40..f9024ba09c53 100644
> >>>>> --- a/mm/page_alloc.c
> >>>>> +++ b/mm/page_alloc.c
> >>>>> @@ -6382,7 +6382,11 @@ static void __build_all_zonelists(void *data)
> >>>>>  	if (self && !node_online(self->node_id)) {
> >>>>>  		build_zonelists(self);
> >>>>>  	} else {
> >>>>> -		for_each_online_node(nid) {
> >>>>> +		/*
> >>>>> +		 * All possible nodes have pgdat preallocated
> >>>>> +		 * in free_area_init
> >>>>> +		 */
> >>>>> +		for_each_node(nid) {
> >>>>>  			pg_data_t *pgdat = NODE_DATA(nid);
> >>>>>
> >>>>>  			build_zonelists(pgdat);
> >>>>
> >>>> Will it blow up memory usage for the nodes which might never be onlined?
> >>>> I prefer the idea of init on demand.
> >>>>
> >>>> Even now there is an existing problem.
> >>>> In my experiments, I observed a _huge_ increase in memory consumption when increasing
> >>>> the number of possible NUMA nodes. I'm going to report it in a separate mail thread.
> >>>
> >>> I already raised that PPC might be problematic in that regard. Which
> >>> architecture / setup do you have in mind that can have a lot of possible
> >>> nodes?
> >>>
> >> It is an x86_64 VMware VM, not a regular one but a specially configured one (1 vCPU
> >> per node, with hot-plug support, 128 possible nodes)
> >
> > This is slightly tangential, but could you elaborate more on this setup and
> > the reasoning behind it? I was already curious when you mentioned this
> > previously. Why would you want to have so many nodes, mapped 1:1 to
> > CPUs? What is the resulting NUMA topology?
>
> This setup with 128 nodes was used purely for development purposes. That is when the issue
> with hot-adding NUMA nodes was found.

OK, I see.

> The original issue is present even with a feasible number of nodes.

Yes, the issue is currently independent of the number of offline nodes.
The number of nodes only matters for the amount of memory that is wasted
if we are to allocate a pgdat for each possible node.
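
To put a rough shape on that last point, here is a minimal userspace
sketch, not kernel code. MODEL_MAX_NODES, struct node_masks, count_set()
and wasted_bytes() are names made up for illustration only, and the
per-node size is a parameter because the real sizeof(pg_data_t) depends
on the kernel configuration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_MAX_NODES 128

struct node_masks {
	uint8_t possible[MODEL_MAX_NODES];	/* 1 if the node may ever appear */
	uint8_t online[MODEL_MAX_NODES];	/* 1 if the node is online right now */
};

/* Count how many entries are set in one mask. */
static size_t count_set(const uint8_t *mask)
{
	size_t n = 0;

	for (size_t i = 0; i < MODEL_MAX_NODES; i++)
		n += mask[i] ? 1 : 0;
	return n;
}

/*
 * Bytes spent on per-node structures for nodes that are possible but
 * not online, assuming one structure of per_node_bytes is preallocated
 * for every possible node.
 */
static size_t wasted_bytes(const struct node_masks *m, size_t per_node_bytes)
{
	size_t offline = 0;

	for (size_t i = 0; i < MODEL_MAX_NODES; i++)
		if (m->possible[i] && !m->online[i])
			offline++;
	return offline * per_node_bytes;
}
```

With all 128 nodes possible and only node 0 online, wasted_bytes()
charges 127 structures, so the cost scales with the number of possible
nodes and not with the number of online ones.
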
--
Michal Hocko
SUSE Labs