lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 7 Dec 2021 13:28:31 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Alexey Makhalov <amakhalov@...are.com>,
        Dennis Zhou <dennis@...nel.org>,
        Eric Dumazet <eric.dumazet@...il.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Oscar Salvador <osalvador@...e.de>, Tejun Heo <tj@...nel.org>,
        Christoph Lameter <cl@...ux.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH v3] mm: fix panic in __alloc_pages

>>> +			free_area_init_memoryless_node(nid);
>>
>> That's really just free_area_init_node() below, I do wonder what value
>> free_area_init_memoryless_node() has as of today.
> 
> I am not sure there is any real value in having this special name for
> this but I have kept is sync with what x86 does currently. If we want to
> remove the wrapper then just do it everywhere. I can do that on top.
> 

Sure, just a general comment.

>>> +			continue;
>>> +		}
>>> +
>>>  		free_area_init_node(nid);
>>>  
>>>  		/* Any memory on that node */
>>>
>>> Could you give it a try? I do not have any machine which would exhibit
>>> the problem so I cannot really test this out. I hope build_zone_info
>>> will not choke on this. I assume the node distance table is
>>> uninitialized for these nodes and IIUC this should lead to an assumption
>>> that all other nodes are close. But who knows that can blow up there.
>>>
>>> Btw. does this make any sense at all to others?
>>>
>>
>> __build_all_zonelists() has to update the zonelists of all nodes I think.
> 
> I am not sure what you mean. This should be achieved by this patch
> because the boot time build_all_zonelists will go over all online nodes

"Over all possible nodes", including online and offline ones, to make
sure any possible node has a valid pgdat. IIUC, you're not changing
anything about online vs offline nodes, only that we have a pgdat also
for offline nodes.

> (i.e. with pgdat). free_area_init happens before that. I am just worried
> that the arch specific node_distance() will generate a complete garbage
> or blow up for some reason. 

Assume you online a new zone and then call __build_all_zonelists() to
include the zone in all zonelists (via online_pages()).
__build_all_zonelists() will not include offline nodes (that still have
a pgdat with a valid zonelist now).

Similarly, assume you online a zone and then call
__build_all_zonelists() to exclude the zone from all zonelists (via
offline_pages()). __build_all_zonelists() will not include offline nodes
(that still have a pgdat with a valid zonelist now).

Essentially, IIRC, even after your change
start_kernel()->build_all_zonelists(NULL)->build_all_zonelists_init()->__build_all_zonelists(NULL)
won't initialize the zonelist of the new pgdat, because the nodes are
offline.

I'd assume we'd need

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..e5d958abc7cc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6382,7 +6382,7 @@ static void __build_all_zonelists(void *data)
        if (self && !node_online(self->node_id)) {
                build_zonelists(self);
        } else {
-               for_each_online_node(nid) {
+               for_each_node(nid) {
                        pg_data_t *pgdat = NODE_DATA(nid);

                        build_zonelists(pgdat);

to properly initialize the zonelist also for the offline nodes with a
valid pgdat.

But maybe I am missing something important regarding online vs. offline
nodes that your patch changes?

-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ