lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YYFMtFwkKLUXaG/W@dhcp22.suse.cz>
Date:   Tue, 2 Nov 2021 15:35:32 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Oscar Salvador <osalvador@...e.de>
Cc:     David Hildenbrand <david@...hat.com>,
        Alexey Makhalov <amakhalov@...are.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>,
        Oscar Salvador <OSalvador@...e.com>
Subject: Re: [PATCH] mm: fix panic in __alloc_pages

On Tue 02-11-21 14:52:01, Oscar Salvador wrote:
> On Tue, Nov 02, 2021 at 02:25:03PM +0100, Michal Hocko wrote:
> > I think we want to learn how exactly Alexey brought that cpu up. Because
> > his initial thought on add_cpu resp cpu_up doesn't seem to be correct.
> > Or I am just not following the code properly. Once we know all those
> > details we can get in touch with cpu hotplug maintainers and see what
> > can we do.
> 
> I am not really familiar with CPU hot-onlining, but I have been taking a look.
> As with memory, there are two different stages, hot-adding and onlining (and the
> counterparts).
> 
> Part of the hot-adding being:
> 
> acpi_processor_get_info
>  acpi_processor_hotadd_init
>   arch_register_cpu
>    register_cpu
> 
> One of the things that register_cpu() does is to set cpu->dev.bus pointing to
> &cpu_subsys, which is:
> 
> struct bus_type cpu_subsys = {
> 	.name = "cpu",
> 	.dev_name = "cpu",
> 	.match = cpu_subsys_match,
> #ifdef CONFIG_HOTPLUG_CPU
> 	.online = cpu_subsys_online,
> 	.offline = cpu_subsys_offline,
> #endif
> };
> 
> Then, the onlining part (in case of a udev rule or someone onlining the device)
> would be:
> 
> online_store
>  device_online
>   cpu_subsys_online
>    cpu_device_up
>     cpu_up
>      ...
>      online node
> 
> Since Alexey disabled the udev rule and no one onlined the CPU, online_store()->
> device_online() wasn't really called.
> 
> The following only applies to x86_64:
> I think we got confused because cpu_device_up() is also called from add_cpu(),
> but that is an exported function and x86 does not call add_cpu() unless for
> debugging purposes (check kernel/torture.c and arch/x86/kernel/topology.c).
> It does the onlining through online_store()...
> So we can take add_cpu() off the equation here.

Yes, so the real problem is (thanks for pointing me to the acpi code).
The cpu->node association is done in acpi_map_cpu2node and I suspect
this expects that the node is already present as it gets the information
from SRAT/PXM tables which are parsed during boot. But I might be just
confused or maybe just VMware inject new entries here somehow.

Another interesting thing is that acpi_map_cpu2node skips over
association if there is no node found in SRAT but that should only mean
it would use the default initialization which should be hopefuly 0.

Anyway, I have found in my notes
https://www.spinics.net/lists/kernel/msg3010886.html which is a slightly
different problem but it has some notes about how the initialization
mess works (that one was boot time though and hotplug might be different
actually).

I have ran out of time for this today so hopefully somebody can re-learn
that from there...

-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ