Date:   Thu, 18 Nov 2021 09:35:30 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Alexey Makhalov <amakhalov@...are.com>
Cc:     Dennis Zhou <dennis@...nel.org>,
        Eric Dumazet <eric.dumazet@...il.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>,
        Oscar Salvador <osalvador@...e.de>, Tejun Heo <tj@...nel.org>,
        Christoph Lameter <cl@...ux.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH v3] mm: fix panic in __alloc_pages

On Tue 16-11-21 20:22:49, Alexey Makhalov wrote:
> 
> 
> > On Nov 16, 2021, at 1:17 AM, Michal Hocko <mhocko@...e.com> wrote:
> > 
> > On Tue 16-11-21 01:31:44, Alexey Makhalov wrote:
> > [...]
> >> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> >> index 6737b1cbf..bbc1a70d5 100644
> >> --- a/drivers/acpi/acpi_processor.c
> >> +++ b/drivers/acpi/acpi_processor.c
> >> @@ -200,6 +200,10 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
> >>        * gets online for the first time.
> >>        */
> >>       pr_info("CPU%d has been hot-added\n", pr->id);
> >> +       {
> >> +               int nid = cpu_to_node(pr->id);
> >> +               printk("%s:%d cpu %d, node %d, online %d, ndata %p\n", __FUNCTION__, __LINE__, pr->id, nid, node_online(nid), NODE_DATA(nid));
> >> +       }
> >>       pr->flags.need_hotplug_init = 1;
> > 
> > OK, IIUC you are adding a processor which is outside of
> > possible_cpu_mask, which means that no node is allocated for such
> > a to-be-hotplugged cpu and its memory node. init_cpu_to_node
> > would have done that initialization otherwise.
> It is not correct.
> 
> possible_cpus is 128 for this VM. Look at SRAT and percpu output for proof.
> [    0.085524] SRAT: PXM 127 -> APIC 0xfe -> Node 127
> [    0.118928] setup_percpu: NR_CPUS:128 nr_cpumask_bits:128 nr_cpu_ids:128 nr_node_ids:128

OK, I see. I missed that when looking at the boot log you
sent.

> It is impossible to add processor outside of possible_cpu_mask. possible_cpus is absolute maximum
> that system can support. See Documentation/core-api/cpu_hotplug.rst

That was my understanding, hence the suspicion that you might be doing
something that is not really supported.

> Number of present and onlined CPUs (and nodes) is 4. Other 124 CPUs (and nodes) are not present, but can
> be potentially hot added.

Yes, this is a configuration I have already seen. IIRC, though, the
cpu->node binding was configured at boot time.

> Number of initialized nodes is 4, as init_cpu_to_node() will skip not yet present nodes,
> see arch/x86/mm/numa.c:798 (numa_cpu_node(CPU #4) == NUMA_NO_NODE)

Isn't this the problem? Why is the cpu->node association missing here? 

> 788 void __init init_cpu_to_node(void)
> 789 {
> 790         int cpu;
> 791         u16 *cpu_to_apicid = early_per_cpu_ptr(x86_cpu_to_apicid);
> 792
> 793         BUG_ON(cpu_to_apicid == NULL);
> 794
> 795         for_each_possible_cpu(cpu) {
> 796                 int node = numa_cpu_node(cpu);
> 797
> 798                 if (node == NUMA_NO_NODE)
> 799                         continue;
> 800
> 
> After CPU (and node) hot plug:
> - CPU 4 is marked as present, but not yet online
> - New node got ID 4. numa_cpu_node(CPU #4) returns 4
> - node_online(4) == 0 and NODE_DATA(4) == NULL, but it will be accessed inside
> for_each_possible_cpu loop in percpu allocation.
> 
> Digging further.
> Even if x86/CPU hot add maintainers decide to clean up memoryless node hot add code to initialize the node at the time of
> attaching it (to align with how mm initializes a node on memory hot add), this percpu fix is still needed as it is used during
> the node onlining. See the chicken and egg problem that I described above.

I have to say I do not see the chicken and egg problem. As long as
init_cpu_to_node initializes the memoryless node for the cpu properly
then the pcp allocator doesn't really have to care as the page allocator
falls back to the first populated node in distance order. So I believe
the whole issue boils down to addressing why init_cpu_to_node doesn't
see a proper cpu->node association.
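
To illustrate the fallback argument, here is a minimal userspace sketch, not
kernel code. struct model_node, model_pick_node(), MAX_NODES and NO_NODE are
hypothetical names invented for this example, and a flat distance table stands
in for the kernel's per-node zonelists. It only models the claim that an
initialized but memoryless node can be served from the nearest populated node,
while an uninitialized node cannot be used as a starting point at all.

```c
#include <limits.h>
#include <stdbool.h>

#define MAX_NODES 8
#define NO_NODE   (-1)

/* Hypothetical, simplified stand-in for a per-node descriptor. */
struct model_node {
	bool initialized;          /* models NODE_DATA(nid) != NULL */
	unsigned long free_pages;  /* 0 models a memoryless node */
};

/*
 * Pick the node to allocate from when 'preferred' is requested.
 * Returns the initialized node with free pages that is closest to
 * 'preferred', or NO_NODE if 'preferred' is out of range, is not
 * initialized, or no node has free pages.
 */
int model_pick_node(const struct model_node nodes[MAX_NODES],
		    int distance[MAX_NODES][MAX_NODES], int preferred)
{
	int best = NO_NODE;
	int best_dist = INT_MAX;

	/* An uninitialized preferred node has no fallback order to walk. */
	if (preferred < 0 || preferred >= MAX_NODES ||
	    !nodes[preferred].initialized)
		return NO_NODE;

	for (int nid = 0; nid < MAX_NODES; nid++) {
		if (!nodes[nid].initialized || nodes[nid].free_pages == 0)
			continue;
		/* Strict '<' keeps the lowest node id on equal distance. */
		if (distance[preferred][nid] < best_dist) {
			best_dist = distance[preferred][nid];
			best = nid;
		}
	}
	return best;
}
```

In this model the caller never needs to know whether the preferred node has
memory; it only needs the node to be initialized, which is the property
init_cpu_to_node is expected to provide.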

> Or as 2nd option, numa_cpu_node(4) should return NUMA_NO_NODE until node 4 gets fully initialized.
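
For illustration only, the 2nd option quoted above could be modeled in
userspace roughly as follows. This is a sketch under stated assumptions, not
the actual numa_cpu_node() implementation: struct model_topology,
model_cpu_node() and the MODEL_* constants are hypothetical names, and a
single node_ready flag stands in for "node descriptor allocated and node
online".

```c
#include <stdbool.h>

#define MODEL_MAX_CPUS  8
#define MODEL_MAX_NODES 8
#define MODEL_NO_NODE   (-1)

/* Hypothetical, simplified view of the CPU/node topology. */
struct model_topology {
	int cpu_node[MODEL_MAX_CPUS];      /* firmware-reported cpu -> node binding */
	bool node_ready[MODEL_MAX_NODES];  /* node fully initialized and online */
};

/*
 * Report the node of 'cpu' only once that node is fully initialized.
 * Until then callers see MODEL_NO_NODE and take their "no node" path
 * instead of touching a node that has no descriptor yet.
 */
int model_cpu_node(const struct model_topology *t, int cpu)
{
	int nid;

	if (cpu < 0 || cpu >= MODEL_MAX_CPUS)
		return MODEL_NO_NODE;

	nid = t->cpu_node[cpu];
	if (nid < 0 || nid >= MODEL_MAX_NODES || !t->node_ready[nid])
		return MODEL_NO_NODE;

	return nid;
}
```

With CPU 4 bound to node 4, the lookup keeps returning MODEL_NO_NODE while
node_ready[4] is false and starts returning 4 once the flag is set.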
-- 
Michal Hocko
SUSE Labs
