linux-kernel - Re: [PATCH v5 3/3] mm: don't account memmap per-node


Message-ID: <d28059a0-25af-6d0c-3f6d-7e7bc208a0da@google.com>
Date: Sun, 11 Aug 2024 13:26:17 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Pasha Tatashin <pasha.tatashin@...een.com>
cc: akpm@...ux-foundation.org, linux-kernel@...r.kernel.org, 
    linux-mm@...ck.org, linux-cxl@...r.kernel.org, cerasuolodomenico@...il.com, 
    hannes@...xchg.org, j.granados@...sung.com, lizhijian@...itsu.com, 
    muchun.song@...ux.dev, nphamcs@...il.com, rppt@...nel.org, 
    souravpanda@...gle.com, vbabka@...e.cz, willy@...radead.org, 
    dan.j.williams@...el.com, yi.zhang@...hat.com, alison.schofield@...el.com, 
    david@...hat.com, yosryahmed@...gle.com
Subject: Re: [PATCH v5 3/3] mm: don't account memmap per-node

On Fri, 9 Aug 2024, Pasha Tatashin wrote:

> Fix invalid access to pgdat during hot-remove operation:
> ndctl users reported a GPF when trying to destroy a namespace:
> $ ndctl destroy-namespace all -r all -f
>  Segmentation fault
>  dmesg:
>  Oops: general protection fault, probably for
>  non-canonical address 0xdffffc0000005650: 0000 [#1] PREEMPT SMP KASAN
>  PTI
>  KASAN: probably user-memory-access in range
>  [0x000000000002b280-0x000000000002b287]
>  CPU: 26 UID: 0 PID: 1868 Comm: ndctl Not tainted 6.11.0-rc1 #1
>  Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS
>  2.20.1 09/13/2023
>  RIP: 0010:mod_node_page_state+0x2a/0x110
> 
> cxl-test users report a GPF when trying to unload the test module:
> $ modprobe -r cxl-test
>  dmesg:
>  BUG: unable to handle page fault for address: 0000000000004200
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x0000) - not-present page
>  PGD 0 P4D 0
>  Oops: Oops: 0000 [#1] PREEMPT SMP PTI
>  CPU: 0 UID: 0 PID: 1076 Comm: modprobe Tainted: G O N 6.11.0-rc1 #197
>  Tainted: [O]=OOT_MODULE, [N]=TEST
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/15
>  RIP: 0010:mod_node_page_state+0x6/0x90
> 
> Currently, when memory is hot-plugged or hot-removed the accounting is
> done based on the assumption that memmap is allocated from the same node
> as the hot-plugged/hot-removed memory, which is not always the case.
> 
> In addition, there are challenges with keeping the node id of the memory
> that is being removed until the time when memmap accounting is actually
> performed, since this is done after remove_pfn_range_from_zone() and
> also after remove_memory_block_devices(). This means that we can neither
> use pgdat nor walk through memblocks to get the nid.
> 
> Given all of that, account the memmap overhead system wide instead.
> 
> For this we are going to use global atomic counters. Given that memmap
> size is rarely modified, and is normally modified only either during
> early boot when there is only one CPU or under the global hotplug mutex,
> there is no need for per-cpu optimizations.
> 
> Also, while we are here, rename nr_memmap to nr_memmap_pages and
> nr_memmap_boot to nr_memmap_boot_pages to make it self-explanatory that
> the units are page counts.
> 
> Reported-by: Yi Zhang <yi.zhang@...hat.com>
> Closes: https://lore.kernel.org/linux-cxl/CAHj4cs9Ax1=CoJkgBGP_+sNu6-6=6v=_L-ZBZY0bVLD3wUWZQg@mail.gmail.com
> Reported-by: Alison Schofield <alison.schofield@...el.com>
> Closes: https://lore.kernel.org/linux-mm/Zq0tPd2h6alFz8XF@aschofie-mobl2/#t
> 
> Fixes: 15995a352474 ("mm: report per-page metadata information")
> Signed-off-by: Pasha Tatashin <pasha.tatashin@...een.com>
> Tested-by: Dan Williams <dan.j.williams@...el.com>
> Tested-by: Alison Schofield <alison.schofield@...el.com>
> Acked-by: David Hildenbrand <david@...hat.com>

Acked-by: David Rientjes <rientjes@...gle.com>
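
For readers skimming the archive, the system-wide accounting described in the
quoted commit message can be illustrated with a minimal userspace sketch. This
is not the kernel code from the patch: it uses C11 <stdatomic.h> in place of
the kernel's atomic types, and the helper names (memmap_pages_add,
memmap_boot_pages_add, and the two *_read functions) are hypothetical. Only
the counter names nr_memmap_pages and nr_memmap_boot_pages are taken from the
commit message.

```c
#include <stdatomic.h>

/*
 * Userspace stand-ins for the two system-wide counters named in the
 * commit message. Static storage zero-initializes them. In this sketch
 * they are plain C11 atomics, not kernel atomics.
 */
static atomic_long nr_memmap_pages;      /* memmap pages added at runtime */
static atomic_long nr_memmap_boot_pages; /* memmap pages allocated at boot */

/* Hypothetical helper: adjust the runtime counter; delta may be negative. */
static void memmap_pages_add(long delta)
{
	atomic_fetch_add(&nr_memmap_pages, delta);
}

/* Hypothetical helper: adjust the boot-time counter. */
static void memmap_boot_pages_add(long delta)
{
	atomic_fetch_add(&nr_memmap_boot_pages, delta);
}

/* Hypothetical helpers: read the current values for reporting. */
static long memmap_pages_read(void)
{
	return atomic_load(&nr_memmap_pages);
}

static long memmap_boot_pages_read(void)
{
	return atomic_load(&nr_memmap_boot_pages);
}
```

In this sketch a hot-add of 128 memmap pages would be memmap_pages_add(128)
and the matching hot-remove memmap_pages_add(-128). Neither call takes a node
id, so nothing needs to be looked up through pgdat after the memory has been
removed, which is the property the commit message relies on.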