Message-ID: <6e5ebc19-890c-b6dd-1924-9f25c441010d@redhat.com>
Date:   Fri, 17 Dec 2021 15:51:31 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Michal Hocko <mhocko@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Alexey Makhalov <amakhalov@...are.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
        Dennis Zhou <dennis@...nel.org>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Oscar Salvador <osalvador@...e.de>, Tejun Heo <tj@...nel.org>,
        Christoph Lameter <cl@...ux.com>,
        Nico Pache <npache@...hat.com>
Subject: Re: [PATCH v2 0/4] mm, memory_hotplug: handle uninitialized numa node
 gracefully

On 14.12.21 11:07, Michal Hocko wrote:
> Hi,
> this should be the full bundle for now. I have ended up with 4 patches.
> The primary fix is patch 2 (should be reasonably easy to backport to
> older kernels if there is any need for that). Patches 3 and 4 are mere
> cleanups.
> 
> I will repost once this can get some testing from Alexey. Shouldn't be
> too much different from http://lkml.kernel.org/r/YbHfBgPQMkjtuHYF@dhcp22.suse.cz
> with the follow up fix squashed in.
> 
> I would really appreciate hearing more about http://lkml.kernel.org/r/YbMZsczMGpChaWz0@dhcp22.suse.cz
> because I would like to add that information to the changelog as well.
> 
> Thanks for the review and testing.

I'm playing with memory hotplug only. (Only one hotpluggable node is possible
with QEMU right now, as only a single node gets added to the SRAT with the
hotplug range.)

Start with one empty node:

#!/bin/bash
sudo qemu/build/qemu-system-x86_64 \
    --enable-kvm \
    -m 8G,slots=2,maxmem=16G \
    -object memory-backend-ram,id=mem0,size=4G \
    -object memory-backend-ram,id=mem1,size=4G \
    -numa node,cpus=0-1,nodeid=0,memdev=mem0 \
    -numa node,cpus=2-3,nodeid=1,memdev=mem1 \
    -numa node,nodeid=2 \
    -smp 4 \
    -drive file=/home/dhildenb/git/Fedora-Cloud-Base-33-1.2.x86_64.qcow2,format=qcow2,if=virtio \
    -cpu host \
    -machine q35 \
    -nographic \
    -nodefaults \
    -monitor unix:/var/tmp/monitor,server,nowait \
    -chardev stdio,id=serial,signal=off \
    -device isa-serial,chardev=serial
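
For a quick sanity check from the host, the monitor socket can be queried the
same way the hotplug commands are issued below (HMP "info numa"; a minimal
sketch):

$ echo "info numa" | sudo nc -U /var/tmp/monitor ; echo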

1. Guest state when booting

[    0.002506] SRAT: PXM 0 -> APIC 0x00 -> Node 0
[    0.002508] SRAT: PXM 0 -> APIC 0x01 -> Node 0
[    0.002510] SRAT: PXM 1 -> APIC 0x02 -> Node 1
[    0.002511] SRAT: PXM 1 -> APIC 0x03 -> Node 1
[    0.002513] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[    0.002515] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0x7fffffff]
[    0.002517] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x17fffffff]
[    0.002518] ACPI: SRAT: Node 1 PXM 1 [mem 0x180000000-0x27fffffff]
[    0.002520] ACPI: SRAT: Node 2 PXM 2 [mem 0x280000000-0x4ffffffff] hotplug
[    0.002523] NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0x7fffffff] -> [mem 0x00000000-0x7fffffff]
[    0.002525] NUMA: Node 0 [mem 0x00000000-0x7fffffff] + [mem 0x100000000-0x17fffffff] -> [mem 0x00000000-0x17fffffff]
[    0.002533] NODE_DATA(0) allocated [mem 0x17ffd5000-0x17fffffff]
[    0.002716] NODE_DATA(1) allocated [mem 0x27ffd5000-0x27fffffff]
[    0.017960] Zone ranges:
[    0.017966]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.017969]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
[    0.017971]   Normal   [mem 0x0000000100000000-0x000000027fffffff]
[    0.017972]   Device   empty
[    0.017974] Movable zone start for each node
[    0.017976] Early memory node ranges
[    0.017977]   node   0: [mem 0x0000000000001000-0x000000000009efff]
[    0.017979]   node   0: [mem 0x0000000000100000-0x000000007ffd5fff]
[    0.017980]   node   0: [mem 0x0000000100000000-0x000000017fffffff]
[    0.017982]   node   1: [mem 0x0000000180000000-0x000000027fffffff]
[    0.017984] Initmem setup node 0 [mem 0x0000000000001000-0x000000017fffffff]
[    0.017990] Initmem setup node 1 [mem 0x0000000180000000-0x000000027fffffff]
[    0.017993] Node 2 uninitialized by the platform. Please report with boot dmesg.
[    0.018008] Initmem setup node 2 [mem 0x0000000000000000-0x0000000000000000]
[    0.018011] On node 0, zone DMA: 1 pages in unavailable ranges
[    0.018031] On node 0, zone DMA: 97 pages in unavailable ranges
[    0.023622] On node 0, zone Normal: 42 pages in unavailable ranges

2. Guest state after boot

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 3921 MB
node 0 free: 3638 MB
node 1 cpus: 2 3
node 1 size: 4022 MB
node 1 free: 3519 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
# cat /sys/devices/system/node/online 
0-1
# cat /sys/devices/system/node/possible 
0-2
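
Before the hotplug, only node0 and node1 should have been registered in sysfs
(node2 exists only as a possible node so far):

# ls -d /sys/devices/system/node/node*
/sys/devices/system/node/node0
/sys/devices/system/node/node1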


3. Hotplug a DIMM and online it to ZONE_MOVABLE

# echo online_movable > /sys/devices/system/memory/auto_online_blocks 


$ echo "object_add memory-backend-ram,id=hmem0,size=8G" | sudo nc -U /var/tmp/monitor ; echo
$ echo "device_add pc-dimm,id=dimm0,memdev=hmem0,node=2" | sudo nc -U /var/tmp/monitor ; echo


4. Guest state after hotplug

[  334.541452] Built 2 zonelists, mobility grouping on.  Total pages: 1999733
[  334.541908] Policy zone: Normal
[  334.559853] Fallback order for Node 0: 0 2 1 
[  334.560234] Fallback order for Node 1: 1 2 0 
[  334.560524] Fallback order for Node 2: 2 0 1 
[  334.560810] Built 3 zonelists, mobility grouping on.  Total pages: 2032501
[  334.561281] Policy zone: Normal

# numactl --hardware
available: 3 nodes (0-2)
node 0 cpus: 0 1
node 0 size: 3921 MB
node 0 free: 3529 MB
node 1 cpus: 2 3
node 1 size: 4022 MB
node 1 free: 3564 MB
node 2 cpus:
node 2 size: 8192 MB
node 2 free: 8192 MB
node distances:
node   0   1   2 
  0:  10  20  20 
  1:  20  10  20 
  2:  20  20  10 
# cat /sys/devices/system/node/online 
0-2
# cat /sys/devices/system/node/possible 
0-2
# cat /sys/devices/system/node/has_memory 
0-2
# cat /sys/devices/system/node/has_normal_memory 
0-1
# cat /sys/devices/system/node/has_cpu 
0-1
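
Node 2 showing up in has_memory but not in has_normal_memory is expected: all
of its memory was onlined to ZONE_MOVABLE. The zone breakdown can be confirmed
via /proc/zoneinfo (sketch):

# grep "Node 2" /proc/zoneinfo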


5. Unplug DIMM

$ echo "device_del dimm0" | sudo nc -U /var/tmp/monitor ; echo


6. Guest state after unplug

[  494.218938] Fallback order for Node 0: 0 2 1 
[  494.219315] Fallback order for Node 1: 1 2 0 
[  494.219626] Fallback order for Node 2: 2 0 1 
[  494.220430] Built 3 zonelists, mobility grouping on.  Total pages: 1999736
[  494.221024] Policy zone: Normal

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 3921 MB
node 0 free: 3661 MB
node 1 cpus: 2 3
node 1 size: 4022 MB
node 1 free: 3565 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 
# cat /sys/devices/system/node/online 
0-1
# cat /sys/devices/system/node/possible 
0-2


7. Hotplug DIMM + online to ZONE_NORMAL

# echo online_kernel > /sys/devices/system/memory/auto_online_blocks 

$ echo "device_add pc-dimm,id=dimm0,memdev=hmem0,node=2" | sudo nc -U /var/tmp/monitor ; echo


8. Guest state after hotplug

# numactl --hardware
available: 3 nodes (0-2)
node 0 cpus: 0 1
node 0 size: 3921 MB
node 0 free: 3534 MB
node 1 cpus: 2 3
node 1 size: 4022 MB
node 1 free: 3567 MB
node 2 cpus:
node 2 size: 8192 MB
node 2 free: 8192 MB
node distances:
node   0   1   2 
  0:  10  20  20 
  1:  20  10  20 
  2:  20  20  10 

# cat /sys/devices/system/node/online 
0-2
# cat /sys/devices/system/node/possible 
0-2
# cat /sys/devices/system/node/has_memory 
0-2
# cat /sys/devices/system/node/has_normal_memory 
0-2
# cat /sys/devices/system/node/has_cpu
0-1



No surprises found so far. I'll be mostly offline for the next 2 weeks,
so an official review might take some more time.

-- 
Thanks,

David / dhildenb
