Message-ID: <20170410163914.GA4404@WeideMacBook-Pro.local>
Date: Tue, 11 Apr 2017 00:39:14 +0800
From: Wei Yang <richard.weiyang@...il.com>
To: Borislav Petkov <bp@...en8.de>
Cc: Wei Yang <richard.weiyang@...il.com>,
"Kirill A. Shutemov" <kirill@...temov.name>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, Tejun Heo <tj@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [Patch V2 2/2] x86/mm/numa: remove the
numa_nodemask_from_meminfo()
On Mon, Apr 10, 2017 at 02:43:20PM +0200, Borislav Petkov wrote:
>On Sun, Apr 09, 2017 at 11:12:14AM +0800, Wei Yang wrote:
>> Oops, sorry to bring in the regression with my cleanup.
>> I haven't noticed there is a kernel command line "numa=fake", which
>> is the cause of the crash I think.
>
>Of course it is, didn't you see my debugging upthread?
>
>> So from my understanding, I am going to do these tests:
>>
>> 1. all fake numa scenarios with Kirill's qemu command line
>
>It is enough if you boot the kernel with "numa=fake..."
>
>> 2. Real numa scenarios with following qemu command option
>
>Not qemu command option but a kernel cmdline option.
>
>> 3. Baremetal
>>
>> One more question: on the baremetal machine, I can't change the
>> numa configuration, so there would be only one case. Do you have
>> some specific requirement?
>
>numa=fake on baremetal too.
>
>> Well, if I missed something, just let me know :-)
>>
>> > Qemu can emulate real numa too, for example you can boot with:
>> >
>> > -smp 64 \
>> > -numa node,nodeid=0,cpus=1-8 \
>> > -numa node,nodeid=1,cpus=9-16 \
>> > -numa node,nodeid=2,cpus=17-24 \
>> > -numa node,nodeid=3,cpus=25-32 \
>> > -numa node,nodeid=4,cpus=0 \
>> > -numa node,nodeid=4,cpus=33-39 \
>> > -numa node,nodeid=5,cpus=40-47 \
>> > -numa node,nodeid=6,cpus=48-55 \
>> > -numa node,nodeid=7,cpus=56-63
>
>Also, do this in kvm. kvm can emulate a lot of numa configurations, do
>experiment with those too.
>
>Basically, try to break your "cleanup". Stuff one should do for every
>patch one sends anyway.
Hi, Borislav
I have tried several fake NUMA test combinations, and the results look good.
A result marked P (Passed) means the system booted and a simple kernel build
test succeeded.
# Test matrix and results
## Qemu
With qemu, I tried every combination of phys_node in (1, 4) and emu_node in (0, 2, 4, 8):
+----------------+--------+--------+
| phys_node      |   1    |   4    |
| emu_node       |        |        |
+----------------+--------+--------+
| 0              |   P    |   P    |
+----------------+--------+--------+
| 2              |   P    |   P    |
+----------------+--------+--------+
| 4              |   P    |   P    |
+----------------+--------+--------+
| 8              |   P    |   P    |
+----------------+--------+--------+
The 4-node phys_node case is emulated with the qemu command line:
-numa node,nodeid=0,cpus=1-2 \
-numa node,nodeid=1,cpus=3-4 \
-numa node,nodeid=2,cpus=0 \
-numa node,nodeid=2,cpus=5 \
-numa node,nodeid=3,cpus=6-7
emu_node is set with the kernel command line option "numa=fake=N".
## Baremetal
My machine has only one NUMA node, so I could only verify phys_node = 1.
+----------------+--------+
| phys_node      |   1    |
| emu_node       |        |
+----------------+--------+
| 0              |   P    |
+----------------+--------+
| 2              |   P    |
+----------------+--------+
| 4              |   P    |
+----------------+--------+
| 8              |   P    |
+----------------+--------+
emu_node is set with the kernel command line option "numa=fake=N".
# Other things I observed
In the qemu guests everything looks good, but I saw two things on the
baremetal machine. First, I want to emphasize that I saw the same behavior
with and without my "cleanup".
## Only 3 nodes with numa=fake=4
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000022f5fffff]
[ 0.000000] Faking node 0 at [mem 0x0000000000000000-0x000000007fffffff] (2048MB)
[ 0.000000] Faking node 1 at [mem 0x0000000080000000-0x0000000133ffffff] (2880MB)
[ 0.000000] Faking node 2 at [mem 0x0000000134000000-0x000000022f5fffff] (4022MB)
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009cfff]
[ 0.000000] node 0: [mem 0x0000000000100000-0x000000007fffffff]
[ 0.000000] node 1: [mem 0x0000000080000000-0x00000000ba5b1fff]
[ 0.000000] node 1: [mem 0x00000000ba5b9000-0x00000000bad8dfff]
[ 0.000000] node 1: [mem 0x00000000bafb6000-0x00000000ca8a1fff]
[ 0.000000] node 1: [mem 0x00000000ca93a000-0x00000000ca977fff]
[ 0.000000] node 1: [mem 0x00000000cafff000-0x00000000caffffff]
[ 0.000000] node 1: [mem 0x0000000100000000-0x0000000133ffffff]
[ 0.000000] node 2: [mem 0x0000000134000000-0x000000022f5fffff]
## Some warnings
I don't see these two warnings without "numa=fake=N".
[ 0.004000] sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[ 0.004000] ------------[ cut here ]------------
[ 0.004000] WARNING: CPU: 1 PID: 0 at arch/x86/kernel/smpboot.c:424 topology_sane.isra.5+0x6c/0x70
[ 8.594469] sysfs: cannot create duplicate filename '/devices/platform/coretemp.0/hwmon/hwmon2/temp2_label'
[ 8.594478] ------------[ cut here ]------------
[ 8.594482] WARNING: CPU: 4 PID: 34 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x56/0x70
# Some thoughts on the code
After going through numa_emulation(), I suggest rebuilding
numa_nodes_parsed from the emulated nodes, instead of setting
numa_nodes_parsed directly in emu_setup_memblk().
Two cases I have in mind are not handled well:
1. split_nodes_size_interleave()/split_nodes_interleave() may fail, or a later
step may fail.
2. There may be fewer fake nodes than physical nodes.
Both may lead to an inaccurate numa_nodes_parsed, so I have a patch that
rebuilds it from the emulated node info. I will send it soon.
>
>--
>Regards/Gruss,
> Boris.
>
>Good mailing practices for 400: avoid top-posting and trim the reply.
--
Wei Yang
Help you, Help me