Message-ID: <512D7FAD.1040003@jp.fujitsu.com>
Date: Wed, 27 Feb 2013 12:38:21 +0900
From: Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>
To: Yinghai Lu <yinghai@...nel.org>
CC: Don Morris <don.morris@...com>, "H. Peter Anvin" <hpa@...or.com>,
Tejun Heo <tj@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Tony Luck <tony.luck@...el.com>,
Thomas Renninger <trenn@...e.de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Tim Gardner <tim.gardner@...onical.com>,
<linux-kernel@...r.kernel.org>, <tglx@...utronix.de>,
<mingo@...hat.com>, <x86@...nel.org>, <a.p.zijlstra@...llo.nl>,
<jarkko.sakkinen@...el.com>, <tangchen@...fujitsu.com>
Subject: Re: sched: CPU #1's llc-sibling CPU #0 is not on the same node!
2013/02/27 11:30, Yinghai Lu wrote:
> On Tue, Feb 26, 2013 at 4:52 PM, Yasuaki Ishimatsu
> <isimatu.yasuaki@...fujitsu.com> wrote:
>> 2013/02/27 7:44, Yinghai Lu wrote:
>>>>>
>>>>> that commit is totally broken, and it should be reverted.
>>>>>
>>>>> 1. numa_init is called several times, NOT just for srat. so those
>>>>> nodes_clear(numa_nodes_parsed)
>>>>> memset(&numa_meminfo, 0, sizeof(numa_meminfo))
>>>>> can not be just removed.
>>>>> please consider sequence is: numaq, srat, amd, dummy.
>>>>> You need to make fall back path working!
>>>>>
>>>>> 2. simply split acpi_numa_init to early_parse_srat.
>>>>> a. that early_parse_srat is NOT called for ia64, so you break ia64.
>>>>> b. for (i = 0; i < MAX_LOCAL_APIC; i++)
>>>>> set_apicid_to_node(i, NUMA_NO_NODE)
>>>>> still left in numa_init. So it will just clear result from
>>>>> early_parse_srat.
>>>>> it should be moved before that....
>>>>
>>>>
>>>> c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
>>>> early before override from INITRD is settled.
>>>>
>>>>>
>>>>> 3. that patch TITLE is total misleading, there is NO x86 in the title,
>>>>> but it changes
>>>>> to x86 code.
>>>>>
>>>>> 4, it does not CC to TJ and other numa guys...
>>>
>>>
>>> After looked at the code more, thought that theory that does not let
>>> kernel use ram
>>> on hotplug area is not right.
>>>
>>
>>> after that commit, following range can not use movable ram:
>>> 1. real_mode code.... well..funny, legacy cpu0 [0,1M) could be
>>> hot-removed?
>>> 2. dma_continguous ?
>>> 3. log buff ring.
>>> 4. initrd... why it will be freed after booting, so it could be on
>>> movable...
>>> 5. crashkernel for kdump...: : looks like we can not put kdump kernel
>>> above 4G anymore
>>> 6. initmem_init: it will allocate page table to setup kernel mapping
>>> for memory..., it should
>>> be with BRK and near end of max_pfn....
>>
>>
>> If you use "movablemem_map=srat", the above memory can not use movable memory.
>> But in my understanding, current Linux cannot move the above memory. So the
>> above memory should not use movable memory.
>>
>
> that depends, like relocating initrd to different position.
>
>>
>>>
>>> If node is hotplugable, the mem related stuff like page table and
>>> vmemmap could be
>>> on the that node without problem and should be on that node.
>>>
>>
>>> assume first cpu only have 1G ram, and other 31 socket will have bunch of
>>> ram
>>> and those cpu with ram could be hotadd and hotremoved.
>>> Now you want to put page table and vmemmap on first node.
>>> The system would not boot as not enough memory for cover whole system RAM.
>>
>>
>> Even if we solve the issues you mention above, the system cannot boot.
>> In this case, the user should:
>> o add ram to first cpu
>> o decrease hotpluggable ram by:
>> - changing hotpluggable information of SRAT
>> - using movablemem_map=nn[KMG]@ss[KMG]
>
> Do you mean you can not boot one socket system with 1G ram ?
> Assume socket 0 does not support hotplug, other 31 sockets support hot plug.
>
> So we could boot system only with socket0, and later one by one hot
> add other cpus.
In this case, the system can boot. But hot adding the other cpus with a
bunch of ram may fail, since the system does not have enough memory to
cover the hot-added memory. When a memory device is hot added, the kernel
objects for that memory are allocated from the 1G of ram, since the
hot-added memory has not been enabled yet.
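
A rough sketch of the arithmetic behind this, counting only the memmap
(struct page array) and ignoring page tables and other per-memory
objects, so it is a lower bound. The 4 KiB page size and 64-byte struct
page are assumptions typical for x86_64, not figures from this thread,
and the helper names are hypothetical:

```c
#include <stdint.h>

/* Assumptions (not stated in the thread): 4 KiB pages and a 64-byte
 * struct page, which are typical values on x86_64. */
#define PAGE_SIZE_BYTES   4096ULL
#define STRUCT_PAGE_BYTES 64ULL

/* Bytes of memmap needed to describe mem_bytes of ram.
 * Hypothetical helper for illustration, not a kernel function. */
static uint64_t memmap_bytes(uint64_t mem_bytes)
{
	uint64_t pages = mem_bytes / PAGE_SIZE_BYTES;

	return pages * STRUCT_PAGE_BYTES;
}

/* Largest amount of hot-added ram whose memmap alone would fit into
 * avail_bytes of already-online memory. Hypothetical helper. */
static uint64_t max_hot_add_bytes(uint64_t avail_bytes)
{
	return (avail_bytes / STRUCT_PAGE_BYTES) * PAGE_SIZE_BYTES;
}
```

Under these assumptions, memmap_bytes() for 64G of hot-added ram is 1G,
so the memmap for about 64G would already use up the whole 1G on the
first node, before anything else is allocated from it.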
Thanks,
Yasuaki Ishimatsu
>
> We should simulate that way, just like boot system with PXM0 at first
> and later during acpi scan, add other cpus/ram.
>
> Thanks
>
> Yinghai
>