Message-ID: <4D6EB856.1010004@kernel.org>
Date: Wed, 02 Mar 2011 13:36:22 -0800
From: Yinghai Lu <yinghai@...nel.org>
To: David Rientjes <rientjes@...gle.com>
CC: Tejun Heo <tj@...nel.org>, Ingo Molnar <mingo@...e.hu>,
tglx@...utronix.de, "H. Peter Anvin" <hpa@...or.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH x86/mm UPDATED] x86-64, NUMA: Fix distance table handling
On 03/02/2011 01:12 PM, Yinghai Lu wrote:
> On 03/02/2011 07:42 AM, Tejun Heo wrote:
>> Hey,
>>
>> On Wed, Mar 02, 2011 at 06:30:59AM -0800, David Rientjes wrote:
>>> Acked-by: David Rientjes <rientjes@...gle.com>
>>>
>>> There's also this in numa_emulation() that isn't a safe assumption:
>>>
>>> 	/* make sure all emulated nodes are mapped to a physical node */
>>> 	for (i = 0; i < ARRAY_SIZE(emu_nid_to_phys); i++)
>>> 		if (emu_nid_to_phys[i] == NUMA_NO_NODE)
>>> 			emu_nid_to_phys[i] = 0;
>>>
>>> Node id 0 is not always online depending on how you setup your SRAT. I'm
>>> not sure why emu_nid_to_phys[] would ever map a fake node id that doesn't
>>> exist to a physical node id rather than NUMA_NO_NODE, so I think it can
>>> just be removed. Otherwise, it should be mapped to a physical node id
>>> that is known to be online.
>>
>> Unless I screwed up, that behavior isn't new. It's just put in a
>> different form. Looking through the code... Okay, I think node 0
>> always exists. SRAT PXM isn't used as node number directly. It goes
>> through acpi_map_pxm_to_node() which allocates nids from 0 up.
>> amdtopology also guarantees the existence of node 0, so I think we're
>> in the safe and that probably is the reason why we had the above
>> behavior in the first place.
>>
>> IIRC, there are other places which assume the existence of node 0.
>> Whether it's a good idea or not, I'm not sure but requiring node 0 to
>> be always allocated doesn't sound too wrong to me. Maybe we can add
>> BUG_ON() if node 0 is offline somewhere.
>
>
> When the first socket does not have memory, we will not have node 0 online,
> and cpu_to_node() will round those cpus to a nearby node such as node1 or node7.
>
> BTW: this configuration has been broken several times, and fixed several times.

David,

it looks like NUMA emulation already does not support that configuration.

Old code:

void __cpuinit numa_add_cpu(int cpu)
{
	unsigned long addr;
	u16 apicid;
	int physnid;
	int nid = NUMA_NO_NODE;

	apicid = early_per_cpu(x86_cpu_to_apicid, cpu);
	if (apicid != BAD_APICID)
		nid = apicid_to_node[apicid];
	if (nid == NUMA_NO_NODE)
		nid = early_cpu_to_node(cpu);
	BUG_ON(nid == NUMA_NO_NODE || !node_online(nid));

Current code:

void __cpuinit numa_add_cpu(int cpu)
{
	int physnid, nid;

	nid = numa_cpu_node(cpu);
	if (nid == NUMA_NO_NODE)
		nid = early_cpu_to_node(cpu);
	BUG_ON(nid == NUMA_NO_NODE || !node_online(nid));

	physnid = emu_nid_to_phys[nid];

	/*
	 * Map the cpu to each emulated node that is allocated on the physical
	 * node of the cpu's apic id.
	 */
	for_each_online_node(nid)
		if (emu_nid_to_phys[nid] == physnid)
			cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
}

Please note that numa_cpu_node(), like the old code, will return node 0 as the nid
even when node 0 has no memory and is not online, so the BUG_ON() above can trigger.
Maybe we can just change it to nid = cpu_to_node() to get a node id that is online.
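
To make the failure mode concrete, here is a small userspace model, not kernel code. Everything prefixed with toy_ is hypothetical and only stands in for node_online() and for the redirect that cpu_to_node() is assumed to perform. The real kernel would choose a nearby node by distance; this sketch simply takes the lowest-numbered online node.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE (-1)
#define MAX_NODES 8

/* Toy stand-in for node_online(): bit n of online_mask set means node n is online. */
static bool toy_node_online(unsigned int online_mask, int nid)
{
	return nid >= 0 && nid < MAX_NODES && ((online_mask >> nid) & 1u);
}

/* Toy fallback (assumption): lowest-numbered online node, or NUMA_NO_NODE if none. */
static int toy_first_online_node(unsigned int online_mask)
{
	for (int nid = 0; nid < MAX_NODES; nid++)
		if (toy_node_online(online_mask, nid))
			return nid;
	return NUMA_NO_NODE;
}

/* Keep raw_nid when it is online; otherwise redirect to an online node. */
static int toy_resolve_cpu_node(unsigned int online_mask, int raw_nid)
{
	if (toy_node_online(online_mask, raw_nid))
		return raw_nid;
	return toy_first_online_node(online_mask);
}

/* True when the BUG_ON() condition from numa_add_cpu() would fire for nid. */
static bool toy_bug_on_would_fire(unsigned int online_mask, int nid)
{
	return nid == NUMA_NO_NODE || !toy_node_online(online_mask, nid);
}

/* First socket has no memory: node 0 is offline, only node 1 is online. */
static void toy_self_check(void)
{
	unsigned int online_mask = 1u << 1;

	/* The raw lookup reports node 0, so the check would fire. */
	assert(toy_bug_on_would_fire(online_mask, 0));
	/* After redirecting to an online node, the check passes. */
	assert(!toy_bug_on_would_fire(online_mask,
				      toy_resolve_cpu_node(online_mask, 0)));
}
```

Calling toy_self_check() from a test program exercises both cases. The point is only that the node id has to be resolved to an online node before the BUG_ON() check and before indexing emu_nid_to_phys[].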
Thanks
Yinghai