Message-ID: <20150928173434.GE48470@linux.vnet.ibm.com>
Date: Mon, 28 Sep 2015 10:34:34 -0700
From: Nishanth Aravamudan <nacc@...ux.vnet.ibm.com>
To: Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
Cc: benh@...nel.crashing.org, paulus@...ba.org, mpe@...erman.id.au,
anton@...ba.org, linuxppc-dev@...ts.ozlabs.org,
linux-kernel@...r.kernel.org, cl@...ux.com,
gkurz@...ux.vnet.ibm.com, grant.likely@...aro.org,
nikunj@...ux.vnet.ibm.com, khandual@...ux.vnet.ibm.com
Subject: Re: [PATCH RFC 0/5] powerpc:numa Add serial nid support
On 27.09.2015 [23:59:08 +0530], Raghavendra K T wrote:
> Problem description:
> Powerpc has sparse node numbering, i.e. on a 4 node system nodes are
> numbered (possibly) as 0,1,16,17. At a lower level, the chipid obtained
> from the device tree is mapped directly to the nid.
chipid is an OPAL concept, I believe, and not documented in PAPR... How
does this work under PowerVM?
> Potential side effect of that is:
>
> 1) There are several places in the kernel that assume serial node numbering,
> and memory allocations assume that all the nodes from 0-(highest nid)
> exist, in turn ending up allocating memory for nodes that do not exist.
>
> 2) For virtualization use cases (such as qemu, libvirt, openstack), mapping
> sparse nid of the host system to contiguous nids of guest (numa affinity,
> placement) could be a challenge.
>
> Possible Solutions:
> 1) Handling the memory allocations in the kernel case by case: Though in some
> cases it is easy to achieve, other cases may be intrusive/non-trivial.
> In the end, it does not handle side effect (2) above.
>
> 2) Map the sparse chipid obtained from the device tree to a serial nid at kernel
> level (the idea proposed in this series).
> Pro: It is more natural to handle at kernel level than at lower (OPAL) layer.
> Con: The chipid in the device tree is no longer the same as the nid in the kernel.
Is there any debugging/logging? Looks like not -- so how does a sysadmin
map from firmware-provided values to the Linux values? That's going to
make debugging of large systems (PowerVM or otherwise) less than
pleasant, it seems? Possibly you could put something in sysfs?
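
To make the concern concrete, here is a minimal user-space sketch of a
sparse-chipid to serial-nid table with a reverse lookup. It is not the code
from the patch series: MAX_CHIPID, NO_NID and every function name below are
hypothetical, chosen only for illustration, and the "first seen gets the next
nid" policy is an assumption. The dump function stands in for whatever a boot
log line or sysfs file might expose.

```c
#include <stdio.h>

#define MAX_CHIPID 64    /* assumed upper bound, for illustration only */
#define NO_NID     (-1)  /* sentinel: no mapping exists */

/* Hypothetical tables; not the names used in the patch series. */
static int chipid_to_nid_map[MAX_CHIPID];
static int nid_to_chipid_map[MAX_CHIPID];
static int nr_nids;

/* Mark every slot as unmapped. */
void nid_map_init(void)
{
    for (int i = 0; i < MAX_CHIPID; i++) {
        chipid_to_nid_map[i] = NO_NID;
        nid_to_chipid_map[i] = NO_NID;
    }
    nr_nids = 0;
}

/* Return the serial nid for a chipid, assigning the next free nid on first use. */
int chipid_to_nid(int chipid)
{
    if (chipid < 0 || chipid >= MAX_CHIPID)
        return NO_NID;
    if (chipid_to_nid_map[chipid] == NO_NID) {
        chipid_to_nid_map[chipid] = nr_nids;
        nid_to_chipid_map[nr_nids] = chipid;
        nr_nids++;
    }
    return chipid_to_nid_map[chipid];
}

/* Reverse lookup: which firmware chipid sits behind a Linux nid? */
int nid_to_chipid(int nid)
{
    if (nid < 0 || nid >= nr_nids)
        return NO_NID;
    return nid_to_chipid_map[nid];
}

/* Print one line per assigned nid so the mapping can be inspected. */
void nid_map_dump(FILE *out)
{
    for (int nid = 0; nid < nr_nids; nid++)
        fprintf(out, "nid %d <- chipid %d\n", nid, nid_to_chipid_map[nid]);
}
```

With the chipids from the example above looked up in the order 0, 1, 16, 17,
this assigns nids 0, 1, 2, 3. Without something like the reverse table and the
dump, nothing tells an admin that nid 2 is chipid 16.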
-Nish