Date:	Mon, 12 Oct 2015 18:21:01 +0800
From:	Daniel J Blueman <daniel@...ascale.com>
To:	Jiang Liu <jiang.liu@...ux.intel.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	Ingo Molnar <mingo@...nel.org>
CC:	Len Brown <len.brown@...el.com>, <x86@...nel.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 3/3] x86/apic: Use smaller array for __apicid_to_node[] mapping

On Fri, Oct 9, 2015 at 11:35 PM, Jiang Liu <jiang.liu@...ux.intel.com> wrote:
> On 2015/10/3 3:12, Denys Vlasenko wrote:
>>  From: Daniel J Blueman <daniel@...ascale.com>
>> 
>>  The Intel x2APIC spec states the upper 16-bits of APIC ID is the
>>  cluster ID [1, p2-12], intended for future distributed systems. 
>> Beyond
>>  the legacy 8-bit APIC ID, Numascale NumaConnect uses 4-bits for the
>>  position of a server on each axis of a multi-dimension torus; SGI
>>  NUMAlink also structures the APIC ID space.
>> 
>>  Instead, define an array based on NR_CPUs to achieve a 1:1 mapping 
>> and
>>  perform linear search; this addresses the binary bloat and the 
>> present
>>  artificial APIC ID limits. With CONFIG_NR_CPUS=256:
>> 
>>  $ size vmlinux vmlinux-patched
>>    text      data     bss      dec     hex filename
>>  18232877 1849656 2281472 22364005 1553f65 vmlinux
>>  18233034 1786168 2281472 22300674 1544802 vmlinux-patched
>> 
>>  That is, ~64 kbytes less data.
>> 
>>  Works peachy on a 256-core system with a 20-bit APIC ID space, and 
>> on a
>>  48-core legacy 8-bit APIC ID system. If we care, I can make
>>  numa_cpu_node O(1) lookup for typical cases.
>> 
>>  Signed-off-by: Daniel J Blueman <daniel@...ascale.com>
>>  CC: Ingo Molnar <mingo@...nel.org>
>>  CC: Daniel J Blueman <daniel@...ascale.com>
>>  CC: Jiang Liu <jiang.liu@...ux.intel.com>
>>  CC: Thomas Gleixner <tglx@...utronix.de>
>>  CC: Len Brown <len.brown@...el.com>
>>  CC: x86@...nel.org
>>  CC: linux-kernel@...r.kernel.org
>> 
>>  [1]
>>  
>> http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
>>  ---
>> 
>>  I added forgotten change in arch/x86/mm/numa_emulation.c (Denys)
>> 
>>   arch/x86/include/asm/numa.h  | 13 +++++++------
>>   arch/x86/kernel/cpu/amd.c    |  8 ++++----
>>   arch/x86/mm/numa.c           | 31 +++++++++++++++++++++++--------
>>   arch/x86/mm/numa_emulation.c |  6 +++---
>>   4 files changed, 37 insertions(+), 21 deletions(-)
>> 
>>  diff --git a/arch/x86/include/asm/numa.h 
>> b/arch/x86/include/asm/numa.h
>>  index c2ecfd0..33becb8 100644
>>  --- a/arch/x86/include/asm/numa.h
>>  +++ b/arch/x86/include/asm/numa.h
>>  @@ -17,6 +17,11 @@
>>    */
>>   #define NODE_MIN_SIZE (4*1024*1024)
>> 
>>  +struct apicid_to_node {
>>  +	int apicid;
>>  +	s16 node;
>>  +};
>>  +
>>   extern int numa_off;
>> 
>>   /*
>>  @@ -27,17 +32,13 @@ extern int numa_off;
>>    * should be accessed by the accessors - set_apicid_to_node() and
>>    * numa_cpu_node().
>>    */
>>  -extern s16 __apicid_to_node[MAX_LOCAL_APICID];
>>  +extern struct apicid_to_node __apicid_to_node[NR_CPUS];
> Hi Denys and Daniel,
> 	I still have some concerns about limiting the array to NR_CPUS.
> __apicid_to_node are populated according to the order that CPUs are
> listed in ACPI SRAT table. And CPU IDs are allocated according to the
> order that CPUs are listed in ACPI MADT(APIC) order. So it may cause
> trouble if:
> 1) system has more than NR_CPUS CPUs
> 2) CPUs are listed in different order in SRAT and MADT tables.

Another approach, which may be suitable without moving SRAT parsing to 
after the memory allocator is up, is to exploit the associativity of 
the bottom APIC ID bits.

We'd have a searchable static array sized by NUMA_SHIFT and use the 
bit-shift encoded in the MSRs. That said, this may run into the issue 
Jiang cited, albeit with CONFIG_NUMA_SHIFT as the limit instead of 
NR_CPUS. Perhaps the constraints or risk of restructuring SRAT parsing 
aren't worth the payoff?
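
To illustrate the shift-based idea, here is a minimal userspace sketch, 
not kernel code. It assumes the low APIC ID bits select a core within a 
node, so all APIC IDs sharing the same upper bits belong to one node. 
The names (node_map, map_set, map_lookup, MAX_NODES, APICID_SHIFT) are 
hypothetical, and the shift is hard-coded here rather than read from 
the MSRs.

```c
#include <stdint.h>

#define NO_NODE      (-1)
#define MAX_NODES    64   /* hypothetical stand-in for 1 << NUMA_SHIFT */
#define APICID_SHIFT 4    /* hypothetical: low 4 bits pick the core within a node */

struct node_map_entry {
	uint32_t key;   /* APIC ID with the per-node low bits shifted out */
	int16_t node;
};

static struct node_map_entry node_map[MAX_NODES];
static unsigned int node_map_used;

/* Record the node for an APIC ID; returns 0 on success, -1 if the table is full. */
static int map_set(uint32_t apicid, int16_t node)
{
	uint32_t key = apicid >> APICID_SHIFT;
	unsigned int i;

	/* Update the existing entry if this group of APIC IDs is already known. */
	for (i = 0; i < node_map_used; i++) {
		if (node_map[i].key == key) {
			node_map[i].node = node;
			return 0;
		}
	}
	if (node_map_used == MAX_NODES)
		return -1;
	node_map[node_map_used].key = key;
	node_map[node_map_used].node = node;
	node_map_used++;
	return 0;
}

/* Linear search keyed on the upper APIC ID bits. */
static int map_lookup(uint32_t apicid)
{
	uint32_t key = apicid >> APICID_SHIFT;
	unsigned int i;

	for (i = 0; i < node_map_used; i++)
		if (node_map[i].key == key)
			return node_map[i].node;
	return NO_NODE;
}
```

With this layout, map_set(0x10, 1) makes map_lookup() return 1 for any 
APIC ID from 0x10 to 0x1f. The table-full return path is where the 
concern Jiang raised would reappear if firmware lists more distinct 
groups than the array holds.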

Finally, the only other alternative: since the current mapping is 
initialised in numa_init, we can drop the static initialisation and 
move the 64KB to the BSS to avoid bloating the binary image, though 
this may not achieve the initial goal of runtime footprint reduction.
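
As a rough sketch of the BSS variant: on typical toolchains a static 
array with no initialiser lands in .bss and takes no space in the 
image, and can be filled with the "no node" marker at init time. The 
names and sizes below (apicid_map, apicid_map_init, MAX_APICID, 
NO_NODE) are stand-ins for illustration, not the kernel's definitions; 
32768 16-bit entries give the 64KB mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_APICID 32768   /* stand-in: 32768 * sizeof(int16_t) = 64KB */
#define NO_NODE    (-1)

/*
 * No initialiser, so the array is placed in .bss rather than .data.
 * A static initialiser setting every entry to -1 would instead put
 * all 64KB into the binary image.
 */
static int16_t apicid_map[MAX_APICID];

/* Runtime replacement for the static initialiser; call once early in init. */
static void apicid_map_init(void)
{
	size_t i;

	for (i = 0; i < MAX_APICID; i++)
		apicid_map[i] = NO_NODE;
}
```

The memory is still consumed once the system is up, which is why this 
only helps the image size and not the runtime footprint.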

Daniel

