Message-ID: <A29F7A5700A7EA4380E08C1F68C3B6D802B5F702@scsmsx411.amr.corp.intel.com>
Date: Wed, 9 May 2007 15:16:11 -0700
From: "Yu, Fenghua" <fenghua.yu@...el.com>
To: "Andrew Morton" <akpm@...ux-foundation.org>
CC: "Siddha, Suresh B" <suresh.b.siddha@...el.com>,
<linux-kernel@...r.kernel.org>, <clameter@....com>
Subject: RE: [PATCH 2/2] Call percpu smp cacheline align interface
>hm, DEFINE_PER_CPU_SHARED_CACHELINE_ALIGNED is a bit of a mouthful.
>I wonder if we can improve things here so that we use the runtime-detected
>cacheline size rather than the compile-time size. I guess not, given that
>the offsets into the percpu area are calculated at build-time.
>Did you work out how much space this change will actually save? It
>should be available by suitable crunching on the nm and objdump output.
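For context, the interface boils down to roughly the following. This is a
simplified sketch only; the exact section name is shorthand here and the UP
fallback is omitted:

	/* Simplified sketch of the interface; the real patch also
	 * needs a UP fallback that degenerates to DEFINE_PER_CPU. */
	#define DEFINE_PER_CPU_SHARED_CACHELINE_ALIGNED(type, name)	\
		__attribute__((__section__(".data.percpu.shared_cacheline_aligned"))) \
		__typeof__(type) per_cpu__##name ____cacheline_aligned_in_smp

Variables defined this way land in their own linker subsection and start on
a cache line boundary.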
As for the space question: depending on how the linker arranges the data
fields, the patches can either save or waste per_cpu space. Below is the
data I got.
Case 1: linux-2.6.21-rc7-mm2, defconfig build.
Case 2: linux-2.6.21-rc7-mm2 plus the patches in this thread, defconfig
build.
Case 3: linux-2.6.21-rc7-mm2, defconfig with VSMP=y build.
Case 4: linux-2.6.21-rc7-mm2 plus the patches in this thread, defconfig
with VSMP=y build.
Please note that on x86/x86-64, per_cpu__init_tss happens to be placed
first in the per_cpu section in Cases 1 and 3, so no padding is wasted on
per_cpu__init_tss in those cases; the sketch below shows where such
padding comes from.
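A hypothetical layout (made-up symbols and offsets, assuming
L1_CACHE_BYTES == 128) showing the padding when an aligned variable is not
first in the section:

	/* Hypothetical layout, L1_CACHE_BYTES == 128; not real offsets. */
	DEFINE_PER_CPU(char, example_flag);		/* at offset 0x000      */
	DEFINE_PER_CPU(struct tss_struct, init_tss)	/* bumped up to 0x080,  */
		____cacheline_aligned_in_smp;		/* 0x7f bytes of padding */

Placed first, the same aligned variable starts at offset 0 and costs no
padding at all.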
On x86:
Case 1: Size of per_cpu section is: 0x7768
Case 2: Size of per_cpu section is: 0x790c
The patches waste 0x1a4 bytes.
per_cpu__init_tss, per_cpu__irq_stat, and per_cpu__runqueues are moved
to the shared_cacheline_aligned section.
On x86-64:
Case 1: Size of per_cpu section is: 0x72d0
Case 2: Size of per_cpu section is: 0x6540
The patches save 0xd90 bytes.
Case 3: Size of per_cpu section is: 0x72d0
Case 4: Size of per_cpu section is: 0x8340
The patches waste 0x1070 bytes.
Shall we not use the shared_cacheline_aligned section in the VSMP case?
The wasted space may eventually offset the potential gain from alignment.
We probably need a cache line size threshold: if the L1 cache line size is
bigger than some CACHELINE_ALIGN_THRESHOLD, don't do cacheline alignment.
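Something along these lines; CACHELINE_ALIGN_THRESHOLD and its value are
only a proposal, not an existing symbol. (With VSMP the compile-time cache
line is 4096 bytes, which is where the Case 4 waste comes from.)

	/* Proposed, not in any tree: fall back to plain DEFINE_PER_CPU
	 * when the cache line is so large that padding dominates. */
	#define CACHELINE_ALIGN_THRESHOLD	256	/* placeholder value */

	#if L1_CACHE_BYTES > CACHELINE_ALIGN_THRESHOLD
	# define DEFINE_PER_CPU_SHARED_CACHELINE_ALIGNED(type, name)	\
		DEFINE_PER_CPU(type, name)
	#else
	# define DEFINE_PER_CPU_SHARED_CACHELINE_ALIGNED(type, name)	\
		__attribute__((__section__(".data.percpu.shared_cacheline_aligned"))) \
		__typeof__(type) per_cpu__##name ____cacheline_aligned_in_smp
	#endif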
For x86-64, per_cpu__init_tss and per_cpu__runqueues are moved to the
shared_cacheline_aligned section.
On ia64:
Case 1: Size of per_cpu section is: 0x8370
Case 2: Size of per_cpu section is: 0x7fc0
The patches save 0x3b0 bytes.
per_cpu__ipi_operation and per_cpu__runqueues are moved to the
shared_cacheline_aligned section.