Message-ID: <20150410123017.GB19918@gmail.com>
Date: Fri, 10 Apr 2015 14:30:18 +0200
From: Ingo Molnar <mingo@...nel.org>
To: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Jason Low <jason.low2@...com>,
Peter Zijlstra <peterz@...radead.org>,
Davidlohr Bueso <dave@...olabs.net>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Aswin Chandramouleeswaran <aswin@...com>,
LKML <linux-kernel@...r.kernel.org>,
Borislav Petkov <bp@...en8.de>,
Andy Lutomirski <luto@...capital.net>,
Denys Vlasenko <dvlasenk@...hat.com>,
Brian Gerst <brgerst@...il.com>,
"H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: [PATCH] x86: Pack loops tightly as well

* Ingo Molnar <mingo@...nel.org> wrote:

> > I realize that x86 CPU manufacturers recommend 16-byte jump target
> > alignments (it's in the Intel optimization manual), but the cost
> > of that is very significant:
> >
> >     text    data     bss      dec  filename
> > 12566391 1617840 1089536 15273767  vmlinux.align.16-byte
> > 12224951 1617840 1089536 14932327  vmlinux.align.1-byte
> >
> > By using 1 byte jump target alignment (i.e. no alignment at all)
> > we get an almost 3% reduction in kernel size (!) - and a probably
> > similar reduction in I$ footprint.
>
> Likewise we could pack functions tightly as well via the patch
> below:
>
>     text    data     bss      dec  filename
> 12566391 1617840 1089536 15273767  vmlinux.align.16-byte
> 12224951 1617840 1089536 14932327  vmlinux.align.1-byte
> 11976567 1617840 1089536 14683943  vmlinux.align.1-byte.funcs-1-byte
>
> Which brings another 2% reduction in the kernel's code size.
>
> It would be interesting to see some benchmarks with these two
> patches applied. Only lightly tested.

And the final patch below also packs loops tightly:

    text    data     bss      dec  filename
12566391 1617840 1089536 15273767  vmlinux.align.16-byte
12224951 1617840 1089536 14932327  vmlinux.align.1-byte
11976567 1617840 1089536 14683943  vmlinux.align.1-byte.funcs-1-byte
11903735 1617840 1089536 14611111  vmlinux.align.1-byte.funcs-1-byte.loops-1-byte

The total reduction is 5.5%.

Now loop alignment is beneficial if:

 - a loop is cache-hot and its surroundings are not.

Loop alignment is harmful if:

 - a loop is cache-cold
 - a loop's surroundings are cache-hot as well
 - two cache-hot loops are close to each other

and I'd argue that the latter three harmful scenarios are much more
common in the kernel than the beneficial one. Similar arguments can be
made for function alignment as well. (Jump target alignment is a bit
different, but I think the same conclusion holds.)

(I might have missed some CPU microarchitectural details though that
would make such packing undesirable.)
Thanks,

	Ingo
=============================>
From cfc2ca24908cce66b9df1f711225d461f5d59b97 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@...nel.org>
Date: Fri, 10 Apr 2015 14:20:30 +0200
Subject: [PATCH] x86: Pack loops tightly as well

Not-Signed-off-by: Ingo Molnar <mingo@...nel.org>
---
 arch/x86/Makefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 573d0c459f99..10989a73b986 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -83,6 +83,9 @@ else
         # Pack functions tightly as well:
         KBUILD_CFLAGS += -falign-functions=1
 
+        # Pack loops tightly as well:
+        KBUILD_CFLAGS += -falign-loops=1
+
         # Don't autogenerate traditional x87 instructions
         KBUILD_CFLAGS += $(call cc-option,-mno-80387)
         KBUILD_CFLAGS += $(call cc-option,-mno-fp-ret-in-387)
--