linux-kernel - Re: [PATCH] x86: Pack loops tightly as well

Message-ID: <20150410134607.GF28074@pd.tnic>
Date:	Fri, 10 Apr 2015 15:46:07 +0200
From:	Borislav Petkov <bp@...en8.de>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jason Low <jason.low2@...com>,
	Peter Zijlstra <peterz@...radead.org>,
	Davidlohr Bueso <dave@...olabs.net>,
	Tim Chen <tim.c.chen@...ux.intel.com>,
	Aswin Chandramouleeswaran <aswin@...com>,
	LKML <linux-kernel@...r.kernel.org>,
	Andy Lutomirski <luto@...capital.net>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	Brian Gerst <brgerst@...il.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH] x86: Pack loops tightly as well

On Fri, Apr 10, 2015 at 02:30:18PM +0200, Ingo Molnar wrote:
> And the final patch below also packs loops tightly:
> 
>      text        data    bss     dec              filename
>  12566391        1617840 1089536 15273767         vmlinux.align.16-byte
>  12224951        1617840 1089536 14932327         vmlinux.align.1-byte
>  11976567        1617840 1089536 14683943         vmlinux.align.1-byte.funcs-1-byte
>  11903735        1617840 1089536 14611111         vmlinux.align.1-byte.funcs-1-byte.loops-1-byte
> 
> The total reduction is 5.5%.
> 
> Now loop alignment is beneficial if:
> 
>  - a loop is cache-hot and its surroundings are not.
> 
> Loop alignment is harmful if:
> 
>  - a loop is cache-cold
>  - a loop's surroundings are cache-hot as well
>  - two cache-hot loops are close to each other
> 
> and I'd argue that the latter three harmful scenarios are much more 
> common in the kernel. Similar arguments can be made for function 
> alignment as well. (Jump target alignment is a bit different but I 
> think the same conclusion holds.)
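
To see how much each step in the quoted table contributes, the text
sizes can be compared against the 16-byte-aligned baseline. This is only
a sketch: the helper names are made up here, and the numbers are copied
from the table above. Measured against the baseline text size, the
cumulative saving comes out a little under the quoted 5.5%, so the exact
figure depends on which size you divide by.

```python
# Text-section sizes in bytes, copied from the quoted `size` output.
TEXT_SIZES = [
    ("vmlinux.align.16-byte", 12566391),
    ("vmlinux.align.1-byte", 12224951),
    ("vmlinux.align.1-byte.funcs-1-byte", 11976567),
    ("vmlinux.align.1-byte.funcs-1-byte.loops-1-byte", 11903735),
]


def reduction_pct(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


def report(sizes):
    """Return (name, bytes saved, percent saved) relative to the first entry."""
    baseline = sizes[0][1]
    return [(name, baseline - size, reduction_pct(baseline, size))
            for name, size in sizes]


if __name__ == "__main__":
    for name, saved, pct in report(TEXT_SIZES):
        print(f"{name:50s} saved {saved:7d} bytes ({pct:.2f}%)")
```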

So IMHO the loop alignment is coupled to the fetch window size and
alignment. I'm looking at the AMD optimization manuals, and those for
both fam 0x15 and 0x16 say that hot loops should be 32-byte aligned
because the fetch window in each cycle is 32-byte aligned.

So if we have hot loops, we probably want them 32-byte aligned (I don't
know what that number on Intel is, need to look).
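
A rough way to picture the fetch window argument: count how many aligned
32-byte windows a loop body touches, with and without aligning its start.
This is a sketch under stated assumptions: the addresses and the 24-byte
loop size are invented for illustration, the function names are
hypothetical, and the model ignores everything else the front end does.

```python
FETCH_WINDOW = 32  # bytes; the window size the AMD manuals give for fam 0x15/0x16


def align_up(addr, boundary):
    """Round addr up to the next multiple of boundary (a power of two)."""
    return (addr + boundary - 1) & ~(boundary - 1)


def fetch_windows_spanned(start, size, window=FETCH_WINDOW):
    """Number of aligned fetch windows covered by [start, start + size)."""
    first = start // window
    last = (start + size - 1) // window
    return last - first + 1


if __name__ == "__main__":
    loop_start, loop_size = 0x1010, 24   # hypothetical loop: 24 bytes at 0x1010
    aligned = align_up(loop_start, FETCH_WINDOW)
    print("unaligned windows:", fetch_windows_spanned(loop_start, loop_size))
    print("aligned windows:  ", fetch_windows_spanned(aligned, loop_size))
    print("padding bytes:    ", aligned - loop_start)
```

In this made-up case the unaligned loop straddles two windows and the
aligned one fits in a single window, at the cost of 16 bytes of padding,
which is exactly the size vs. fetch trade-off being discussed.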

Family 0x16 says, in addition, that if you have branches in those loops,
the first two branches in a cacheline can be processed in a cycle when
they're in the branch predictor. So to guarantee that, you should align
your loop start to a cacheline.
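
The cacheline point can be sketched the same way: group a loop's branch
instructions by the cache line they land in and see how the grouping
changes once the loop start is aligned. The 64-byte line size, the loop
address and the branch offsets are all assumptions picked for
illustration, and this says nothing about what the predictor actually
does with them.

```python
from collections import Counter

CACHELINE = 64  # bytes; assumed cache line size


def align_up(addr, boundary):
    """Round addr up to the next multiple of boundary (a power of two)."""
    return (addr + boundary - 1) & ~(boundary - 1)


def branches_per_cacheline(loop_start, branch_offsets, line=CACHELINE):
    """Map cache line index -> number of loop branches located in that line."""
    return Counter((loop_start + off) // line for off in branch_offsets)


if __name__ == "__main__":
    loop_start = 0x1038          # hypothetical loop start, 56 bytes into a line
    branch_offsets = [4, 20]     # hypothetical offsets of two branches in the loop
    print("unaligned:", dict(branches_per_cacheline(loop_start, branch_offsets)))
    aligned = align_up(loop_start, CACHELINE)
    print("aligned:  ", dict(branches_per_cacheline(aligned, branch_offsets)))
```

With the unaligned start the two branches end up in different cache
lines; with the start aligned to a line boundary they share one.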

And this all depends on the uarch, so I can imagine that optimizing for
one would harm another.

Looks like a long project of experimenting and running perf counters :-)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.