lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 6 Feb 2020 16:27:46 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Jann Horn' <jannh@...gle.com>, Kees Cook <keescook@...omium.org>
CC:     Kristen Carlson Accardi <kristen@...ux.intel.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H . Peter Anvin" <hpa@...or.com>,
        Arjan van de Ven <arjan@...ux.intel.com>,
        Rick Edgecombe <rick.p.edgecombe@...el.com>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        kernel list <linux-kernel@...r.kernel.org>,
        Kernel Hardening <kernel-hardening@...ts.openwall.com>
Subject: RE: [RFC PATCH 06/11] x86: make sure _etext includes function
 sections

From: Jann Horn
> Sent: 06 February 2020 13:16
...
> > I cannot find evidence for
> > what function start alignment should be.
> 
> There is no architecturally required alignment for functions, but
> Intel's Optimization Manual
> (<https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-
> optimization-manual.pdf>)
> recommends in section 3.4.1.5, "Code Alignment":
> 
> | Assembly/Compiler Coding Rule 12. (M impact, H generality)
> | All branch targets should be 16-byte aligned.
> 
> AFAIK this is recommended because, as documented in section 2.3.2.1,
> "Legacy Decode Pipeline" (describing the frontend of Sandy Bridge, and
> used as the base for newer microarchitectures):
> 
> | An instruction fetch is a 16-byte aligned lookup through the ITLB
> and into the instruction cache.
> | The instruction cache can deliver every cycle 16 bytes to the
> instruction pre-decoder.
> 
> AFAIK this means that if a branch ends close to the end of a 16-byte
> block, the frontend is less efficient because it may have to run two
> instruction fetches before the first instruction can even be decoded.

See also The microarchitecture of Intel, AMD and VIA CPUs from www.agner.org/optimize 

My suspicion is that reducing the cache size (so more code fits in)
will almost always be a win over aligning branch targets and entry points.
If the alignment of a function matters then there are probably other
changes to that bit of code that will give a larger benefit.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ