Date:   Wed, 16 Feb 2022 14:35:10 +0100
From:   Denys Vlasenko <dvlasenk@...hat.com>
To:     Feng Tang <feng.tang@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...el.com>,
        H Peter Anvin <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Cc:     Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH] x86, vmlinux.lds: Add debug option to force all data
 sections aligned

On 2/16/22 9:28 AM, Feng Tang wrote:
> 0day has reported many strange performance changes (regression or
> improvement), in which there was no obvious relation between the culprit
> commit and the benchmark at the first look, and it causes people to doubt
> the test itself is wrong.
> 
> Upon further check, many of these cases are caused by the change to the
> alignment of kernel text or data, as whole text/data of kernel are linked
> together, change in one domain can affect alignments of other domains.
> 
> To help quickly identifying if the strange performance change is caused
> by _data_ alignment, add a debug option to force the data sections from
> all .o files aligned on THREAD_SIZE, so that change in one domain won't
> affect other modules' data alignment.
> 
> We have used this option to check some strange kernel changes [1][2][3],
> and those performance changes were gone after enabling it, which proved
> they are data alignment related. Besides these publicly reported cases,
> recently there are other similar cases found by 0day, and this option
> has been actively used by 0Day for analyzing strange performance changes.
...
> +	.data : AT(ADDR(.data) - LOAD_OFFSET)
> +#ifdef CONFIG_DEBUG_FORCE_DATA_SECTION_ALIGNED
> +	/* Use the biggest alignment of below sections */
> +	SUBALIGN(THREAD_SIZE)
> +#endif

"Align every input section to 4096 bytes" ?

This is way, way, WAY too much. The added padding will be very wasteful.

Performance differences are likely to be caused by cacheline alignment.
Factoring in an odd hardware prefetcher grabbing an additional
cacheline after every accessed one, I'd say alignment to 128 bytes
(on x86) should suffice for almost any scenario. Even 64 bytes
would almost always work fine.

A hardware prefetcher grabbing an additional cacheline has been seen
to hurt locking performance in a structure: the developers thought
the two locks were in different cachelines, but because of this
"optimization" they effectively shared one, and so they bounced
between CPUs. Two notes on that case:
(1) A linker script can't help with it, since it was a struct layout
issue, not a section alignment issue.
(2) This "optimization" (unconditional fetch of the next cacheline)
might be bad enough to warrant detecting and disabling at boot.
