linux-kernel - Re: gcc inlining heuristics was Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning


Message-ID: <alpine.LFD.2.00.0901121602230.6528@localhost.localdomain>
Date:	Mon, 12 Jan 2009 16:21:20 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Bernd Schmidt <bernds_cb1@...nline.de>
cc:	Andi Kleen <andi@...stfloor.org>,
	David Woodhouse <dwmw2@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>,
	Harvey Harrison <harvey.harrison@...il.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Chris Mason <chris.mason@...cle.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	paulmck@...ux.vnet.ibm.com, Gregory Haskins <ghaskins@...ell.com>,
	Matthew Wilcox <matthew@....cx>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	linux-btrfs <linux-btrfs@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Nick Piggin <npiggin@...e.de>,
	Peter Morreale <pmorreale@...ell.com>,
	Sven Dietrich <SDietrich@...ell.com>, jh@...e.cz
Subject: Re: gcc inlining heuristics was Re: [PATCH -v7][RFC]: mutex: implement
 adaptive spinning

On Mon, 12 Jan 2009, Bernd Schmidt wrote:
>
> Too lazy to construct one myself, I googled for examples, and here's a 
> trivial one that shows how it affects the ability of the compiler to 
> eliminate memory references:

Do you really think this is realistic or even relevant?

The fact is

 (a) most people use similar types, so your example of "short" vs "int" is 
     actually not very common. Type-based alias analysis is wonderful for 
     finding specific examples of something you can optimize, but it's not 
     actually all that wonderful in general. It _particularly_ isn't 
     wonderful once you start looking at the downsides.

     When you're adding arrays of integers, you're usually adding 
     integers. Not "short"s. The shorts may be a great example of a 
     special case, but it's a special case!

 (b) instructions with memory accesses aren't the problem - instructions 
     that take cache misses are. Your example is an excellent example of 
     that - eliding the simple load out of the loop makes just about 
     absolutely _zero_ difference in any somewhat more realistic scenario, 
     because that one isn't the one that is going to make any real 
     difference anyway.
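Bernd's actual example is not quoted here. The following is a hypothetical sketch of the kind of "short" vs "int" case being argued about. Under C's effective-type (strict aliasing) rules, a store through a `short *` is assumed not to modify an `int` object. The compiler may therefore load `*scale` once, before the loop. When the types match, it must assume a store could change `*scale` and reload it each iteration. Whether a given gcc version actually hoists the load depends on version and flags. The function names are made up for illustration.

```c
/* Hypothetical example: dst is short *, scale is const int *.
 * Under strict aliasing, a store to dst[i] cannot legally modify *scale,
 * so the compiler is allowed to load *scale once, outside the loop. */
void scale_shorts(short *dst, const short *src, const int *scale, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (short)(src[i] * *scale);
}

/* Same loop with matching types: dst may point at *scale, so unless the
 * compiler can prove otherwise it must reload *scale after every store. */
void scale_ints(int *dst, const int *src, const int *scale, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * *scale;
}
```

The aliasing in the int version is not hypothetical. Calling `scale_ints(a, a, &a[0], 3)` on `{2, 5, 7}` must yield `{4, 20, 28}`, because the first store changes `*scale`. This is also Linus's point (b): the load being hoisted is a cheap, cache-hot one.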

The thing is, the way to optimize for modern CPUs isn't to worry 
over-much about instruction scheduling. Yes, it matters for the broken 
ones, but these days that mostly means the embedded world where you still 
find in-order CPUs, and there the size of code etc matters even more.

> I'll grant you that if you're writing a kernel or maybe a malloc
> library, you have reason to be unhappy about it.  But that's what
> compiler switches are for: -fno-strict-aliasing allows you to write code
> in a superset of C.

Oh, I'd use that flag regardless yes. But what you didn't seem to react to 
was that gcc - for no valid reason whatsoever - actually trusts (or at 
least trusted: I haven't looked at that code for years) provably true 
static alias information _less_ than the idiotic weaker type-based one.

You make all this noise about how type-based alias analysis improves code, 
but then you can't seem to just look at the example I gave you. Type-based 
alias analysis didn't improve code. It just made things worse, for no 
actual gain. Moving those stack accesses around just causes worse 
behavior, and a bigger stack frame, which causes more cache misses.

[ Again, I do admit that kernel code is "different": we tend to have a 
  cold stack, in ways that many other code sequences do not have. System 
  code tends to get a lot more instruction- and data-cache (I$ and D$) 
  misses. Deep call-chains _will_ take cache misses on the stack, simply 
  because the user will do things between system calls or page faults 
  that almost guarantee that things are not in L1, and often not in L2 
  either.

  Also, sadly, microbenchmarks often hide this, since they are often 
  exactly the unrealistic kinds of back-to-back system calls that almost 
  no real program ever has, since real programs actually _do_ something 
  with the data. ]

My point is, you're making all these arguments and avoiding looking at the 
downsides of what you are arguing for.

So we use -Os - because it generally generates better (and simpler) code. 
We use -fno-strict-aliasing for the same reason. 
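Kernel and allocator code often reinterprets memory as a different type. The sketch below is a minimal, hypothetical example of that pattern and is not taken from the kernel. Reading a `float` object through a `uint32_t` lvalue violates the ISO C effective-type rules. GCC only promises the "obvious" result when built with -fno-strict-aliasing, which is what Bernd's "superset of C" refers to. The `memcpy` version is the portable alternative. The sketch assumes a 32-bit `float`, checked at compile time. The helper names are illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size (true on common ABIs). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

/* Type punning through a pointer cast: accesses a float object through a
 * uint32_t lvalue. Undefined in ISO C; GCC gives the expected bits only
 * with -fno-strict-aliasing (or when the optimizer happens not to exploit it). */
uint32_t float_bits_cast(const float *f)
{
    return *(const uint32_t *)f;
}

/* Portable alternative: copy the object representation with memcpy.
 * Well-defined in ISO C; compilers typically reduce it to a single move. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

On an IEEE 754 machine, `float_bits_memcpy(1.0f)` returns `0x3f800000`.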

			Linus