linux-kernel - Re: [tip:x86/asm] x86: Pack function addresses tightly as well

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150517070929.GA23344@gmail.com>
Date:	Sun, 17 May 2015 09:09:29 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Andy Lutomirski <luto@...capital.net>,
	Davidlohr Bueso <dave@...olabs.net>,
	Peter Anvin <hpa@...or.com>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Tim Chen <tim.c.chen@...ux.intel.com>,
	Borislav Petkov <bp@...en8.de>,
	Peter Zijlstra <peterz@...radead.org>,
	"Chandramouleeswaran, Aswin" <aswin@...com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Brian Gerst <brgerst@...il.com>,
	Paul McKenney <paulmck@...ux.vnet.ibm.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Jason Low <jason.low2@...com>,
	"linux-tip-commits@...r.kernel.org" 
	<linux-tip-commits@...r.kernel.org>
Subject: Re: [tip:x86/asm] x86: Pack function addresses tightly as well


* Ingo Molnar <mingo@...nel.org> wrote:

> > Put another way: I suspect this is more likely to hurt, and less 
> > likely to help than the others.
> 
> Yeah, indeed.
> 
> So my thinking was that it would help, because:
> 
>   - There's often locality of reference between functions: we often
>     have a handful of hot functions that are sitting next to each 
>     other and they could thus be packed closer to each other this way, 
>     creating a smaller net I$ footprint.
> 
>   - We have a handful of 'clusters' or small and often hot functions,
>     especially in the locking code:
> 
> 	ffffffff81893080 T _raw_spin_unlock_irqrestore
> 	ffffffff81893090 T _raw_read_unlock_irqrestore
> 	ffffffff818930a0 T _raw_write_unlock_irqrestore
> 	ffffffff818930b0 T _raw_spin_trylock_bh
> 	ffffffff81893110 T _raw_spin_unlock_bh
> 	ffffffff81893130 T _raw_read_unlock_bh
> 	ffffffff81893150 T _raw_write_unlock_bh
> 	ffffffff81893170 T _raw_read_trylock
> 	ffffffff818931a0 T _raw_write_trylock
> 	ffffffff818931d0 T _raw_read_lock_irqsave
> 	ffffffff81893200 T _raw_write_lock_irqsave
> 	ffffffff81893230 T _raw_spin_lock_bh
> 	ffffffff81893270 T _raw_spin_lock_irqsave
> 	ffffffff818932c0 T _raw_write_lock
> 	ffffffff818932e0 T _raw_write_lock_irq
> 	ffffffff81893310 T _raw_write_lock_bh
> 	ffffffff81893340 T _raw_spin_trylock
> 	ffffffff81893380 T _raw_read_lock
> 	ffffffff818933a0 T _raw_read_lock_irq
> 	ffffffff818933c0 T _raw_read_lock_bh
> 	ffffffff818933f0 T _raw_spin_lock
> 	ffffffff81893430 T _raw_spin_lock_irq
> 	ffffffff81893450
> 
>      That's 976 bytes total if 16 bytes aligned.
> 
>      With function packing, they compress into:
> 
> 	ffffffff817f2458 T _raw_spin_unlock_irqrestore
> 	ffffffff817f2463 T _raw_read_unlock_irqrestore
> 	ffffffff817f2472 T _raw_write_unlock_irqrestore
> 	ffffffff817f247d T _raw_read_unlock_bh
> 	ffffffff817f2498 T _raw_write_unlock_bh
> 	ffffffff817f24af T _raw_spin_unlock_bh
> 	ffffffff817f24c6 T _raw_read_trylock
> 	ffffffff817f24ef T _raw_write_trylock
> 	ffffffff817f250e T _raw_spin_lock_bh
> 	ffffffff817f2536 T _raw_read_lock_irqsave
> 	ffffffff817f255e T _raw_write_lock_irqsave
> 	ffffffff817f2588 T _raw_spin_lock_irqsave
> 	ffffffff817f25be T _raw_spin_trylock_bh
> 	ffffffff817f25f6 T _raw_spin_trylock
> 	ffffffff817f2615 T _raw_spin_lock
> 	ffffffff817f2632 T _raw_spin_lock_irq
> 	ffffffff817f2650 T _raw_write_lock
> 	ffffffff817f266b T _raw_write_lock_irq
> 	ffffffff817f2687 T _raw_write_lock_bh
> 	ffffffff817f26ad T _raw_read_lock
> 	ffffffff817f26c6 T _raw_read_lock_bh
> 	ffffffff817f26ea T _raw_read_lock_irq
> 	ffffffff817f2704
> 
>       That's 684 bytes - a very stark difference that will show up in 
>       better I$ footprint even if usage is sparse.
> 
>       OTOH, on the flip side, their ordering is far from ideal, so for 
>       example the rarely used 'trylock' variants are mixed into the 
>       middle, and the way we mix rwlock with spinlock ops isn't 
>       very pretty either.
> 
>       So we could reduce alignment for just the locking APIs, via per 
>       .o cflags in the Makefile, if packing otherwise hurts the common 
>       case.

Another datapoint: despite the widespread stereotype of bloated, 
kernel stack guzzling kernel functions, most kernel functions are 
pretty small and fit into a single (or at most two) cachelines.

Here's a histogram of x86-64 defconfig function sizes (in bytes, cut 
off at around 1200 bytes):

  600 ++----------------+------------------+-----------------+-----------------+------------------+----------------++
      +                 +                  +                 +               kernel function sizes histogram   A    +
      |                                                                                                             |
      |                                                                                                             |
      |                                                                                                             |
      |                                                                                                             |
  500 +AA                                                                                                          ++
      |                                                                                                             |
      A                                                                                                             |
      |                                                                                                             |
      |                                                                                                             |
      |                                                                                                             |
  400 ++                                                                                                           ++
      |                                                                                                             |
      |A                                                                                                            |
      |                                                                                                             |
      |                                                                                                             |
  300 +AAA                                                                                                         ++
      | A                                                                                                           |
      |A                                                                                                            |
      |A                                                                                                            |
      |                                                                                                             |
      | AAA                                                                                                         |
  200 +AAAA                                                                                                        ++
      | AAAAA                                                                                                       |
      | AAAAAA                                                                                                      |
      | A AAAA A                                                                                                    |
      |A A AAAAAAA                                                                                                  |
      |       AAAAA                                                                                                 |
  100 +A        AAAA                                                                                               ++
      |         AAAAAAA                                                                                             |
      |            AAAAAAAAAA                                                                                       |
      |               AAAAAAAAAAAA                                                                                  |
      |A                AAAAAAAAAAAA AAAA                                                                           |
      +                 +       AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AA A A A       +               A  +                 +
    0 AA----------------+------------------AA-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA--------------++
      0                200                400               600               800                1000              1200


Zoomed to just the first 200 bytes:

  600 ++--------------------------+--------------------------+---------------------------+-------------------------++
      +                           +                          +               kernel function sizes histogram   A    +
      |                                                                                                             |
      |                                                                                                             |
      |                                                                                                             |
      |                                                                                                             |
  500 ++    A   A                                                                                                  ++
      |                                                                                                             |
      A                                                                                                             |
      |                                                                                                             |
      |                                                                                                             |
      |                                                                                                             |
  400 ++                                                                                                           ++
      |                                                                                                             |
      |      A                                                                                                      |
      |                                                                                                             |
      |                                                                                                             |
  300 ++       A  A     A                                                                                          ++
      |         A                                                                                                   |
      |  A                                                                                                          |
      |   A                                                                                                         |
      |                                                                                                             |
      |             A    A  A A                                                                                     |
  200 ++      A    A   A AAA A                                                                                     ++
      |          A   A    AA   A  A A     A                                                                         |
      |        A  A A AA      AAAA A A   A        A                                                                 |
      |            A         A  A  AA AAAA AAAAAA         A AA                                                      |
      |       A         A         A  A  A A  A A A AAAA A A   AAA AAA A                                             |
      |                                             AA A A AA AAAA   AAAAA AA                                       |
  100 ++     A                                                   A A A    AA  AAAAA AA A                           ++
      |                                                             A    AA AA    AA AAA AAAAA AA A  A              |
      |                                                                               A AA AAAA AAAAAAAAA AAAAAA A A|
      |                                                                                               A AAAAA  AA AAA
      |    AA                                                                                                    A  |
      +                           +                          +                           +                          +
    0 +AAAA-----------------------+--------------------------+---------------------------+-------------------------++
      0                           50                        100                         150                        200


The median function size is around 1 cacheline (64-byte one), ~80% 
fitting into two cachelines, with a big peak for very small functions 
that make up something like 20% of all functions - so packing makes 
sense unless the actual functions being executed are more fragmented 
than I'd estimate around 50%.

Which is still a significant hurdle.

Plus there are a truckload of other details, like how exactly the flow 
of execution goes within the function (which part is hot). For example 
locking APIs tend to return early after a relatively straight path of 
execution, making the effective hot function length even smaller than 
the median.

All in one: there's no clear winner, so there's no way around actually 
measuring it ... ;-)

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/