Date:	Tue, 2 Jul 2013 12:29:18 +0200
From:	Borislav Petkov <bp@...en8.de>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	Wedson Almeida Filho <wedsonaf@...il.com>,
	Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
	linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH] x86: Use asm-goto to implement mutex fast path on x86-64

On Tue, Jul 02, 2013 at 08:39:12AM +0200, Ingo Molnar wrote:
> Yeah - I didn't know your CPU count, -j64 is what I use.

Right, but the -j make jobs argument - whenever it is higher than the
core count - shouldn't matter much to the workload, because the extra
jobs simply stay runnable and wait for their turn to run.

Maybe the overhead of spawning more jobs than necessary could be an
issue, although the measurements didn't show that - they actually showed
the -j64 build to be about a second faster on average than
-j(core_count+1).
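
For reference, the two setups compared were along these lines (just a
sketch, assuming GNU make and coreutils' nproc; only the -j64 invocation
appears verbatim in the perf output below, and the box here reports ~8
CPUs):

# fixed job count, well above the core count
make -s -j64 bzImage

# core_count + 1
make -s -j"$(( $(nproc) + 1 ))" bzImage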

> Also, just in case it wasn't clear: thanks for the measurements

I thank you guys for listening - it is so much fun playing with this! :)

> - and I'd be in favor of merging this patch if it shows any
> improvement or if measurements lie within noise, because per asm
> review the change should be a win.

Right, so we can say for sure that machine utilization drops a bit:

+ 600,993 context-switches
- 600,078 context-switches

- 3,146,429,834,505 cycles
+ 3,141,378,247,404 cycles

- 2,402,804,186,892 stalled-cycles-frontend
+ 2,398,997,896,542 stalled-cycles-frontend

- 1,844,806,444,182 stalled-cycles-backend
+ 1,841,987,157,784 stalled-cycles-backend

- 1,801,184,009,281 instructions
+ 1,798,363,791,924 instructions

and a couple more.

Considering how simple the change is, this is clearly a win, albeit a
small one.

Disadvantages:

- 25,449,932 page-faults
+ 25,450,046 page-faults

- 402,482,696,262 branches
+ 403,257,285,840 branches

- 17,550,736,725 branch-misses
+ 17,552,193,349 branch-misses

It looks to me like this way we're a wee bit less predictable to the
machine (slightly more branches and branch misses), but it seems to
recover at some point. Again, considering it doesn't hurt runtime or any
other aspect more gravely, we can accept it.
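
For perspective, a quick sketch (not part of the original runs, just
arithmetic on the deltas quoted above) expressing them relative to the
baseline:

awk 'BEGIN {
	printf "cycles:        %+.3f%%\n", (3141378247404 - 3146429834505) / 3146429834505 * 100
	printf "branches:      %+.3f%%\n", (403257285840  - 402482696262)  / 402482696262  * 100
	printf "branch-misses: %+.3f%%\n", (17552193349   - 17550736725)   / 17550736725   * 100
}'

That works out to roughly -0.16% cycles, +0.19% branches and +0.008%
branch misses, i.e. of the same order as the ~0.1% run-to-run variation
perf stat --repeat reports below.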

The moral of the story: never ever put prerequisite stuff like

echo <N> > .../drop_caches

in the to-be-traced workload itself - hand it to perf via --pre instead,
as in the first case below - because otherwise it lies to ya:

$ cat ../build-kernel.sh
#!/bin/bash

make -s clean
echo 1 > /proc/sys/vm/drop_caches

$ perf stat --repeat 10 -a --sync --pre '../build-kernel.sh' make -s -j64 bzImage

 Performance counter stats for 'make -s -j64 bzImage' (10 runs):

     960601.373972 task-clock                #    7.996 CPUs utilized            ( +-  0.19% ) [100.00%]
           601,511 context-switches          #    0.626 K/sec                    ( +-  0.16% ) [100.00%]
            32,780 cpu-migrations            #    0.034 K/sec                    ( +-  0.31% ) [100.00%]
        25,449,646 page-faults               #    0.026 M/sec                    ( +-  0.00% )
 3,142,081,058,378 cycles                    #    3.271 GHz                      ( +-  0.11% ) [83.40%]
 2,401,261,614,189 stalled-cycles-frontend   #   76.42% frontend cycles idle     ( +-  0.08% ) [83.39%]
 1,845,047,843,816 stalled-cycles-backend    #   58.72% backend  cycles idle     ( +-  0.14% ) [66.65%]
 1,797,566,509,722 instructions              #    0.57  insns per cycle
                                             #    1.34  stalled cycles per insn  ( +-  0.10% ) [83.43%]
   403,531,133,058 branches                  #  420.082 M/sec                    ( +-  0.09% ) [83.37%]
    17,562,347,910 branch-misses             #    4.35% of all branches          ( +-  0.10% ) [83.20%]

     120.128371521 seconds time elapsed                                          ( +-  0.19% )


VS


$ cat ../build-kernel.sh
#!/bin/bash

make -s clean
echo 1 > /proc/sys/vm/drop_caches
make -s -j64 bzImage


$ perf stat --repeat 10 -a --sync ../build-kernel.sh

 Performance counter stats for '../build-kernel.sh' (10 runs):

    1032946.552711 task-clock                #    7.996 CPUs utilized            ( +-  0.09% ) [100.00%]
           636,651 context-switches          #    0.616 K/sec                    ( +-  0.13% ) [100.00%]
            37,443 cpu-migrations            #    0.036 K/sec                    ( +-  0.31% ) [100.00%]
        26,005,318 page-faults               #    0.025 M/sec                    ( +-  0.00% )
 3,164,715,146,894 cycles                    #    3.064 GHz                      ( +-  0.10% ) [83.38%]
 2,436,459,399,308 stalled-cycles-frontend   #   76.99% frontend cycles idle     ( +-  0.10% ) [83.35%]
 1,877,644,323,184 stalled-cycles-backend    #   59.33% backend  cycles idle     ( +-  0.20% ) [66.52%]
 1,815,075,000,778 instructions              #    0.57  insns per cycle
                                             #    1.34  stalled cycles per insn  ( +-  0.09% ) [83.19%]
   406,020,700,850 branches                  #  393.070 M/sec                    ( +-  0.07% ) [83.40%]
    17,578,808,228 branch-misses             #    4.33% of all branches          ( +-  0.12% ) [83.35%]

     129.176026516 seconds time elapsed                                          ( +-  0.09% )
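
(Side note, just arithmetic on the two elapsed times above: the second
variant also measures the 'make -s clean' and the cache dropping, which
shows up directly as extra wall time:

awk 'BEGIN { printf "%+.1f s (%+.1f%%)\n", 129.176026516 - 120.128371521, (129.176026516 / 120.128371521 - 1) * 100 }'

which prints roughly +9.0 s (+7.5%).)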


-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
