Message-ID: <50fcfba8-fc16-b4a1-d117-24ebbe959c0c@virtuozzo.com>
Date: Tue, 9 Jan 2018 19:47:05 +0300
From: Andrey Ryabinin <aryabinin@...tuozzo.com>
To: Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>
Cc: linux-kernel@...r.kernel.org, Kees Cook <keescook@...omium.org>,
Eryu Guan <eguan@...hat.com>,
Alexander Potapenko <glider@...gle.com>,
Chris Metcalf <metcalf@...m.mit.edu>,
David Laight <David.Laight@...LAB.COM>,
Dmitry Vyukov <dvyukov@...gle.com>, stable@...r.kernel.org
Subject: Re: [PATCH] lib/strscpy: remove word-at-a-time optimization.
Attached is the user-space program I used to measure the difference.
Usage:
gcc -O2 -o strscpy strscpy_test.c
./strscpy {b|w} src_str_len count
src_str_len - length of the source string, between 1 and 4096
count - number of strscpy() calls to execute
I've also noticed something strange. I'm not sure why, but certain
src_str_len values (e.g. 30) seem to throw off the branch predictor, giving
worse-than-usual results for the byte-at-a-time copy:
$ perf stat ./strscpy b 29 10000000
Performance counter stats for './strscpy b 29 10000000':
165.354974 task-clock:u (msec) # 0.999 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
48 page-faults:u # 0.290 K/sec
640,475,981 cycles:u # 3.873 GHz
2,500,090,080 instructions:u # 3.90 insn per cycle
640,017,126 branches:u # 3870.565 M/sec
1,589 branch-misses:u # 0.00% of all branches
0.165568346 seconds time elapsed
Performance counter stats for './strscpy b 30 10000000':
250.835659 task-clock:u (msec) # 0.999 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
46 page-faults:u # 0.183 K/sec
974,528,780 cycles:u # 3.885 GHz
2,580,090,165 instructions:u # 2.65 insn per cycle
660,017,211 branches:u # 2631.273 M/sec
14,488,234 branch-misses:u # 2.20% of all branches
0.251147341 seconds time elapsed
Performance counter stats for './strscpy b 31 10000000':
176.598368 task-clock:u (msec) # 0.997 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
46 page-faults:u # 0.260 K/sec
681,367,948 cycles:u # 3.858 GHz
2,660,090,092 instructions:u # 3.90 insn per cycle
680,017,138 branches:u # 3850.642 M/sec
1,817 branch-misses:u # 0.00% of all branches
0.177150181 seconds time elapsed
View attachment "strscpy_test.c" of type "text/x-csrc" (3292 bytes)