Date: Fri, 9 Sep 2011 15:42:33 +0200
From: Borislav Petkov <bp@...en8.de>
To: Maarten Lankhorst <m.b.lankhorst@...il.com>
CC: Linus Torvalds <torvalds@...ux-foundation.org>,
"Valdis.Kletnieks@...edu" <Valdis.Kletnieks@...edu>,
Ingo Molnar <mingo@...e.hu>,
melwyn lobo <linux.melwyn@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: x86 memcpy performance
On Fri, Sep 09, 2011 at 01:23:09PM +0200, Maarten Lankhorst wrote:
> This specific one happened far more often than any of the other memcpy usages,
> and if you ignore the case where the destination is page aligned, most of
> them are gone.
>
> In short: I don't think I can get a speedup by using avx memcpy in-kernel.
>
> YMMV; if it does speed up for you, I'd love to see concrete numbers. And not
> only worst case, but for the common aligned cases too. Or some concrete
> numbers showing that misaligned copies happen a lot for you.
Actually,
assuming alignment matters, I'd need to redo the trace_printk run I did
initially on buffer sizes:
http://marc.info/?l=linux-kernel&m=131331602309340 (kernel_build.sizes attached)
to get a more sensible grasp on the alignment of kernel buffers along
with their sizes, and to see whether we're doing a lot of unaligned large
buffer copies in the kernel. I seriously doubt that, though; we should
be doing everything pagewise anyway, so...
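(For the record, the instrumentation I have in mind is roughly the sketch
below - the wrapper name, the 1K cutoff and the cacheline modulus are
made up for illustration here, not what the original trace run used:

#include <linux/kernel.h>
#include <linux/string.h>

/* Log size and intra-cacheline offsets of large copies. */
static inline void *traced_memcpy(void *dst, const void *src, size_t len)
{
	/* small copies are uninteresting for an AVX path */
	if (len >= 1024)
		trace_printk("memcpy: len=%zu src%%64=%lu dst%%64=%lu\n",
			     len,
			     (unsigned long)src & 63,
			     (unsigned long)dst & 63);

	return memcpy(dst, src, len);
}

and then postprocessing the trace buffer output for a histogram.)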
Concerning numbers, I ran your version again and sorted the output by
speedup; each line below is buffer size with (source/destination)
alignment offsets, then the two measured results and their ratio (e.g.
12797.2/5566.4 ≈ 2.299). The highest scores are:
30037(12/44) 5566.4 12797.2 2.299011642
28672(12/44) 5512.97 12588.7 2.283467991
30037(28/60) 5610.34 12732.7 2.269502799
27852(12/44) 5398.36 12242.4 2.267803859
30037(4/36) 5585.02 12598.6 2.25578257
28672(28/60) 5499.11 12317.5 2.239914033
27852(28/60) 5349.78 11918.9 2.227919527
27852(20/52) 5335.92 11750.7 2.202186795
24576(12/44) 4991.37 10987.2 2.201247446
and this is pretty cool. Here are the (0/0) cases, i.e. both buffers aligned:
8192(0/0) 2627.82 3038.43 1.156255766
12288(0/0) 3116.62 3675.98 1.179475031
13926(0/0) 3330.04 4077.08 1.224334839
14336(0/0) 3377.95 4067.24 1.204055286
15018(0/0) 3465.3 4215.3 1.216430725
16384(0/0) 3623.33 4442.38 1.226050715
24576(0/0) 4629.53 6021.81 1.300737559
27852(0/0) 5026.69 6619.26 1.316823133
28672(0/0) 5157.73 6831.39 1.324495749
30037(0/0) 5322.01 6978.36 1.3112261
It is not 2x anymore, but a 1.16-1.32x win is still something.
Anyway, looking at the buffer sizes, they're rather ridiculous, and even
if some workload does produce them, they won't repeat often enough per
second to be relevant. So we'll see...
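Oh, and in case someone wants to play with the alignment effect without
the kernel harness: below is a minimal userspace sketch (not Maarten's
actual benchmark) which times plain glibc memcpy() for a given size and
(src/dst) offset pair - the sizes and the 12/44 offsets are simply picked
to mirror the tables above:

/* build: gcc -O2 bench.c -lrt */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* total ns for @iters copies of @size bytes at the given offsets */
static double bench(size_t size, size_t src_off, size_t dst_off, int iters)
{
	struct timespec t0, t1;
	char *src, *dst;
	int i;

	if (posix_memalign((void **)&src, 64, size + 64) ||
	    posix_memalign((void **)&dst, 64, size + 64))
		exit(1);

	/* fault in the pages so the first copy isn't special */
	memset(src, 0xaa, size + 64);
	memset(dst, 0x55, size + 64);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++)
		memcpy(dst + dst_off, src + src_off, size);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	free(src);
	free(dst);

	return (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
}

int main(void)
{
	size_t sizes[] = { 8192, 16384, 28672 };
	unsigned int i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		double a = bench(sizes[i], 0, 0, 10000);
		double m = bench(sizes[i], 12, 44, 10000);

		printf("%zu: (12/44) takes %.3fx the (0/0) time\n",
		       sizes[i], m / a);
	}

	return 0;
}

That obviously measures glibc in the cache-hot case, so take the ratios
with a grain of salt.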
Thanks.
--
Regards/Gruss,
Boris.
Attachment: "kernel_build.sizes" (text/plain, 926 bytes)