Message-ID: <6296.1313462075@turing-police.cc.vt.edu>
Date: Mon, 15 Aug 2011 22:34:35 -0400
From: Valdis.Kletnieks@...edu
To: Borislav Petkov <bp@...en8.de>
Cc: Ingo Molnar <mingo@...e.hu>, melwyn lobo <linux.melwyn@...il.com>,
linux-kernel@...r.kernel.org, "H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
borislav.petkov@....com
Subject: Re: x86 memcpy performance
On Sun, 14 Aug 2011 11:59:10 +0200, Borislav Petkov said:
> Benchmarking with 10000 iterations, average results:
> size     XM        MM        speedup
> 119      540.58    449.491   0.8314969419
> 12273    2307.86   4042.88   1.751787902
> 13924    2431.8    4224.48   1.737184756
> 14335    2469.4    4218.82   1.708440514
> 15018    2675.67   1904.07   0.711622886
> 16374    2989.75   5296.26   1.771470902
> 24564    4262.15   7696.86   1.805863077
> 27852    4362.53   3347.72   0.7673805572
> 28672    5122.8    7113.14   1.388524413
> 30033    4874.62   8740.04   1.792967931
The numbers for 15018 and 27852 are *way* odd for the MM case: 1904.07 and
3347.72, each less than half of what MM took for the next smaller size in the
table. I don't feel really good about this until we understand what happened
for those two cases.
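
To make "odd" concrete, here is a minimal sketch that scans the quoted table for
sizes whose time drops sharply even though the copy got bigger. The helper name
`suspicious_sizes` and the 0.5 threshold are my own assumptions for
illustration, not anything from the original benchmark.

```python
# (size, XM, MM) rows copied verbatim from the quoted table.
ROWS = [
    (119, 540.58, 449.491),
    (12273, 2307.86, 4042.88),
    (13924, 2431.8, 4224.48),
    (14335, 2469.4, 4218.82),
    (15018, 2675.67, 1904.07),
    (16374, 2989.75, 5296.26),
    (24564, 4262.15, 7696.86),
    (27852, 4362.53, 3347.72),
    (28672, 5122.8, 7113.14),
    (30033, 4874.62, 8740.04),
]


def suspicious_sizes(rows, column, drop=0.5):
    """Return sizes whose time is below `drop` times the largest time
    already seen at a smaller size. `drop` is an arbitrary threshold
    (assumption); column 1 is XM, column 2 is MM."""
    flagged = []
    slowest_so_far = 0.0
    for row in sorted(rows):  # ascending by size
        size, elapsed = row[0], row[column]
        if elapsed < drop * slowest_so_far:
            flagged.append(size)
        slowest_so_far = max(slowest_so_far, elapsed)
    return flagged


print("XM:", suspicious_sizes(ROWS, 1))  # XM: []
print("MM:", suspicious_sizes(ROWS, 2))  # MM: [15018, 27852]
```

With that threshold the XM column grows smoothly and only the two MM rows
stand out. It says nothing about *why* they are off.
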

Also, any time I see "10000 iterations", I ask myself whether the benchmark rig
took proper note of hot/cold cache issues. That *may* explain the two oddball
results we see above, but without knowing more about how it was benchmarked,
it's hard to say.
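
As a rough sketch of what I mean by hot versus cold, the harness below times the
same copy either back to back (buffers likely stay cached) or after touching a
large unrelated buffer between iterations (buffers likely evicted). Everything
here is an assumption for illustration: the function name `time_copy`, the
8 MiB eviction buffer, and the 64-byte stride are mine, and this is not how the
quoted numbers were produced. Interpreter overhead also dominates small copies
in Python, so treat it as a picture of the method only.

```python
import time


def time_copy(size, iterations=200, evict_bytes=0):
    """Average nanoseconds per copy of `size` bytes.

    With evict_bytes > 0, a large unrelated buffer is read before each
    timed copy so that src/dst are probably cold. Whether they really
    are evicted depends on the machine's cache sizes (assumption)."""
    src = bytearray(size)
    dst = bytearray(size)
    evict = bytearray(evict_bytes)
    total_ns = 0
    for _ in range(iterations):
        if evict_bytes:
            # Read one byte per assumed 64-byte cache line.
            sum(evict[::64])
        start = time.perf_counter_ns()
        dst[:] = src  # the copy being measured
        total_ns += time.perf_counter_ns() - start
    return total_ns / iterations


for size in (15018, 27852):
    hot = time_copy(size)
    cold = time_copy(size, evict_bytes=8 << 20)
    print(f"{size}: hot {hot:.0f} ns, cold {cold:.0f} ns")
```

If the hot and cold figures differ a lot for a given size, an average over
10000 back-to-back iterations is mostly reporting the hot case.
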