Message-ID: <20131017003421.GA31470@hmsreliant.think-freely.org>
Date: Wed, 16 Oct 2013 20:34:21 -0400
From: Neil Horman <nhorman@...driver.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org,
sebastien.dugue@...l.net, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org
Subject: Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
On Mon, Oct 14, 2013 at 03:18:47PM -0700, Eric Dumazet wrote:
> On Mon, 2013-10-14 at 14:19 -0700, Eric Dumazet wrote:
> > On Mon, 2013-10-14 at 16:28 -0400, Neil Horman wrote:
> >
> > > So, early testing results today. I wrote a test module that allocated a 4k
> > > buffer, initialized it with random data, and called csum_partial on it 100000
> > > times, recording the time at the start and end of that loop. Results on a 2.4
> > > GHz Intel Xeon processor:
> > >
> > > Without patch: Average execute time for csum_partial was 808 ns
> > > With patch: Average execute time for csum_partial was 438 ns
> >
> > Impressive, but could you try again with data out of cache ?
>
> So I tried your patch on a GRE tunnel and got following results on a
> single TCP flow. (short result : no visible difference)
>
>
I tried to reproduce these results but was unable to, since the only network I
have for testing across these devices at the moment is pretty jittery. So
instead I went back to taking measurements with the module that I cobbled
together, on the assumption that it would give me accurate, relatively
jitter-free results (I've attached the module code below for reference). My
results show slightly different behavior:
Base results runs:
89417240
85170397
85208407
89422794
91645494
103655144
86063791
75647774
83502921
85847372
AVG = 875 ns
Prefetch only runs:
70962849
77555099
81898170
68249290
72636538
83039294
78561494
83393369
85317556
79570951
AVG = 781 ns
Parallel addition only runs:
42024233
44313064
48304416
64762297
42994259
41811628
55654282
64892958
55125582
42456403
AVG = 510 ns
Both prefetch and parallel addition:
41329930
40689195
61106622
46332422
49398117
52525171
49517101
61311153
43691814
49043084
AVG = 494 ns
For reference, each of the large numbers above is the number of nanoseconds
taken to compute the checksum of a 4 KB buffer 100000 times. To get my average
results, I ran the test 10 times, averaged the totals, and divided by
100000.
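
The averaging step described above can be sketched in plain userspace C. This
is only an illustration: the array and function names are made up here, and
truncating integer division is an assumption chosen because it matches the
quoted figures. Recomputing from the totals as listed gives 875, 781 and 494 ns
for the base, prefetch-only and combined runs; the parallel-only list works out
to about 502 ns rather than the quoted 510, so one of those figures may have
been mistranscribed.

```c
#include <stddef.h>
#include <stdint.h>

/* Run totals in nanoseconds, copied from the lists above. */
static const uint64_t base_runs[10] = {
    89417240, 85170397, 85208407, 89422794, 91645494,
    103655144, 86063791, 75647774, 83502921, 85847372 };
static const uint64_t prefetch_runs[10] = {
    70962849, 77555099, 81898170, 68249290, 72636538,
    83039294, 78561494, 83393369, 85317556, 79570951 };
static const uint64_t parallel_runs[10] = {
    42024233, 44313064, 48304416, 64762297, 42994259,
    41811628, 55654282, 64892958, 55125582, 42456403 };
static const uint64_t both_runs[10] = {
    41329930, 40689195, 61106622, 46332422, 49398117,
    52525171, 49517101, 61311153, 43691814, 49043084 };

/*
 * Mean nanoseconds per csum_partial call: average the run totals, then
 * divide by the number of calls per run. Integer division truncates.
 */
static uint64_t avg_ns_per_call(const uint64_t *runs, size_t n, uint64_t iters)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += runs[i];
    return total / n / iters;
}
```

Calling avg_ns_per_call(base_runs, 10, 100000) from a small main() gives the
per-call figure for the base case; the other arrays work the same way.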
Based on these, prefetching is clearly a good improvement, but not as good
as parallel execution, and the winner by far is doing both.
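
To make "parallel addition" concrete, here is a rough userspace sketch of the
idea. It is not the code from the patch: the function names are hypothetical,
the length is assumed to be a multiple of 8 bytes, and plain C stands in for
the x86 assembly. The point is only that two accumulators with no dependency
on each other give the CPU two add chains it can issue on separate ALUs, and
that folding them together at the end yields the same ones'-complement sum as
a single serial chain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 64-bit accumulator down to a 16-bit ones'-complement sum. */
static uint16_t fold64(uint64_t acc)
{
    acc = (acc & 0xffffffffu) + (acc >> 32);
    acc = (acc & 0xffffffffu) + (acc >> 32);
    acc = (acc & 0xffff) + (acc >> 16);
    acc = (acc & 0xffff) + (acc >> 16);
    return (uint16_t)acc;
}

/* Reference: one accumulator, so every add depends on the previous one. */
static uint16_t csum_single(const unsigned char *p, size_t len)
{
    uint64_t acc = 0;

    for (size_t i = 0; i + 2 <= len; i += 2) {
        uint16_t w;

        memcpy(&w, p + i, sizeof(w));
        acc += w;
    }
    return fold64(acc);
}

/* Two independent accumulators; len is assumed to be a multiple of 8. */
static uint16_t csum_dual(const unsigned char *p, size_t len)
{
    uint64_t a = 0, b = 0;

    for (size_t i = 0; i + 8 <= len; i += 8) {
        uint32_t w0, w1;

        memcpy(&w0, p + i, sizeof(w0));
        memcpy(&w1, p + i + 4, sizeof(w1));
        a += w0;    /* chain 1 */
        b += w1;    /* chain 2, independent of chain 1 */
    }
    return fold64(a + b);
}
```

On a 4 KB buffer both functions return the same value, since 32-bit words
summed into a wide accumulator fold to the same result as 16-bit words summed
one at a time.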
Thoughts?
Neil
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/slab.h>		/* kmalloc, kfree */
#include <linux/random.h>	/* get_random_bytes */
#include <linux/preempt.h>	/* preempt_disable, preempt_enable */
#include <linux/time.h>		/* getnstimeofday, timespec_to_ns */
#include <net/checksum.h>	/* csum_partial */

static char *buf;

static int __init csum_init_module(void)
{
	int i;
	__wsum sum = 0;
	struct timespec start, end;
	u64 time;

	buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
	if (!buf) {
		printk(KERN_CRIT "UNABLE TO ALLOCATE A BUFFER OF %lu bytes\n", PAGE_SIZE);
		return -ENOMEM;
	}

	printk(KERN_CRIT "INITIALIZING BUFFER\n");
	get_random_bytes(buf, PAGE_SIZE);

	/* Keep the timed loop from being preempted */
	preempt_disable();
	printk(KERN_CRIT "STARTING ITERATIONS\n");
	getnstimeofday(&start);

	for (i = 0; i < 100000; i++)
		sum = csum_partial(buf, PAGE_SIZE, sum);

	getnstimeofday(&end);
	preempt_enable();

	/*
	 * Use both tv_sec and tv_nsec: comparing tv_nsec alone gives a
	 * wrong result whenever the run crosses a second boundary.
	 */
	time = (u64)(timespec_to_ns(&end) - timespec_to_ns(&start));

	printk(KERN_CRIT "COMPLETED 100000 iterations of csum in %llu nanosec\n", time);
	kfree(buf);
	return 0;
}

static void __exit csum_cleanup_module(void)
{
	/* Nothing to undo: the buffer is freed at the end of init */
}

module_init(csum_init_module);
module_exit(csum_cleanup_module);
MODULE_LICENSE("GPL");