[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20131018201133.GD4019@hmsreliant.think-freely.org>
Date: Fri, 18 Oct 2013 16:11:34 -0400
From: Neil Horman <nhorman@...driver.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org,
sebastien.dugue@...l.net, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org
Subject: Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
On Fri, Oct 18, 2013 at 10:20:35AM -0700, Eric Dumazet wrote:
> On Fri, 2013-10-18 at 12:50 -0400, Neil Horman wrote:
> > >
>
> > for(i=0;i<100000;i++) {
> > sum = csum_partial(buf+offset, PAGE_SIZE, sum);
> > offset = (offset < BUFSIZ-PAGE_SIZE) ? offset+PAGE_SIZE : 0;
> > }
>
> Please replace this by random accesses, and use the more standard 1500
> length.
>
> offset = prandom_u32() % (BUFSIZ - 1500);
> offset &= ~1U;
>
> sum = csum_partial(buf + offset, 1500, sum);
>
> You are basically doing sequential accesses, so prefetch should
> be automatically done by cpu itself.
>
> Thanks !
>
>
>
Sure, you got it! Results below. However, they continue to bear out that
parallel execution beats prefetch only execution, and both is better than either
one.
base results:
53156647
59670931
62839770
44842780
39297190
44905905
53300688
53287805
39436951
43021730
AVG=493 ns
prefetch-only results:
40337434
51986404
43509199
53128857
52973171
53520649
53536338
50325466
44864664
47908398
AVG=492 ns
parallel-only results:
52157183
44496511
36180011
38298368
36258099
43263531
45365519
54116344
62529241
63118224
AVG = 475 ns
both prefetch and parallel:
44317078
44526464
45761272
44477906
34868814
44637904
49478309
49718417
58681403
58304972
AVG = 474 ns
Heres the code I was using
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/init.h>
#include <linux/moduleparam.h>
#include <linux/rtnetlink.h>
#include <net/rtnetlink.h>
#include <linux/u64_stats_sync.h>
static char *buf;
#define BUFSIZ_ORDER 4
#define BUFSIZ ((2 << BUFSIZ_ORDER) * (1024*1024*2))
static int __init csum_init_module(void)
{
int i;
__wsum sum = 0;
struct timespec start, end;
u64 time;
struct page *page;
u32 offset = 0;
page = alloc_pages((GFP_TRANSHUGE & ~__GFP_MOVABLE), BUFSIZ_ORDER);
if (!page) {
printk(KERN_CRIT "NO MEMORY FOR ALLOCATION");
return -ENOMEM;
}
buf = page_address(page);
printk(KERN_CRIT "INITALIZING BUFFER\n");
preempt_disable();
printk(KERN_CRIT "STARTING ITERATIONS\n");
getnstimeofday(&start);
for(i=0;i<100000;i++) {
sum = csum_partial(buf+offset, 1500, sum);
offset = prandom_u32() % (BUFSIZ - 1500);
offset &= ~1U;
}
getnstimeofday(&end);
preempt_enable();
if ((unsigned long)start.tv_nsec > (unsigned long)end.tv_nsec)
time = (ULONG_MAX - (unsigned long)end.tv_nsec) + (unsigned long)start.tv_nsec;
else
time = (unsigned long)end.tv_nsec - (unsigned long)start.tv_nsec;
printk(KERN_CRIT "COMPLETED 100000 iterations of csum in %llu nanosec\n", time);
__free_pages(page, BUFSIZ_ORDER);
return 0;
}
static void __exit csum_cleanup_module(void)
{
return;
}
module_init(csum_init_module);
module_exit(csum_cleanup_module);
MODULE_LICENSE("GPL");
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists