[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20131020212910.GA3387@neilslaptop.think-freely.org>
Date: Sun, 20 Oct 2013 17:29:10 -0400
From: Neil Horman <nhorman@...driver.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org,
sebastien.dugue@...l.net, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org
Subject: Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's
On Fri, Oct 18, 2013 at 02:15:52PM -0700, Eric Dumazet wrote:
> On Fri, 2013-10-18 at 16:11 -0400, Neil Horman wrote:
>
> > #define BUFSIZ_ORDER 4
> > #define BUFSIZ ((2 << BUFSIZ_ORDER) * (1024*1024*2))
> > static int __init csum_init_module(void)
> > {
> > int i;
> > __wsum sum = 0;
> > struct timespec start, end;
> > u64 time;
> > struct page *page;
> > u32 offset = 0;
> >
> > page = alloc_pages((GFP_TRANSHUGE & ~__GFP_MOVABLE), BUFSIZ_ORDER);
>
> Not sure what you are doing here, but its not correct.
>
Why not? You asked for a test with 32 hugepages, so I allocated 32 hugepages.
> You have a lot of variations in your results, I suspect a NUMA affinity
> problem.
>
I do have some variation, you're correct, but I don't think its a numa issue
> You can try the following code, and use taskset to make sure you run
> this on a cpu on node 0
>
I did run this with taskset to do exactly that (hence my comment above). I'll
be glad to run your variant on monday morning though and provide results.
Best
Neil
> #define BUFSIZ 2*1024*1024
> #define NBPAGES 16
>
> static int __init csum_init_module(void)
> {
> int i;
> __wsum sum = 0;
> u64 start, end;
> void *base, *addrs[NBPAGES];
> u32 rnd, offset;
>
> memset(addrs, 0, sizeof(addrs));
> for (i = 0; i < NBPAGES; i++) {
> addrs[i] = kmalloc_node(BUFSIZ, GFP_KERNEL, 0);
> if (!addrs[i])
> goto out;
> }
>
> local_bh_disable();
> pr_err("STARTING ITERATIONS on cpu %d\n", smp_processor_id());
> start = ktime_to_ns(ktime_get());
>
> for (i = 0; i < 100000; i++) {
> rnd = prandom_u32();
> base = addrs[rnd % NBPAGES];
> rnd /= NBPAGES;
> offset = rnd % (BUFSIZ - 1500);
> offset &= ~1U;
> sum = csum_partial_opt(base + offset, 1500, sum);
> }
> end = ktime_to_ns(ktime_get());
> local_bh_enable();
>
> pr_err("COMPLETED 100000 iterations of csum %x in %llu nanosec\n", sum, end - start);
>
> out:
> for (i = 0; i < NBPAGES; i++)
> kfree(addrs[i]);
>
> return 0;
> }
>
> static void __exit csum_cleanup_module(void)
> {
> return;
> }
>
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists