Message-ID: <CAOGi=dO-_MedVNvNy=+VyKXG2D=Mat8=Vyr-CttxcrNFO0Vt5Q@mail.gmail.com>
Date: Mon, 4 Feb 2013 10:53:06 +0800
From: Ling Ma <ling.ma.program@...il.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
maze@...gle.com
Subject: Re: [PATCH v2 net-next] inet: Get critical word in first 64bit of
cache line
I attached my test program (it forces all CPU loads to issue one by one
and avoids CPU hardware prefetch; build it with cc -o test-cwf test-cwf.c)
and cpu-info. The result from ./test-cwf is as follows:
looking-up aligned time 157000272, looking-up unaligned time 162652724
If I am wrong, please correct me.
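To put the two cycle counts above in perspective, the relative difference can be computed as below. This is a minimal sketch, not part of the attached test program; the helper name slowdown_percent is hypothetical and introduced only for illustration.

```c
#include <stdint.h>

/*
 * Relative slowdown of the unaligned run versus the aligned run,
 * expressed in percent. Hypothetical helper, not taken from test-cwf.c.
 */
static double slowdown_percent(uint64_t aligned, uint64_t unaligned)
{
    return 100.0 * ((double)unaligned - (double)aligned) / (double)aligned;
}
```

Calling slowdown_percent(157000272, 162652724) gives roughly 3.6 percent.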
Thanks
Ling
2013/2/4, Eric Dumazet <eric.dumazet@...il.com>:
> On Sun, 2013-02-03 at 16:18 -0800, Eric Dumazet wrote:
>> On Sun, 2013-02-03 at 16:08 -0500, David Miller wrote:
>> > From: Eric Dumazet <eric.dumazet@...il.com>
>> > Date: Sat, 02 Feb 2013 07:03:55 -0800
>> >
>> > > From: Ma Ling <ling.ma.program@...il.com>
>> > >
>> > > In order to reduce memory latency when a last-level cache miss occurs,
>> > > modern CPUs, e.g. x86 and ARM, introduced Critical Word First (CWF) or
>> > > Early Restart (ER) to get data ASAP. For CWF, if the critical word is the
>> > > first member in the cache line, memory feeds the CPU with the critical
>> > > word, then fills the other data in the cache line one by one; otherwise,
>> > > after the critical word it costs more cycles to fill the remaining cache
>> > > line. For Early Restart, the CPU restarts as soon as the critical word in
>> > > the cache line arrives.
>> > >
>> > > The hash value is the critical word, so in this patch we place it as the
>> > > first member in the cache line (the sock address is cache-line aligned);
>> > > this is also good for Early Restart platforms.
>> > >
>> > > [edumazet: respin on net-next after commit ce43b03e8889]
>> > >
>> > > Signed-off-by: Ma Ling <ling.ma.program@...il.com>
>> > > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
>> >
>> > I completely agree with the other response to this patch in that
>> > the description is bogus.
>> >
>> > If CWF is implemented in the cpu, it should exactly relieve us from
>> > having to move things around in structures so carefully like this.
>> >
>> > Either the patch should be completely dropped (modern cpus don't
>> > need this) or the commit message changed to reflect reality.
>> >
>> > It really makes a terrible impression upon me when the patch says
>> > something which in fact is 180 degrees from reality.
>>
>> Hmm.
>>
>> Maybe the changelog is misleading, or maybe all the performance gains I
>> have from this patch are probably some artifact or old/bad hardware, or
>> something else.
>>
>>
>>
>> (Intel(R) Xeon(R) CPU X5660 @ 2.80GHz)
>> # ./cwf
>> looking-up aligned time 108712072,
>> looking-up unaligned time 113268256
>> looking-up aligned time 108677032,
>> looking-up unaligned time 113297636
>>
>>
>> (Intel(R) Xeon(R) CPU X5679 @ 3.20GHz)
>> # ./cwf
>> looking-up aligned time 139193589,
>> looking-up unaligned time 144307821
>> looking-up aligned time 139136787,
>> looking-up unaligned time 144277752
>>
>> My laptop : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
>> # ./cwf
>> looking-up aligned time 84869203,
>> looking-up unaligned time 86843462
>> looking-up aligned time 84253003,
>> looking-up unaligned time 86227675
>>
>> #include <stdio.h>
>> #include <string.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>>
>> #define CACHELINE_SZ 64L
>>
>> #define BIGBUFFER_SZ (64<<20)
>>
>> #define HP_TIMING_NOW(Var) \
>> 	({ unsigned long long _hi, _lo; \
>> 	   asm volatile ("rdtsc" : "=a" (_lo), "=d" (_hi)); \
>> 	   (Var) = _hi << 32 | _lo; })
>>
>> #define repeat_times 20
>>
>> char *bufzap;
>>
>> static void zap_cache(void)
>> {
>> 	memset(bufzap, 2, BIGBUFFER_SZ);
>> 	memset(bufzap, 3, BIGBUFFER_SZ);
>> 	memset(bufzap, 4, BIGBUFFER_SZ);
>> }
>>
>> static char *init_buf(void)
>> {
>> 	void *res;
>>
>> 	if (posix_memalign(&res, CACHELINE_SZ, BIGBUFFER_SZ)) {
>> 		fprintf(stderr, "malloc() failed");
>> 		exit(1);
>> 	}
>>
>> 	memset(res, 1, BIGBUFFER_SZ);
>> 	return res;
>> }
>>
>> unsigned long total;
>>
>> static unsigned long random_access(void *buffer,
>> 				   unsigned int off1,
>> 				   unsigned int off2,
>> 				   unsigned int off3)
>> {
>> 	int i;
>> 	unsigned int n;
>> 	unsigned long sum = 0;
>> 	unsigned long *ptr;
>>
>> 	srandom(7777);
>> 	for (i = 0; i < 1000000; i++) {
>> 		n = random() % (BIGBUFFER_SZ/CACHELINE_SZ);
>> 		ptr = buffer + n*CACHELINE_SZ;
>> 		if (ptr[off1])
>> 			sum++;
>> 		if (ptr[off2])
>> 			sum++;
>> //		if (ptr[off3])
>> //			sum++;
>
> Hmm, I don't know why I left a comment on these two lines...
>
> Of course, results are a bit different removing the comments :
>
> looking-up aligned time 113601316,
> looking-up unaligned time 115964760
> looking-up aligned time 113698636,
> looking-up unaligned time 115986072
>
> More testing is probably needed.
>
>
>
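The layout idea from the quoted patch description (hash value as the first member of a cache-line-aligned object) can be sketched in user space as follows. This is a minimal illustration only, assuming a 64-byte cache line as in CACHELINE_SZ in the quoted test; struct demo_sock, its fields, DEMO_CACHELINE and hash_is_first_word() are hypothetical names and do not reflect the kernel's actual struct sock.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size, matching CACHELINE_SZ in the quoted test. */
#define DEMO_CACHELINE 64

/* Hypothetical lookup object; NOT the kernel's struct sock. */
struct demo_sock {
    /* Critical word: aligned so it sits at the start of a cache line. */
    alignas(DEMO_CACHELINE) uint32_t hash;
    uint32_t daddr;
    uint32_t saddr;
    uint16_t dport;
    uint16_t sport;
};

/* Returns 1 if hash is at offset 0 of a cache-line-aligned structure. */
static int hash_is_first_word(void)
{
    return offsetof(struct demo_sock, hash) == 0 &&
           alignof(struct demo_sock) == DEMO_CACHELINE;
}
```

With this layout, a lookup that compares the hash first touches the first word of the line. Whether that gives a measurable gain on a particular CPU is exactly what the timings in this thread try to establish.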
View attachment "cpu-info" of type "text/plain" (7050 bytes)
View attachment "test-cwf.c" of type "text/x-csrc" (1986 bytes)