Message-ID: <4E4422EB.7060508@hp.com>
Date: Thu, 11 Aug 2011 11:43:55 -0700
From: Rick Jones <rick.jones2@...com>
To: "J.Hwan Kim" <frog1120@...il.com>
CC: netdev@...r.kernel.org
Subject: Re: Intel 82599 ixgbe driver performance
On 08/10/2011 06:57 PM, J.Hwan Kim wrote:
> On 2011-08-11 05:58, Rick Jones wrote:
>> On 08/09/2011 11:19 PM, J.Hwan Kim wrote:
>>> Hi, everyone
>>>
>>> I'm testing our network card, which is based on the Intel 82599 and
>>> the ixgbe driver. I wonder what the Rx performance of the 82599 is
>>> with 64-byte frames when the network stack is bypassed. Our driver
>>> reads packets directly from the DMA packet buffer and pushes them to
>>> the application without passing through the Linux kernel stack. It
>>> seems that the Intel 82599 cannot push 64B frames to the DMA area at
>>> 10G line rate. Is that right?
>>
>> Does your driver perform a copy of that 64B frame to user space?
> Our driver and the user application share the packet memory.
>
>> Is this a single-threaded test?
> Now 4 cores are running and 4 RX queues are used, each with interrupt
> affinity set, but the result is worse than with a single queue.
>> What does a lat_mem_rd -t (-t for random stride) test from lmbench
>> give for your system's memory latency? (Perhaps using numactl to
>> ensure local or remote memory access, as you desire.)
> ./lat_mem_rd -t 128
> "stride=64
>
> 0.00049 1.003
> 0.00098 1.003
> 0.00195 1.003
> 0.00293 1.003
> 0.00391 1.003
> 0.00586 1.003
> 0.00781 1.003
> 0.01172 1.003
> 0.01562 1.003
> 0.02344 1.003
> 0.03125 1.003
> 0.04688 5.293
> 0.06250 5.307
> 0.09375 5.571
> 0.12500 5.683
> 0.18750 5.683
> 0.25000 5.683
> 0.37500 16.394
> 0.50000 42.394
Unless the chip you are using has a rather tiny (by today's standards)
data cache, you need to go much farther there - I suspect that at 0.5 MB
you have not yet gotten beyond the size of the last level of data cache
on the chip.
I would suggest:
(from a system that is not otherwise idle...)
./lat_mem_rd -t 512 256
"stride=256
0.00049 1.237
0.00098 1.239
0.00195 1.228
0.00293 1.238
0.00391 1.243
0.00586 1.238
0.00781 1.250
0.01172 1.249
0.01562 1.251
0.02344 1.247
0.03125 1.247
0.04688 3.125
0.06250 3.153
0.09375 3.158
0.12500 3.177
0.18750 6.636
0.25000 8.729
0.37500 16.167
0.50000 16.901
0.75000 16.953
1.00000 17.362
1.50000 18.781
2.00000 20.243
3.00000 23.434
4.00000 24.965
6.00000 35.951
8.00000 56.026
12.00000 76.169
16.00000 80.741
24.00000 83.237
32.00000 84.043
48.00000 84.132
64.00000 83.775
96.00000 83.298
128.00000 83.039
192.00000 82.659
256.00000 82.464
384.00000 82.280
512.00000 82.092
You can see the large jump starting at 8MB - that is where the last
level cache runs out on the chip I'm using - an Intel W3550.
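If you want to confirm where your own chip's last level cache ends,
something along these lines should show it on a reasonably recent
Linux (the sysfs index assumed below for the last level may differ):

  # per-level cache sizes as the kernel reports them
  lscpu | grep -i cache
  # or read the last-level size directly (index3 assumed here to be L3)
  cat /sys/devices/system/cpu/cpu0/cache/index3/size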
Now, as run, that will include TLB miss overhead once the area of memory
being accessed is larger than can be mapped by the chip's TLB at the
page size being used. You can use libhugetlbfs to mitigate that through
the use of hugepages.
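For example, a rough, untested sketch - assuming libhugetlbfs is
installed and the kernel has hugepage support - would be something
like:

  # reserve some 2MB hugepages (512 of them, i.e. 1GB, enough for a 512MB run)
  echo 512 > /proc/sys/vm/nr_hugepages
  # re-run the latency test with its heap backed by hugepages
  hugectl --heap ./lat_mem_rd -t 512 256

or, without the hugectl wrapper, by preloading the library directly:

  LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes ./lat_mem_rd -t 512 256

That should take much of the TLB miss component out of the latencies
for the larger sizes.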
rick jones