[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <50509F30.30402@mellanox.com>
Date: Wed, 12 Sep 2012 17:41:52 +0300
From: Shlomo Pongartz <shlomop@...lanox.com>
To: Eric Dumazet <eric.dumazet@...il.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: GRO aggregation
On 9/12/2012 12:33 PM, Eric Dumazet wrote:
> On Wed, 2012-09-12 at 12:23 +0300, Shlomo Pongartz wrote:
>> On 9/11/2012 10:35 PM, Eric Dumazet wrote:
>>> On Tue, 2012-09-11 at 19:24 +0000, Shlomo Pongratz wrote:
>>>
>>>> I see that in ixgbe the weight for the NAPI is 64 (netif_napi_add). So
>>>> if packets are arriving in high rate then an the CPU is fast enough to
>>>> collect the packets as they arrive, assuming packets continue to
>>>> arrives while the NAPI runs. Then it should have aggregate more. So we
>>>> will have less passes trough the stack.
>>>>
>>> As I said, _if_ your cpu was loaded by other stuff, then you would see
>>> biggest GRO packets.
>>>
>>> GRO is not : "We want to kill latency and have big packets just because
>>> its better"
>>>
>>> Its more like : If load is big enough, try to aggregate TCP frames in
>>> less skbs.
>>>
>>>
>>>
>>>
>> First I want to apologize for breaking the mailing thread. I wasn't at
>> work and used webmail.
>>
>> I agree with your but I think that something is still strange.
>> On the transmitter side all the offloading are enabled, e.g. TSO and GSO.
>> The tcpdump on the sender side shows size of 64240 which is 44 packets
>> of 1460 each.
>> Now since the offloading are enabled the HW should transmit 44 frames
>> back to back,
>> that is in a burst of 44 * 1500 bytes, which according to my calculation
>> should take 52.8 micro on 10G Ethernet.
>> Using ethtool I've set the rx-usecs to 1022 micro, which I think is the
>> maximal value for ixgbe.
>> Note that there is no way to set rx-frames on ixgbe.
>> Now since the ixgbe weight is 64 I expected that the NAPI will be able
>> to poll for more then 21 packets,
>> since 44 packets came in one burst.
>> However the results remains the same.
> TSO uses PAGE frags, so 64KB needs about 16 pages.
>
> tcp_sendmsg() could even use order-3 pages, so that only 2 pages would
> be needed to fill 64KB of data.
>
> GRO uses whatever fragment size provided by NIC, depending on MTU.
>
> One skb has a limit on number of frags.
>
> Handling a huge array of frags would be actually slower in some helper
> functions.
>
> Since you dont exactly describe why you ask all these questions, its
> hard to guess what problem you try to solve.
>
>
>
> .
>
Hi Eric
The TSO is just a mean to create a burst of frames on the wire so the
NAPI will be able to pool as much as possible.
I'm looking on the aggregation done by GRO on behalf of IPoIB. With
IPoIB I added a counter that counts how many
packets were aggregated before napi_complete is called (ether directly
or by net_rx_action) and found that although
the NAPI consumes 64 packets on average before napi_complete is called,
the tcpdump shows that no more then 16-17
packets were aggregated. BTW when I increased the MTU to 4K I did
reached 64K aggregation which again is 16-17 packets.
So in order to see if 17 packets is the aggregation limit I wanted to
see how ixgbe is doing and found that it aggregates 21 packets.
So I wanted to know if there is another factor that governs the
aggregation, one that I can tune.
Shlomo.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists