netdev - Re: GRO aggregation


Message-ID: <CAJZOPZLgQVq+pS1PTU2SM2C_dPPuHx8EnVL8zH077zm5O9aafQ@mail.gmail.com>
Date:	Thu, 13 Sep 2012 15:47:59 +0300
From:	Or Gerlitz <or.gerlitz@...il.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Shlomo Pongartz <shlomop@...lanox.com>,
	Rick Jones <rick.jones2@...com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Tom Herbert <therbert@...gle.com>,
	Yevgeny Petrilin <yevgenyp@...lanox.co.il>
Subject: Re: GRO aggregation

On Thu, Sep 13, 2012 at 3:05 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> On Thu, 2012-09-13 at 12:59 +0300, Or Gerlitz wrote:
>> On Thu, Sep 13, 2012 at 11:11 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
>> > MAX_SKB_FRAGS is 16
>> > skb_gro_receive() will return -E2BIG once this limit is hit.
>> > If you use a MSS = 100 (instead of MSS = 1460), then GRO skb will
>> > contain only at most 1700 bytes, but TSO packets can still be 64KB, if
> >> > the sender NIC can afford it (some NICs won't work quite well)

>> Addressing this assertion of yours: Shlomo showed that with ixgbe he saw GRO
>> aggregating 32KB, which means 20-21 packets, i.e. more than 16 fragments in
>> this notation. Can it be related to the way ixgbe actually allocates skbs?

> Hard to say without knowing exact kernel version, as things change a lot in this area.

As Shlomo wrote earlier in this thread, his testbed is 3.6-rc1.
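
For reference, the arithmetic behind the fragment limit quoted above can be
sketched in a few lines of Python. This is a back-of-the-envelope model, not
kernel code: it assumes every received segment costs exactly one fragment, and
the names max_gro_payload, segments_needed and frag_limit are made up for
illustration. The default of 16 is the figure quoted above; the real limit
depends on kernel version and page size, and the 65535-byte cap is likewise an
assumption.

```python
import math

GRO_MAX_BYTES = 65535   # assumed cap on one aggregated packet


def max_gro_payload(mss, frag_limit=16):
    """Upper bound on aggregated payload if each segment uses one fragment."""
    return min(mss * frag_limit, GRO_MAX_BYTES)


def segments_needed(total_bytes, mss):
    """Number of MSS-sized segments needed to carry total_bytes."""
    return math.ceil(total_bytes / mss)


if __name__ == "__main__":
    for mss in (100, 1460):
        print(mss, max_gro_payload(mss))
    print(segments_needed(32 * 1024, 1460))
```

In this model an MSS of 100 gives 1600 bytes with a limit of 16 (1700 with a
limit of 17), and an MSS of 1460 tops out well below 32KB, which is why a 32KB
aggregate does not fit the one-fragment-per-segment picture.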


> You have several kinds of GRO: one fast and one slow.
> The slow one uses a linked list of skbs (pinfo->frag_list), while the
> fast one uses fragments (pinfo->nr_frags)
>
> For example, some drivers (mellanox one is in this lot) pull too many
> bytes in skb->head and this defeats the fast GRO :
> Part of payload is in skb->head, remaining part in pinfo->frags[0]
>
> skb_gro_receive() then has to allocate a new head skb, to link skbs into
> head->frag_list. The total skb->truesize is not reduced at all, it's
> increased.
>
> So you might think GRO is working, but it's only a hack, as one skb has a
> list of skbs, and this makes TCP read() slower, and defeats TCP
> coalescing as well. What's the point of delivering fat skbs to TCP stack
> if it slows down the consumer, because of increased cache line misses ?
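
The fast/slow distinction quoted above can be illustrated with a toy
classifier. This is a simplified sketch, not the logic of skb_gro_receive()
itself, which handles more cases: the RxSkb class, its fields and gro_path()
are hypothetical names, and the only rule modelled is "payload left in
skb->head forces the frag_list path".

```python
from dataclasses import dataclass


@dataclass
class RxSkb:
    header_len: int   # bytes of protocol headers at the start of skb->head
    linear_len: int   # total bytes in skb->head (headers plus pulled payload)
    frag_len: int     # payload bytes held in page fragments


def gro_path(skb):
    """Classify a received segment using the rule described in the quote."""
    if skb.linear_len <= skb.header_len:
        # All payload lives in page fragments, so frags can be merged.
        return "fast (frags)"
    # Part of the payload sits in skb->head, so skbs must be chained.
    return "slow (frag_list)"


if __name__ == "__main__":
    headers_only = RxSkb(header_len=66, linear_len=66, frag_len=1448)
    pulled_too_much = RxSkb(header_len=66, linear_len=256, frag_len=1258)
    print(gro_path(headers_only))
    print(gro_path(pulled_too_much))
```

The second example mirrors the case described above, where a driver pulls
more than the headers into skb->head and the rest stays in frags[0].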

Shlomo is working on making the IPoIB driver work well with GRO. Thanks for the
comments on the Mellanox Ethernet driver; we will look there too (added Yevgeny)...

As for IPoIB, it has two modes: connected, which is irrelevant for this
discussion, and datagram, which is the one in scope here. Its MTU is typically
2044 but can be 4092 as well. The allocation of skbs for this mode is done in
ipoib_alloc_rx_skb(), which you've patched recently...

Following your comment, we noted that with the lower/typical MTU of 2044, which
is below the ipoib_ud_need_sg() threshold, skbs are allocated in one "form",
and with the 4092 MTU in another "form". Do you see each form falling into a
different GRO flow, e.g. 2044 into the "slow" one and 4092 into the "fast" one?
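
To make the threshold concrete, here is a rough Python model of why the two
MTUs land on different sides of ipoib_ud_need_sg(). It is only a sketch: the
constants (4KB pages, a 40-byte GRH in front of each UD packet, a 4-byte IPoIB
encapsulation header) and the "buffer larger than one page" rule are
assumptions that should be checked against the driver source, and the function
names are illustrative. It says nothing about which GRO path each form takes.

```python
PAGE_SIZE = 4096        # assumed 4KB pages
IB_GRH_BYTES = 40       # assumed: GRH received in front of each UD packet
IPOIB_ENCAP_LEN = 4     # assumed: IPoIB encapsulation header


def ud_buf_size(ib_mtu):
    """Receive buffer size needed for one UD packet."""
    return ib_mtu + IB_GRH_BYTES


def ud_need_sg(ib_mtu):
    """True when one receive buffer no longer fits in a single page."""
    return ud_buf_size(ib_mtu) > PAGE_SIZE


def ipoib_mtu(ib_mtu):
    """IPoIB interface MTU for a given IB link MTU."""
    return ib_mtu - IPOIB_ENCAP_LEN


if __name__ == "__main__":
    for ib_mtu in (2048, 4096):
        print(ipoib_mtu(ib_mtu), ud_buf_size(ib_mtu), ud_need_sg(ib_mtu))
```

Under these assumptions the 2044 case fits in one page and the 4092 case does
not, which matches the two allocation "forms" mentioned above.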

Or.