netdev - Re: [PATCH 3/4 v2 net-next] net: make GRO aware of skb->head

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4FA06D7A.6090800@intel.com>
Date:	Tue, 01 May 2012 16:10:50 -0700
From:	Alexander Duyck <alexander.h.duyck@...el.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	Alexander Duyck <alexander.duyck@...il.com>,
	David Miller <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>,
	Neal Cardwell <ncardwell@...gle.com>,
	Tom Herbert <therbert@...gle.com>,
	Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
	Michael Chan <mchan@...adcom.com>,
	Matt Carlson <mcarlson@...adcom.com>,
	Herbert Xu <herbert@...dor.apana.org.au>,
	Ben Hutchings <bhutchings@...arflare.com>,
	Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>,
	Maciej Żenczykowski <maze@...gle.com>
Subject: Re: [PATCH 3/4 v2 net-next] net: make GRO aware of skb->head_frag

On 05/01/2012 03:58 PM, Alexander Duyck wrote:
> On 05/01/2012 10:04 AM, Eric Dumazet wrote:
>> On Tue, 2012-05-01 at 09:17 -0700, Alexander Duyck wrote:
>>> On 04/30/2012 11:39 PM, Eric Dumazet wrote:
>>>> On Mon, 2012-04-30 at 22:33 -0700, Alexander Duyck wrote:
>>>>
>>>>> The question I had was more specific to GRO.  As long as we have
>>>>> skb->users == 1 and the skb isn't cloned we should be fine.   It just
>>>>> hadn't occurred to me before that napi_gro_receive had the extra
>>>>> requirement that the skb couldn't be cloned.
>>>>>
>>>> OK
>>>>
>>>> By the way, even if skb was cloned, we would be allowed to steal
>>>> skb->head.
>>>>
>>>> When we clone an oskb we :
>>>>
>>>> 1) allocate a struct nskb sk_buff (or use the shadow in case of TCP)
>>>> 2) increment dataref
>>> The problem I have is with this piece right here.  So you increment
>>> dataref.  Now you have an skb that is still pointing to the shared info
>>> on this page and dataref is 2.  What about the side that is stealing the
>>> head?  Is it going to be tracking the dataref as well and decrementing
>>> it before put_page or does it just assume that dataref is 1 and call
>>> put_page directly?  I am guessing the latter since I didn't see anything
>>> that allowed for tracking the dataref of stolen heads.
>> The only changed thing is the kfree() replaced by put_page()
>>
>> This kfree() was done when last reference to dataref was released.
>>
>> If we had a problem before, we have same problem after my patch.
>>
>> Truth is : In TCP (coalesce and splice()) and GRO, we owns skbs.
>>
>> (See the various __kfree_skb(skb) calls in net/ipv4/tcp_input.c
>> There is one exception in ipv6 / treq->pktopts ) but its for SYN packet
>> and this wont be merged with a previous packet.
> Eric,
>
> I think I have found a bug, although it is not specific to this patch
> but it is related.  It looks like the TCP coalesce code is causing
> tcpdump to fail when using frags.  Based on the comments in the patch I
> am assuming you have an ixgbe adapter to test with as that is what I
> reproduced this on.  To reproduce the issue all you need to do is run
> "tcpdump -i ethX > /dev/null" on one console, and on a second console
> run a netperf TCP_MAERTS test to some other server.  Tcpdump will exit
> out with a message about bad address like this:
>
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth1, link-type EN10MB (Ethernet), capture size 96 bytes
> tcpdump: pcap_loop: recvfrom: Bad address
> 682 packets captured
> 2357 packets received by filter
> 1543 packets dropped by kernel
>
>
> A bisect of the issue tracked it down to:
>
> 1402d366019fedaa2b024f2bac06b7cc9a8782e1 is first bad commit
> commit 1402d366019fedaa2b024f2bac06b7cc9a8782e1
> Author: Eric Dumazet <edumazet@...gle.com>
> Date:   Mon Apr 23 07:11:42 2012 +0000
>
>     tcp: introduce tcp_try_coalesce
>     
>     commit c8628155ece3 (tcp: reduce out_of_order memory use) took care of
>     coalescing tcp segments provided by legacy devices (linear skbs)
>     
>     We extend this idea to fragged skbs, as their truesize can be heavy.
>     
>     ixgbe for example uses 256+1024+PAGE_SIZE/2 = 3328 bytes per segment.
>     
>     Use this coalescing strategy for receive queue too.
>     
>     This contributes to reduce number of tcp collapses, at minimal cost, and
>     reduces memory overhead and packets drops.
>     
>     Signed-off-by: Eric Dumazet <edumazet@...gle.com>
>     Cc: Neal Cardwell <ncardwell@...gle.com>
>     Cc: Tom Herbert <therbert@...gle.com>
>     Cc: Maciej Å»enczykowski <maze@...gle.com>
>     Cc: Ilpo JÃ¤rvinen <ilpo.jarvinen@...sinki.fi>
>     Acked-by: Neal Cardwell <ncardwell@...gle.com>
>     Signed-off-by: David S. Miller <davem@...emloft.net>
>
> :040000 040000 8ca3e0b4e6c6a8f375fd800069d24203880623f3 2576d34c5c9cfc717a11e2ebe054143956716b93 M	net
>
> I suspect we are dealing with either a shared or cloned skb in this
> case, though I haven't verified which it is yet.
>
> Thanks,
>
> Alex
>
One additional note.  It looks like LRO and GRO need to be disabled to
trigger the bug.  If either of them are enabled it doesn't seem to
occur.  Likely due to the fact that they are doing the coalescing before
it gets up to the tcp_try_coalesce call.

Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html