Message-ID: <4FA17781.6080306@intel.com>
Date: Wed, 02 May 2012 11:05:53 -0700
From: Alexander Duyck <alexander.h.duyck@...el.com>
To: Eric Dumazet <eric.dumazet@...il.com>
CC: Alexander Duyck <alexander.duyck@...il.com>,
David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Neal Cardwell <ncardwell@...gle.com>,
Tom Herbert <therbert@...gle.com>,
Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
Michael Chan <mchan@...adcom.com>,
Matt Carlson <mcarlson@...adcom.com>,
Herbert Xu <herbert@...dor.apana.org.au>,
Ben Hutchings <bhutchings@...arflare.com>,
Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>,
Maciej Żenczykowski <maze@...gle.com>
Subject: Re: [PATCH net-next] net: take care of cloned skbs in tcp_try_coalesce()
On 05/02/2012 09:46 AM, Eric Dumazet wrote:
> On Wed, 2012-05-02 at 09:27 -0700, Alexander Duyck wrote:
[...]
>>>>> @@ -4515,7 +4521,12 @@ copyfrags:
>>>>>  		offset = from->data - (unsigned char *)page_address(page);
>>>>>  		skb_fill_page_desc(to, skb_shinfo(to)->nr_frags,
>>>>>  				   page, offset, skb_headlen(from));
>>>>> -		*fragstolen = true;
>>>>> +
>>>>> +		if (skb_cloned(from))
>>>>> +			get_page(page);
>>>>> +		else
>>>>> +			*fragstolen = true;
>>>>> +
>>>>>  		delta = len; /* we dont know real truesize... */
>>>>>  		goto copyfrags;
>>>>>  	}
>>>>>
>>>>>
>>>> I don't see where we are now addressing the put_page call to release the
>>>> page afterwards. By calling get_page you are incrementing the page
>>>> count by one, but where are you decrementing dataref in the shared
>>>> info? Without that we are looking at a memory leak because __kfree_skb
>>>> will decrement the dataref but it will never reach 0 so it will never
>>>> call put_page on the head frag.
>>> really the dataref was already incremented at skb_clone() time
>>>
>>> It will be properly decremented since we call __kfree_skb()
>>>
>>> Only the last decrement will perform the put_page()
>>>
>>> Think about what splice() is doing, it's the same get_page() game.
>> I think you are missing the point. So skb_clone will bump the dataref
>> to 2, and calling get_page will bump the page count to 2. After this
>> function you don't call __kfree_skb(skb); instead you call
>> kmem_cache_free(skbuff_head_cache, skb). This will free the sk_buff
>> but not decrement dataref, leaving it at 2. Eventually the raw socket
>> will call kfree_skb(skb) on the clone, dropping the dataref to 1, and
>> you will call put_page, dropping the page count to 1, but I don't see
>> where the last __kfree_skb call will come from to drop the dataref and
>> the page count to 0.
> No, you are missing that _if_ we added one to the page count, then we
> won't call kmem_cache_free(skbuff_head_cache, skb) but will call
> __kfree_skb(skb) instead, because fragstolen will be false.
>
> if (fragstolen)
> 	kmem_cache_free(...)
> else
> 	__kfree_skb(...)
>
> In a future patch (addressing tcp coalescing in tcp_queue_rcv() as
> well), I'll add a helper to make this clearer.
You're correct about the fragstolen case; I was actually thinking of the
first patch you sent, not this second one.
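
For what it's worth, the helper you mention should make the free path
easier to follow. A minimal sketch of what I would expect it to look
like (the kfree_skb_partial name and exact form here are just my guess,
not your actual patch):

	static void kfree_skb_partial(struct sk_buff *skb, bool head_stolen)
	{
		/* head_stolen: the head frag was grafted onto another skb,
		 * so only the sk_buff struct itself needs to be freed
		 */
		if (head_stolen)
			kmem_cache_free(skbuff_head_cache, skb);
		else
			__kfree_skb(skb);
	}

The call site after tcp_try_coalesce() would then just pass fragstolen
through instead of open coding the if/else.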
However, we still have a problem. What we end up with now is a case of
sharing in which the clone skb no longer knows that it is sharing the
head with another skb. The dataref will drop to 1 when we call
__kfree_skb. This means that any check for whether the skb head is
still shared will return false, which could lead to issues if anything
manipulates the data in the head based on the false assumption that it
is not cloned. What we would probably need to do in this case is tweak
the logic for skb_cloned: if you are using a head_frag, you should
probably add a check that returns true if cloned is true and the page
count is greater than 1. We should be safe in the case of
skb_header_cloned since we already dropped our dataref when we stole
the page and freed the skb.
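
Roughly what I have in mind for skb_cloned(), as a completely untested
sketch (the head_frag/page_count test is the new piece; the existing
dataref check stays as-is):

	static inline int skb_cloned(const struct sk_buff *skb)
	{
		if (skb->cloned &&
		    (atomic_read(&skb_shinfo(skb)->dataref) &
		     SKB_DATAREF_MASK) != 1)
			return 1;

		/* a stolen head_frag can still be shared through the page
		 * refcount even after dataref has dropped back to 1
		 */
		if (skb->cloned && skb->head_frag &&
		    page_count(virt_to_head_page(skb->head)) > 1)
			return 1;

		return 0;
	}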
Thanks,
Alex