lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 02 May 2012 20:00:19 -0700
From:	Alexander Duyck <alexander.duyck@...il.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	Alexander Duyck <alexander.h.duyck@...el.com>,
	David Miller <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>,
	Neal Cardwell <ncardwell@...gle.com>,
	Tom Herbert <therbert@...gle.com>,
	Jeff Kirsher <jeffrey.t.kirsher@...el.com>,
	Michael Chan <mchan@...adcom.com>,
	Matt Carlson <mcarlson@...adcom.com>,
	Herbert Xu <herbert@...dor.apana.org.au>,
	Ben Hutchings <bhutchings@...arflare.com>,
	Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>,
	Maciej Żenczykowski <maze@...gle.com>
Subject: Re: [PATCH net-next] net: take care of cloned skbs in tcp_try_coalesce()

On 5/2/2012 6:52 PM, Eric Dumazet wrote:
> On Wed, 2012-05-02 at 13:55 -0700, Alexander Duyck wrote:
>> On 05/02/2012 11:15 AM, Eric Dumazet wrote:
>>> On Wed, 2012-05-02 at 11:05 -0700, Alexander Duyck wrote:
>>>> You're correct about the fragstolen case, I actually was thinking of the
>>>> first patch you sent, not this second one.
>>>>
>>>> However we still have a problem.  What we end up with now is a case of
>>>> sharing in which the clone skb no longer knows that it is sharing the
>>>> head with another skb.  The dataref will drop to 1 when we call
>>>> __kfree_skb.  This means that any other function out there that tries to
>>>> see if the skb is shared would return false.  This could lead to issues
>>>> if there is anything out there that manipulates the data in head based
>>>> on the false assumption that it is not cloned.  What we would probably
>>>> need to do in this case is tweak the logic for skb_cloned.  If you are
>>>> using a head_frag you should probably add a check that returns true if
>>>> cloned is true and page_count is greater than 1.  We should be safe in
>>>> the case of skb_header_cloned since we already dropped are dataref when
>>>> we stole the page and freed the skb.
>>> I really dont understand this concern.
>>>
>>> When skb is cloned, we copy in head_frag __skb_clone()
>>>
>>> So both skbs have the bit set, and dataref = 2.
>>>
>>> first skb is freed, dataref becomes 1 and nothing special  happen
>>>
>>> > From this point, skb->head is not 'shared' anymore (taken your own
>>> words). And we are free to do whatever we want.
>>>
>>> second skb is freed, dataref becomes 0 and we call the right destructor.
>> The problem is that the stack will not be able to detect sharing.  As
>> long as page_count is greater than 2 and skb->cloned is set we should be
>> telling any callers to skb_cloned that the head is cloned.  Otherwise we
>> can run into issues elsewhere with well meaning code checking and not
>> detecting sharing, and then mangling the header.
> page count is irrelevant, since if PAGE_SIZE=65536, you can have 32
> fragments (of 2048 bytes) per page. Still we can call put_page() for
> each individual frag that must be freed.
>
> You forgot to give an example of path that would be failing. Since the
> skb_cloned() check is still valid.
>
> Head is cloned if : skb->cloned is set and dataref value is not 1
>
> (minus the skb_header_release() tweaks done on output path for tcp)
>
> Every time a 'caller' is going to modify/mangle its skb head, it must
> first call pskb_expand_head() (or various helpers around it) to :
>
> - allocate a new skb->head
> - copy old content to new head
> - release a reference on old head dataref
> - if old dataref reaches 0, 'free' old head (might be a kfree() or
> put_page())
This is exactly my point.  The way your current code works the check in 
pskb_expand_head will not detect the header is cloned.

So for example lets say you have one of your skbs that goes through and 
is merged.

1.  You start with a cloned skb.  dataref = 2, head_frag page count = 1;
2.  You go through tcp_try_coalesce.  dataref = 2, head_frag page count = 2;
3.  You call __kfree_skb on the skb.  dataref = 1, head_frag page count = 2;
4   At this point if the holder of the clone decides to modify the skb-> 
head there is nothing to stop it from doing so.

Odds are you would have to have a pretty extreme corner case to really 
see any issues with this since I know most raw sockets won't mangle the 
data portion of the frame.  I just figure we could save ourselves a lot 
of trouble by just not coalescing the head_frag in the case that the skb 
is cloned.

>> Also I am not sure if the big monolithic changes are really the best way
>> to approach this.  It would be nice if we could fix this incrementally
>> instead of trying to do it all at once since there are multiple issues
>> that need to be addressed.
>>
>> I will try to submit a few patches from my end later today.  I still
>> need to look over all of the changes from the past couple of weeks that
>> were based on the assumption that the IP stack completely owned the skb.
> I did my best to provide small changes.
I just meant you didn't need to do things like add the kfree_skb_partial 
helper in the same patch.

> Plus TCP coalescing is done after IP processing.
>
> Owning skb is a vague concept anyway. IP borrows skb but not owns them.
> They already could be cloned skb.
I know it is a vague concept.  However it is one of those concepts where 
if we don't get it right we start to see random panics and call traces 
due to memory corruption.  That is why I prefer to avoid it if at all 
possible.  All I am really asking for is that we don't just use the 
head_frag as a page unless we know it is not cloned.  All those checks 
that just look for skb->head_frag could check for skb->head_frag && 
!skb_cloned(skb) and I would be a much happier person since then I know 
all owners of the clones will be using the same variable for reference 
count tracking.

I'm still working on those patches.  I thought I had sent them earlier 
but it looks like I had an email issue, and since then Dave pulled your 
patches so I am rebasing the patches on the updated tree and should have 
something in the next hour or so.

Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ