lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100915184235.GA31685@vigoh>
Date:	Wed, 15 Sep 2010 15:42:35 -0300
From:	"Gustavo F. Padovan" <padovan@...fusion.mobi>
To:	Mat Martineau <mathewm@...eaurora.org>
Cc:	linux-bluetooth@...r.kernel.org, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, marcel@...tmann.org,
	davem@...emloft.net
Subject: Re: Possible regression with skb_clone() in 2.6.36

Hi Mat,

* Gustavo F. Padovan <padovan@...fusion.mobi> [2010-09-10 16:45:09 -0300]:

> Hi Mat,
> 
> * Mat Martineau <mathewm@...eaurora.org> [2010-09-10 09:53:31 -0700]:
> 
> > 
> > Gustavo -
> > 
> > I'm not sure why the streaming code used to work, but this does not 
> > look like an skb_clone() problem.  Your patch to remove the 
> > skb_clone() call in l2cap_streaming_send() addresses the root cause of 
> > this crash.
> > 
> > On Wed, 8 Sep 2010, Gustavo F. Padovan wrote:
> > 
> > > I've been experiencing some problems when running the L2CAP Streaming mode in
> > > 2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
> > > wasn't happening before, so I think we may have a regression here (I didn't
> > > find where yet). The crash log is below.
> > >
> > > The following patch does not fix the regression, but shows that removing the
> > > skb_clone() call from l2cap_streaming_send() we workaround the problem. The
> > > patch is good anyway because it saves memory and time.
> > >
> > > By now I have no idea on how to fix this.
> > >
> > > <snip>
> > 
> > This has to do with the sk->sk_wmem_alloc accounting that controls the 
> > amount of write buffer space used on the socket.
> > 
> > When the L2CAP streaming mode socket segments its data, it allocates 
> > memory using sock_alloc_send_skb() (via bt_skb_send_alloc()).  Before 
> > that allocation call returns, skb_set_owner_w() is called on the new 
> > skb.  This adds to sk->sk_wmem_alloc and sets skb->destructor so that 
> > sk->sk_wmem_alloc is correctly updated when the skb is freed.
> > 
> > When that skb is cloned, the clone is not "owned" by the write buffer. 
> > The clone's destructor is set to NULL in __skb_clone().  The version 
> > of l2cap_streaming_send() that runs out of memory is passing the 
> > non-owned skb clone down to the HCI layer.  The original skb (the one 
> > that's "owned by w") is immediately freed, which adjusts 
> > sk->sk_wmem_alloc back down - the socket thinks it has unlimited write 
> > buffer space.  As a result, bt_skb_send_alloc() never blocks waiting 
> > for buffer space (or returns EAGAIN for nonblocking writes) and the 
> > HCI send queue keeps growing.
> 
> If the problem is what you are saying, add a skb_set_owner_w(skb, sk) on
> the cloned skb should solve the problem, but it doesn't. That's exactly
> what tcp_transmit_skb() does. Also that just appeared in 2.6.36, is was
> working fine before, i.e, we have a regression here. 

I've run some other tests and what you said also fixes the problem for
Streaming Mode. If I use skb_set_owner_w() on the cloned skb, everything
works fine. But we still have the problem for ERTM as I described.
send() blocks wainting for memory. The regression is there yet.
:(

> 
> > 
> > This isn't a problem for the ERTM sends, because the original skbs are 
> > kept in the ERTM tx queue until they are acked.  Once they're acked, 
> > the write buffer space is freed and additional skbs can be allocated.
> 
> It affects ERTM as well, but in that case the kernel doesn't crash
> because ERTM block on sending trying to allocate memory. Then we are not
> able to receive any ack (everything stays queued in sk_backlog_queue as
> the sk is owned by the user) and ERTM stalls.
> 

-- 
Gustavo F. Padovan
ProFUSION embedded systems - http://profusion.mobi
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ