netdev - RE: [PATCH net-next v2 2/8] tipc: compensate for double accounting in socket rcv buffer

Date:	Wed, 14 May 2014 12:53:45 +0000
From:	Jon Maloy <jon.maloy@...csson.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	"davem@...emloft.net" <davem@...emloft.net>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Paul Gortmaker <paul.gortmaker@...driver.com>,
	"Erik Hugne" <erik.hugne@...csson.com>,
	"ying.xue@...driver.com" <ying.xue@...driver.com>,
	"maloy@...jonn.com" <maloy@...jonn.com>,
	"tipc-discussion@...ts.sourceforge.net" 
	<tipc-discussion@...ts.sourceforge.net>
Subject: RE: [PATCH net-next v2 2/8] tipc: compensate for double accounting
 in socket rcv buffer

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@...il.com]
> Sent: May-14-14 7:47 AM
> To: Jon Maloy
> Cc: davem@...emloft.net; netdev@...r.kernel.org; Paul Gortmaker; Erik
> Hugne; ying.xue@...driver.com; maloy@...jonn.com; tipc-
> discussion@...ts.sourceforge.net
> Subject: Re: [PATCH net-next v2 2/8] tipc: compensate for double accounting
> in socket rcv buffer
> 
> On Wed, 2014-05-14 at 05:39 -0400, Jon Maloy wrote:
> > The function net/core/sock.c::__release_sock() runs a tight loop to
> > move buffers from the socket backlog queue to the receive queue.

[...]

> 
> This looks very complicated and hides some underlying problem.

For us, the underlying problem is that sk_backlog.len does not
give correct information about the buffer situation.  There is a comment
in __release_sock() that tries to explain why:

/*
  * Doing the zeroing here guarantee we can not loop forever
  * while a wild producer attempts to flood us.
  */

but I fail to understand how this scenario can happen even with TCP.
Yes, it can throw away packets, but not until the receive buffer is full,
and then sk_add_backlog() should start rejecting new messages anyway?
There is evidently something I have missed here.

I could also claim that the underlying problem is that the generic socket
layer is built on some assumptions that may be true for TCP, but not
necessarily for everybody else. But then I guess I risk being flamed...

> 
> Why TCP has no problem, but TIPC has it ?

Because TCP can throw away packets in such situations, and TIPC cannot.
When a TIPC message has reached the socket level, it has already passed
through the retransmission level, which in TIPC is situated in the node-to-node
links, and messages are assumed to arrive loss-free and in sequence. Hence,
there is no sequence numbering or retransmission at the socket level in TIPC.

> 
> It seems you have too low sk_rcvbuf, or you advertise too big windows to
> senders.

Yes.

> 
> If backlog content is so big it eventually makes TIPC drops incoming packets,
> then you have a scheduling issue, or unexpected bufferbloat.

Bufferbloat, yes, but not unexpected. It all goes back to our admittedly
inadequate message-based flow control. As I already explained to David
in an earlier mail, we have an entirely new, byte-based flow control in the
pipeline that will fix the bufferbloat.

But the above duplicate count will still be an issue, unless we just compensate
by setting the buffer limit to twice what it really needs to be.

> 
> Maybe TIPC holds socket lock too long in some place ?
> 
>