netdev - Re: [PATCH net-next v2 2/8] tipc: compensate for double accounting in socket rcv buffer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <5374C044.6040802@donjonn.com>
Date:	Thu, 15 May 2014 09:25:24 -0400
From:	Jon Maloy <maloy@...jonn.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	Jon Maloy <jon.maloy@...csson.com>,
	"davem@...emloft.net" <davem@...emloft.net>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Paul Gortmaker <paul.gortmaker@...driver.com>,
	Erik Hugne <erik.hugne@...csson.com>,
	"ying.xue@...driver.com" <ying.xue@...driver.com>,
	"tipc-discussion@...ts.sourceforge.net" 
	<tipc-discussion@...ts.sourceforge.net>
Subject: Re: [PATCH net-next v2 2/8] tipc: compensate for double accounting
 in socket rcv buffer

On 05/14/2014 03:42 PM, Eric Dumazet wrote:
> On Wed, 2014-05-14 at 15:00 -0400, Jon Maloy wrote:
>
>> This is where I don't get it.
>>
>> sk_add_backlog(limit) is (via sk_rcvqueues_full) testing for
>>
>>  (sk_backlog.len + sk_rmem_alloc) > limit
>>
>> But, if the receiving user is slow, sk_rmem_alloc will run full eventually, even if we
>> reduce sk_backlog.len with truesize of each transferred buffer, and sk_add_backlog
>> should then start throwing away packets. Why doesn't this happen?
> It definitely can happen if sender tries to send small packets, that
> have a high truesize/len ratio.
>
> $ nstat -a | egrep "TcpExtTCPBacklogDrop|IpInReceives|TcpExtTCPRcvCoalesce"
> IpInReceives                    8357544624         0.0
> TcpExtTCPBacklogDrop            13                 0.0
> TcpExtTCPRcvCoalesce            437826621          0.0
>
> You claim that "that sk_backlog.len does not
> give correct information about the buffer situation.", but really it
> does.

                                 Backlog Q     Recv Q     Total:   Reported
                                 -------------     ---------     ------    -----------
bef. release_sock:        2 kB            0 kB        2 kB      2 kB
during rel_lock:           0 kB             2 kB        2 kB     4 kB
after rel_lock:              0 kB             2 kB       2 kB      2 kB

But I guess you understand this, so you must mean something else
when you say it gives a "correct information".
To me it looks wrong.

My perception of those queues is that they really are part of the
same logical queue, whose combined size must not exceed the
configured upper socket limit, otherwise we violate the rules we
are supposed to obey to regarding memory consumption per socket.

>
> Your problem seems that you do not use the appropriate 'limit',
> or assume very tight scheduling constraints (An incoming packet has to
> be immediately consumed by receiver, otherwise following packet might be
> dropped)

Relating to the above, the 'limit' we use in TIPC  is the same both for
sk_add_backlog() and in tipc_filter_rcv(), so we can have a somewhat
deterministic behavior. But, given that the 'reported' value (the
one used by sk_add_backlog() ) is not correct at all moments, we
end up with seeing messages unnecessarily rejected. (Note that I
consistently say 'messages' here. In TIPC the units arriving in a socket
are complete, reassembled messages, not packets.)

>
> If rcvbuf_limit(sk, buf) is the limit for normal packets (sk_rmem_alloc)
> in receive queue, then you need something bigger to allow bursts.
>
>
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 3f9912f87d0d..fe4f37d8029a 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -1457,7 +1457,7 @@ u32 tipc_sk_rcv(struct sock *sk, struct sk_buff *buf)
>  	if (!sock_owned_by_user(sk)) {
>  		res = filter_rcv(sk, buf);
>  	} else {
> -		if (sk_add_backlog(sk, buf, rcvbuf_limit(sk, buf)))
> +		if (sk_add_backlog(sk, buf, 2 * rcvbuf_limit(sk, buf)))

This would work, but is in reality just a cruder version of what I have done.
It would temporarily violate the configured memory constraint, while my
method doesn't.

I should also say that my first idea when I realized this problem was just
to do like this. But given that our (fix) socket buffer limit is already at a crazy
level, I ddn't find it a good idea to double it once more.

Now, since we are planning to post a patch series with byte-based flow control
control in a few months, I would like to avoid ending up in the same discussions
again, and maybe have to redo things. If you are are interested, I could give you
a more thorough background to the TIPC two-level flow control, and involve you
in our discussions. But I am not sure this mailing list is the right forum for that.
Should I  cc you when we start discussing this internally? Only if you have time
and are interested, of course.

Regards
///jon

>  			res = TIPC_ERR_OVERLOAD;
>  		else
>  			res = TIPC_OK;
>
>
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html