Date:	Tue, 10 Jul 2012 19:06:27 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	David Miller <davem@...emloft.net>
Cc:	ycheng@...gle.com, dave.taht@...il.com, netdev@...r.kernel.org,
	codel@...ts.bufferbloat.net, therbert@...gle.com,
	mattmathis@...gle.com, nanditad@...gle.com, ncardwell@...gle.com,
	andrewmcgr@...il.com, Rick Jones <rick.jones2@...com>
Subject: Re: [RFC PATCH v2] tcp: TCP Small Queues

On Tue, 2012-07-10 at 17:13 +0200, Eric Dumazet wrote:
> This introduces TSQ (TCP Small Queues).
> 
> The goal of TSQ is to reduce the number of TCP packets in xmit queues
> (qdisc & device queues), to reduce RTT and cwnd bias, part of the
> bufferbloat problem.
> 
> sk->sk_wmem_alloc is not allowed to grow above a given limit,
> allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
> given time.
> 
> TSO packets are sized/capped to half the limit, so that we have two
> TSO packets in flight, allowing better bandwidth use.
> 
> As a side effect, setting the limit to 40000 automatically reduces the
> standard gso max limit (65536) to 40000/2: smaller TSO packets can help
> reduce the latency of high prio packets.
> 
> This means we divert sock_wfree() to a tcp_wfree() handler, to
> queue/send following frames when skb_orphan() [2] is called for the
> already queued skbs.
> 
> Results on my dev machine (tg3 nic) are really impressive, using
> standard pfifo_fast, with or without TSO/GSO, and without any reduction
> of nominal bandwidth.
> 
> I no longer have 3 MBytes backlogged in the qdisc by a single netperf
> session, and socket autotuning on both sides no longer uses 4 MBytes.
> 
> As the skb destructor cannot restart xmit itself (as the qdisc lock
> might be taken at this point), we delegate the work to a tasklet. We use
> one tasklet per cpu for performance reasons.
> 
> 
> 
> [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
> [2] skb_orphan() is usually called at TX completion time,
>   but some drivers call it in their start_xmit() handler.
>   These drivers should at least use BQL, or else a single TCP
>   session can still fill the whole NIC TX ring, since TSQ will
>   have no effect.
> 
> Not-Yet-Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> ---
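
To make the byte limit and the halved TSO cap described above concrete,
here is a minimal user-space sketch in Python. It is a toy model, not
kernel code: ToySocket, tso_cap(), xmit() and wfree() are hypothetical
names, ~128KB is assumed to mean 128 * 1024 bytes, and wfree() resumes
transmission directly where the patch would defer to a tasklet.

```python
from collections import deque

GSO_MAX_SIZE = 65536        # standard gso max limit quoted in the description
DEFAULT_LIMIT = 128 * 1024  # assumption: "~128KB" taken as 131072 bytes


def tso_cap(limit_bytes, gso_max=GSO_MAX_SIZE):
    """Cap one TSO packet at half the limit so two packets fit under it."""
    return min(gso_max, limit_bytes // 2)


class ToySocket:
    """Toy stand-in for a TCP socket; wmem_alloc mimics sk->sk_wmem_alloc."""

    def __init__(self, limit_bytes, backlog_bytes):
        self.limit = limit_bytes
        self.backlog = backlog_bytes  # bytes the application still wants to send
        self.wmem_alloc = 0           # bytes currently sitting in qdisc/device
        self.in_queue = deque()       # sizes of the packets handed to the qdisc
        self.throttled = False

    def xmit(self):
        """Queue packets until the byte limit would be exceeded."""
        cap = tso_cap(self.limit)
        while self.backlog > 0:
            size = min(cap, self.backlog)
            if self.wmem_alloc + size > self.limit:
                self.throttled = True  # wait for a completion before sending more
                return
            self.backlog -= size
            self.wmem_alloc += size
            self.in_queue.append(size)

    def wfree(self):
        """Model of the destructor hook: one packet leaves the queues."""
        self.wmem_alloc -= self.in_queue.popleft()
        if self.throttled:
            # The patch delegates this restart to a per-cpu tasklet;
            # the toy model simply calls xmit() again.
            self.throttled = False
            self.xmit()


sock = ToySocket(DEFAULT_LIMIT, backlog_bytes=1_000_000)
sock.xmit()
print(len(sock.in_queue), sock.wmem_alloc, sock.throttled)
```

In this model at most two capped packets are ever queued per socket, and
each completion admits one more.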

By the way, Rick Jones asked me:

"Is there also any change in service demand?"

I am copying my answer here, since it's a very good point:

I worked on the idea of a CoDel-like feedback, to have a timed limit
instead of a byte limit ("allow up to 1ms" of delay in the qdisc/dev
queue).

But it seemed a bit complex: I would need to add skb fields to properly
track the residence time (sojourn time) of queued packets.
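
As a rough illustration of what that tracking would involve, here is a
user-space sketch in Python. Everything in it is hypothetical: ToySkb,
its enqueue_ns field and the TimedLimit class are invented names, and
gating on the latest sample only is a simplification (CoDel itself looks
at the minimum sojourn time over an interval).

```python
from dataclasses import dataclass

TARGET_NS = 1_000_000  # the "allow up to 1ms" figure, in nanoseconds


@dataclass
class ToySkb:
    length: int
    enqueue_ns: int = 0  # hypothetical extra field: when the packet was queued


class TimedLimit:
    """Allow more output only while the measured queue delay stays in budget."""

    def __init__(self, target_ns=TARGET_NS):
        self.target_ns = target_ns
        self.last_sojourn_ns = 0

    def on_enqueue(self, skb, now_ns):
        # Stamp the packet as it enters the qdisc/device queue.
        skb.enqueue_ns = now_ns

    def on_orphan(self, skb, now_ns):
        # Sojourn time = time between queueing and the orphan event.
        self.last_sojourn_ns = now_ns - skb.enqueue_ns
        return self.last_sojourn_ns

    def may_send(self):
        # Simplified policy (an assumption): gate on the latest sample only.
        return self.last_sojourn_ns <= self.target_ns


limit = TimedLimit()
skb = ToySkb(length=1500)
limit.on_enqueue(skb, now_ns=0)
limit.on_orphan(skb, now_ns=2_500_000)  # packet sat in the queues for 2.5 ms
print(limit.may_send())
```

The clock is passed in explicitly so the sketch stays deterministic; a
real implementation would read a monotonic clock at both points.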

An alternative would be to have a per-TCP-socket tracking array,
but it might be expensive to search for a packet in it...

With multiqueue devices or bad qdiscs, skbs can be orphaned out of
order, so the lookup can be relatively expensive.
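
A small Python sketch of why reordering hurts the tracking-array
variant. TrackingArray and its methods are hypothetical names, and a
plain linear scan is assumed for the search; with in-order completions
the match is always the first entry, while a reordered completion forces
a scan past older entries.

```python
class TrackingArray:
    """Per-socket list of (packet_id, enqueue_ns), kept in enqueue order."""

    def __init__(self):
        self.entries = []

    def enqueue(self, packet_id, now_ns):
        self.entries.append((packet_id, now_ns))

    def orphan(self, packet_id, now_ns):
        """Return (sojourn_ns, entries_scanned) for the orphaned packet."""
        for index, (pid, enqueue_ns) in enumerate(self.entries):
            if pid == packet_id:
                del self.entries[index]
                return now_ns - enqueue_ns, index + 1
        raise KeyError(packet_id)


track = TrackingArray()
for pid in range(4):
    track.enqueue(pid, now_ns=pid * 10)

# In-order completion: the packet is found on the first comparison.
print(track.orphan(0, now_ns=100))
# Reordered completion: packets 1 and 2 are scanned before packet 3 is found.
print(track.orphan(3, now_ns=100))
```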


