lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1318579509.2533.110.camel@edumazet-laptop>
Date:	Fri, 14 Oct 2011 10:05:09 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	David Miller <davem@...emloft.net>
Cc:	netdev@...r.kernel.org
Subject: Re: [PATCH net-next] tcp: reduce memory needs of out of order queue

Le vendredi 14 octobre 2011 à 03:42 -0400, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@...il.com>
> Date: Fri, 14 Oct 2011 09:19:51 +0200
> 
> > Many drivers allocates big skb to store a single TCP frame.
> > (WIFI drivers, or NIC using PAGE_SIZE fragments)
> > 
> > Its now common to get skb->truesize bigger than 4096 to store a ~1500
> > bytes TCP frame.
> > 
> > TCP sessions with large RTT and packet losses can fill their Out Of
> > Order queue with such oversized skbs, and hit their sk_rcvbuf limit,
> > starting a pruning of complete OFO queue, without giving chance to
> > receive the missing packet(s) and moving skbs from OFO to receive queue.
> > 
> > This patch adds skb_reduce_truesize() helper, and uses it for all skbs
> > queued into OFO queue.
> > 
> > Spending some time to perform a copy is worth the pain, since it permits
> > SACK processing to have a chance to complete over the RTT barrier.
> > 
> > This greatly improves user experience, without added cost on fast path.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
> 
> No objection from me, although I wish wireless drivers were able to
> size their SKBs more appropriately.  I wonder how many problems that
> look like "OMG we gotz da Buffer Bloat, arrr!" are actually due to
> this truesize issue.
> 
> I think such large truesize SKBs will cause problems even in non loss
> situations, in that the receive buffer will hit it's limits more
> quickly.  I not sure that the receive buffer autotuning is built to
> handle this sort of scenerio as a common occurance.
> 
> You might want to check if this is the actual root cause of your
> problems.  If the receive buffer autotuning doesn't expand the receive
> buffer enough to hold two windows worth of these large truesize SKBs,
> that's the real reason why we end up pruning.
> 
> We have to decide if these kinds of SKBs are acceptable as a normal
> situation for MSS sized frames.  And if they are then it's probably
> a good idea to adjust the receive buffer autotuning code too.
> 
> Although I realize it might be difficult, getting rid of these weird
> SKBs in the first place would be ideal.
> 
> It would also be a good idea to put the truesize inaccuracies into
> perspective when selecting how to fix this.  It's trying to prevent
> 1 byte packets not accounting for the 256 byte SKB and metadata.
> That kind of case with such a high ratio of wastage is important.
> 
> On the other hand, using 2048 bytes for a 1500 byte packet and claiming
> the truesize is 1500 + sizeof(metadata)... that might be an acceptable
> lie to tell :-)  This is especially true if it allows an easy solution
> to this wireless problem.
> 
> Just some thoughts...  and I wonder if the wireless thing is due to
> some hardware limitation or similar.
> 

This patch specifically addresses the OFO problem, trying to lower
memory usage for machines handling lot of sockets (proxies for example)

For the general case, I believe we have to tune/change
tcp_win_from_space() to take into account general tendancy to get fat
skbs.

sysctl_tcp_adv_win_scale is not fine enough today, and default value (2)
gives too much collapses. It's also a very complex setting, I am pretty
sure nobody knows how to use it.

tcp_win_from_space(int space) -> 75% of space  [ default ]

Only current kernels choices are to set it to one/-1 :

tcp_win_from_space(int space) -> 50% of space

or -2 :

tcp_win_from_space(int space) -> 25% of space



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ