Date:	Mon, 17 Oct 2011 09:02:54 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	David Miller <davem@...emloft.net>
Cc:	rick.jones2@...com, netdev@...r.kernel.org
Subject: Re: [PATCH net-next] tcp: reduce memory needs of out of order queue

On Sunday, 16 October 2011, at 20:53 -0400, David Miller wrote:

> So perhaps the best solution is to divorce truesize from such driver
> and device details?  If there is one calculation, then TCP need only
> be concerned with one case.
> 
> Look at how confusing and useless tcp_adv_win_scale ends up being for
> this problem.
> 
> Therefore I'll make the mostly-serious proposal that truesize be
> something like "initial_real_total_data + sizeof(metadata)"
> 
> So if a device receives a 512 byte packet, it's:
> 
> 	512 + sizeof(metadata)
> 

That would probably OOM in stress situations, with thousands of sockets.

> It still provides the necessary protection that truesize is meant to
> provide, yet sanitizes all of the receive and send buffer overhead
> handling.
> 
> TCP should be absolutely and completely impervious to details like
> how buffering needs to be done for some random wireless card.  Just
> the mere fact that using a larger buffer in a driver ruins TCP
> performance indicates a serious design failure.
> 

I don't think it's a design failure. It's the same problem as computing
the TCP window from the rcvspace (the memory we allow the socket to
consume) based on the MSS: if the sender uses 1-byte frames only, the
receiver hits the memory limit and performance drops.

Right now our tcp-window tuning really assumes too much: perfect MSS
skbs using _exactly_ MSS + sizeof(metadata), while we already know the
real slab cost is higher:

  __roundup_pow_of_two(MSS + sizeof(struct skb_shared_info)) +
  SKB_DATA_ALIGN(sizeof(struct sk_buff))

and now with paged frag devices :

  PAGE_SIZE + SKB_DATA_ALIGN(sizeof(struct sk_buff))

We assume the sender behaves correctly and that drivers don't use 64KB
pages to store a single 72-byte frame.

I would say the first thing the TCP stack must respect is the memory
limits the admin set for it. That's what skb->truesize is for.

# cat /proc/sys/net/ipv4/tcp_rmem
4096	87380	4127616

In this case, we allow up to 4 Mbytes of receiver memory per session,
not 20 or 30 Mbytes...

We must translate this to a TCP window, suitable for current hardware.



