Date: Mon, 17 Oct 2011 09:02:54 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: David Miller <davem@...emloft.net>
Cc: rick.jones2@...com, netdev@...r.kernel.org
Subject: Re: [PATCH net-next] tcp: reduce memory needs of out of order queue
On Sunday, 16 October 2011 at 20:53 -0400, David Miller wrote:
> So perhaps the best solution is to divorce truesize from such driver
> and device details? If there is one calculation, then TCP need only
> be concerned with one case.
>
> Look at how confusing and useless tcp_adv_win_scale ends up being for
> this problem.
>
> Therefore I'll make the mostly-serious proposal that truesize be
> something like "initial_real_total_data + sizeof(metadata)"
>
> So if a device receives a 512 byte packet, it's:
>
> 512 + sizeof(metadata)
>
That would probably OOM under stress, with thousands of sockets.
> It still provides the necessary protection that truesize is meant to
> provide, yet sanitizes all of the receive and send buffer overhead
> handling.
>
> TCP should be absolutely, and completely, impervious to details like
> how buffering needs to be done for some random wireless card. Just
> the mere fact that using a larger buffer in a driver ruins TCP
> performance indicates a serious design failure.
>
I don't think it's a design failure. It's the same problem as computing
the TCP window from the rcvspace (the memory we allow the socket to
consume) based on the MSS: if the sender uses only 1-byte frames,
the receiver hits the memory limit and performance drops.
Right now our tcp-window tuning really assumes too much: a perfect MSS
skb using _exactly_ MSS + sizeof(metadata), while we already know that
the real slab cost is higher:
    __roundup_pow_of_two(MSS + sizeof(struct skb_shared_info)) +
        SKB_DATA_ALIGN(sizeof(struct sk_buff))
and now, with paged frag devices:
    PAGE_SIZE + SKB_DATA_ALIGN(sizeof(struct sk_buff))
We assume the sender behaves correctly and that drivers don't use 64KB
pages to store a single 72-byte frame.
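
To make the gap between these estimates concrete, here is a rough
userspace sketch of the three costs. It is not kernel code. The ASSUMED_*
constants are placeholders chosen for illustration; the real
sizeof(struct sk_buff), sizeof(struct skb_shared_info), cache line size
and PAGE_SIZE depend on kernel version, architecture and config. Treating
"metadata" as sk_buff plus skb_shared_info is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder sizes, for illustration only (not taken from any kernel). */
#define ASSUMED_SKB_SIZE     232u   /* stand-in for sizeof(struct sk_buff) */
#define ASSUMED_SHINFO_SIZE  320u   /* stand-in for sizeof(struct skb_shared_info) */
#define ASSUMED_CACHE_LINE    64u   /* stand-in for SMP_CACHE_BYTES */
#define ASSUMED_PAGE_SIZE   4096u   /* stand-in for PAGE_SIZE */

/* Same idea as SKB_DATA_ALIGN(): round up to a cache line multiple. */
static size_t data_align(size_t x)
{
    return (x + ASSUMED_CACHE_LINE - 1) & ~(size_t)(ASSUMED_CACHE_LINE - 1);
}

/* Same idea as __roundup_pow_of_two(): smallest power of two >= x. */
static size_t roundup_pow2(size_t x)
{
    size_t p = 1;

    assert(x > 0);
    while (p < x)
        p <<= 1;
    return p;
}

/* What the window tuning assumes: exactly MSS + metadata. */
static size_t ideal_cost(size_t mss)
{
    return mss + ASSUMED_SKB_SIZE + ASSUMED_SHINFO_SIZE;
}

/* Slab-backed skb: the data area is rounded up to a power of two. */
static size_t slab_cost(size_t mss)
{
    return roundup_pow2(mss + ASSUMED_SHINFO_SIZE) +
           data_align(ASSUMED_SKB_SIZE);
}

/* Paged frag device: one full page per frame, whatever the frame size. */
static size_t paged_cost(void)
{
    return ASSUMED_PAGE_SIZE + data_align(ASSUMED_SKB_SIZE);
}
```

With these placeholder sizes and MSS = 1460, ideal_cost() gives 2012
bytes, slab_cost() gives 2304, and paged_cost() gives 4352, so the
per-frame memory charge is already more than twice the ideal on a paged
frag device.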
I would say the first thing the TCP stack must respect is the memory
limits that the admin set for it. That's what skb->truesize is for.
# cat /proc/sys/net/ipv4/tcp_rmem
4096 87380 4127616
In this case, we allow up to 4 Mbytes of receiver memory per session.
Not 20 or 30 Mbytes...
We must translate this to a TCP window, suitable for current hardware.
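
As a rough illustration of that translation, the sketch below scales the
memory limit by the ratio of payload to truesize, and also computes the
memory actually charged when a given window is filled. It is a
simplification, not what the kernel does: the function names are
hypothetical, and the per-frame truesize values in the example (2304 and
4352 bytes) are assumed figures, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest window (in payload bytes) that keeps memory use within
 * rcvmem_limit, if every frame carries `payload` bytes and is charged
 * `truesize` bytes. Hypothetical helper, for illustration only.
 */
static uint64_t window_for_limit(uint64_t rcvmem_limit, uint64_t payload,
                                 uint64_t truesize)
{
    assert(truesize > 0 && payload <= truesize);
    return rcvmem_limit * payload / truesize;
}

/*
 * Memory charged to the socket when `window` payload bytes arrive in
 * frames of `payload` bytes, each charged `truesize` bytes.
 */
static uint64_t memory_for_window(uint64_t window, uint64_t payload,
                                  uint64_t truesize)
{
    uint64_t frames;

    assert(payload > 0);
    frames = (window + payload - 1) / payload;   /* round up */
    return frames * truesize;
}
```

With the 4127616-byte limit shown above and 1460-byte payloads,
window_for_limit() gives 2615590 bytes for an assumed truesize of 2304,
and 1384724 bytes for an assumed truesize of 4352. Going the other way,
advertising the full 4127616 bytes as window with a 4352-byte truesize
would charge memory_for_window() = 12307456 bytes, about three times the
configured limit.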