Message-ID: <5696681D.4060002@stressinduktion.org>
Date: Wed, 13 Jan 2016 16:07:09 +0100
From: Hannes Frederic Sowa <hannes@...essinduktion.org>
To: Eric Dumazet <edumazet@...gle.com>,
Hans Westgaard Ry <hans.westgaard.ry@...cle.com>
Cc: David Laight <David.Laight@...lab.com>,
"David S. Miller" <davem@...emloft.net>,
Alexey Kuznetsov <kuznet@....inr.ac.ru>,
James Morris <jmorris@...ei.org>,
Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
Patrick McHardy <kaber@...sh.net>,
Alexei Starovoitov <ast@...mgrid.com>,
Jiri Pirko <jiri@...lanox.com>,
Daniel Borkmann <daniel@...earbox.net>,
Nicolas Dichtel <nicolas.dichtel@...nd.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Salam Noureddine <noureddine@...sta.com>,
Jarod Wilson <jarod@...hat.com>,
Toshiaki Makita <makita.toshiaki@....ntt.co.jp>,
Julian Anastasov <ja@....bg>,
Ying Xue <ying.xue@...driver.com>,
Craig Gallek <kraig@...gle.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Edward Jee <edjee@...gle.com>,
Julia Lawall <julia.lawall@...6.fr>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Haakon Bugge <haakon.bugge@...cle.com>,
Knut Omang <knut.omang@...cle.com>,
Wei Lin Guay <wei.lin.guay@...cle.com>,
Santosh Shilimkar <santosh.shilimkar@...cle.com>,
Yuval Shaia <yuval.shaia@...cle.com>
Subject: Re: [PATCH] net: add per device sg_max_frags for skb
On 13.01.2016 15:19, Eric Dumazet wrote:
> 1) There are no arch with 1K page sizes. Most certainly, if we had
> MAX_SKB_FRAGS=65 some assumptions in the stack would fail.
>
> 2) TCP stack has coalescing support. write(2) or sendmsg(2) should
> append data into the last skb in write queue, and still use 32 KB
> frags.
> You get pathological skb when using sendpage() or when one thread
> writes data into _multiple_ TCP sockets, since TCP stack uses
> a per thread 32 KB reserve (
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5640f7685831e088fe6c2e1f863a6805962f8e81
> )
>
> 3) As I said, implementing a limit in TCP stack is not enough. Your
> patch is therefore adding complexity for all users, but is not a
> general solution.
>
> GRO, tun device, many things can still cook 'big skbs'
>
> You need to properly implement a fallback, possibly using
> ndo_features_check(), or directly from your ndo_start_xmit()
>
> 4) We currently have a very dumb way to fall back, forcing a linearize
> call, likely to fail if memory is fragmented and skb big.
>
> You could instead provide a smart helper, trying to reduce the
> number of frags in a skb by choosing adjacent frags and
> re-allocating/merging them.
>
> By choosing, I mean trying to pick the smallest ones to minimize copy
> cost, to get one skb with X fewer fragments. (X=1 in your case?)
>
> I know for example that bnx2x could benefit from such a helper, as
> it has a 13-frag limit
> (bnx2x_pkt_req_lin(), called from bnx2x ndo_start_xmit()).

As I proposed, we could limit the maximum number of frags globally (or
per netns). I think this would be okay, and it could be the best
alternative to installing slow paths that could be hit quite constantly.

Otherwise, fallbacks like the ones Eric proposed are needed. I do not
see any other choice.
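
To make the frag-merging fallback a bit more concrete, here is a rough userspace sketch of the selection logic Eric describes: repeatedly merge the adjacent pair of frags with the smallest combined length until the skb fits the device limit. This is only a model under stated assumptions. A frag is reduced to a byte length, reduce_frags() is a hypothetical name and not an existing kernel helper, and no real allocation or copying is done.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model only, NOT kernel code: each "frag" is represented by
 * its length in bytes. reduce_frags() is a hypothetical helper name.
 *
 * Merge adjacent frags until at most max_frags remain. Every step picks
 * the adjacent pair with the smallest combined length, because that is
 * the number of bytes that would have to be copied into a newly
 * allocated buffer. Returns the total number of bytes "copied".
 *
 * The choice is greedy: a merged frag that is merged again is counted
 * (copied) again. A real helper would probably plan all merges first.
 */
size_t reduce_frags(size_t *len, size_t *nr_frags, size_t max_frags)
{
	size_t copied = 0;

	assert(max_frags >= 1);

	while (*nr_frags > max_frags) {
		size_t best = 0;

		/* Find the cheapest adjacent pair (best, best + 1). */
		for (size_t i = 1; i + 1 < *nr_frags; i++)
			if (len[i] + len[i + 1] < len[best] + len[best + 1])
				best = i;

		/* Model the merge: the pair becomes a single frag. */
		len[best] += len[best + 1];
		copied += len[best];

		/* Close the gap left by the second frag of the pair. */
		for (size_t i = best + 1; i + 1 < *nr_frags; i++)
			len[i] = len[i + 1];
		(*nr_frags)--;
	}

	return copied;
}
```

With frag lengths {100, 4096, 10, 20, 4096} and a limit of 4 (the X=1 case), the two small frags in the middle are merged at a cost of 30 bytes copied, instead of linearizing the whole skb.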
Thanks,
Hannes