netdev - Re: [PATCH] net: add per device sg_max

Message-ID: <5696681D.4060002@stressinduktion.org>
Date:	Wed, 13 Jan 2016 16:07:09 +0100
From:	Hannes Frederic Sowa <hannes@...essinduktion.org>
To:	Eric Dumazet <edumazet@...gle.com>,
	Hans Westgaard Ry <hans.westgaard.ry@...cle.com>
Cc:	David Laight <David.Laight@...lab.com>,
	"David S. Miller" <davem@...emloft.net>,
	Alexey Kuznetsov <kuznet@....inr.ac.ru>,
	James Morris <jmorris@...ei.org>,
	Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
	Patrick McHardy <kaber@...sh.net>,
	Alexei Starovoitov <ast@...mgrid.com>,
	Jiri Pirko <jiri@...lanox.com>,
	Daniel Borkmann <daniel@...earbox.net>,
	Nicolas Dichtel <nicolas.dichtel@...nd.com>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Salam Noureddine <noureddine@...sta.com>,
	Jarod Wilson <jarod@...hat.com>,
	Toshiaki Makita <makita.toshiaki@....ntt.co.jp>,
	Julian Anastasov <ja@....bg>,
	Ying Xue <ying.xue@...driver.com>,
	Craig Gallek <kraig@...gle.com>,
	Mel Gorman <mgorman@...hsingularity.net>,
	Edward Jee <edjee@...gle.com>,
	Julia Lawall <julia.lawall@...6.fr>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Haakon Bugge <haakon.bugge@...cle.com>,
	Knut Omang <knut.omang@...cle.com>,
	Wei Lin Guay <wei.lin.guay@...cle.com>,
	Santosh Shilimkar <santosh.shilimkar@...cle.com>,
	Yuval Shaia <yuval.shaia@...cle.com>
Subject: Re: [PATCH] net: add per device sg_max_frags for skb

On 13.01.2016 15:19, Eric Dumazet wrote:
> 1) There are no arch with 1K page sizes. Most certainly, if we had
> MAX_SKB_FRAGS=65 some assumptions in the stack would fail.
>
> 2) TCP stack has coalescing support. write(2) or sendmsg(2) should
> append data into the last skb in write queue, and still use 32 KB
> frags.
>      You get pathological skb when using sendpage() or when one thread
> writes data into _multiple_ TCP sockets, since TCP stack uses
>      a per thread 32 KB reserve (
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5640f7685831e088fe6c2e1f863a6805962f8e81
> )
>
> 2) As I said, implementing a limit in TCP stack is not enough. Your
> patch is therefore adding complexity for all users, but is not a
> general solution.
>
>     GRO, tun device, many things can still cook 'big skbs'
>
>      You need to properly implement a fallback, possibly using
> ndo_features_check(), or directly from your ndo_start_xmit()
>
> 3) We currently have a very dumb way to fallback, forcing a linearize
> call, likely to fail if memory is fragmented and skb big.
>
>      You could instead provide a smart helper, trying to reduce the
> number of frags in a skb by choosing adjacent frags and
> re-allocating/merging them.
>
>      By choosing, I mean trying to pick smallest ones to minimize copy
> cost, to get one skb with X less fragment. (X=1 in your case ?)
>
>     I know for example that bnx2x could benefit from such a helper, as
> it has a 13-frag limit.
>     (bnx2x_pkt_req_lin(), called from bnx2x ndo_start_xmit())

As I proposed, we could globally (or per netns) limit the maximum
number of frags. I think this would be okay, and it could be the best
alternative to installing slow paths that could be hit quite constantly.

Otherwise, fallbacks like the ones Eric proposed are needed. I do not
see any other choice.

Thanks,
Hannes
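
The "smart helper" from Eric's point 3 can be illustrated with a minimal
userspace sketch. It is not kernel code and uses no sk_buff API: each
fragment is modelled only by its length in bytes, and the name
reduce_frags() is hypothetical. Assuming that the cost of merging two
adjacent frags is the number of bytes that must be copied into a new
buffer, it greedily merges the cheapest adjacent pair until the frag
count fits the device limit. The greedy choice is a simplification and
is not guaranteed to minimise the total copy cost.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the frag-merging fallback: len[] holds the byte
 * length of each fragment, *nfrags the current fragment count.
 *
 * While there are more than max_frags fragments, merge the adjacent pair
 * with the smallest combined length. Returns the total number of bytes
 * that would have to be copied (the assumed cost of the fallback).
 */
size_t reduce_frags(size_t *len, size_t *nfrags, size_t max_frags)
{
    size_t copied = 0;

    assert(len != NULL && nfrags != NULL);
    if (max_frags == 0)
        max_frags = 1;          /* a packet needs at least one buffer */

    while (*nfrags > max_frags) {
        size_t best = 0;

        /* Find the adjacent pair (best, best + 1) that is cheapest to copy. */
        for (size_t i = 1; i + 1 < *nfrags; i++) {
            if (len[i] + len[i + 1] < len[best] + len[best + 1])
                best = i;
        }

        /* "Merge": both fragments are copied into one new buffer. */
        copied += len[best] + len[best + 1];
        len[best] += len[best + 1];

        /* Close the gap left by the absorbed fragment. */
        memmove(&len[best + 1], &len[best + 2],
                (*nfrags - best - 2) * sizeof(*len));
        (*nfrags)--;
    }
    return copied;
}
```

For example, with frags of 4096, 100, 200 and 4096 bytes and a limit of
three frags, the sketch merges the two small middle frags and copies 300
bytes, whereas a full linearize would copy all 8492 bytes into one
contiguous allocation.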