netdev - Re: [PATCH][RFC 23/23]: Support for zero-copy TCP transmit of user space data

Message-ID: <20081230213559.GD20238@ioremap.net>
Date:	Wed, 31 Dec 2008 00:35:59 +0300
From:	Evgeniy Polyakov <zbr@...emap.net>
To:	Vladislav Bolkhovitin <vst@...b.net>
Cc:	Herbert Xu <herbert@...dor.apana.org.au>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	linux-scsi@...r.kernel.org,
	James Bottomley <James.Bottomley@...senPartnership.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	FUJITA Tomonori <fujita.tomonori@....ntt.co.jp>,
	Mike Christie <michaelc@...wisc.edu>,
	Jeff Garzik <jeff@...zik.org>,
	Boaz Harrosh <bharrosh@...asas.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, scst-devel@...ts.sourceforge.net,
	Bart Van Assche <bart.vanassche@...il.com>,
	"Nicholas A. Bellinger" <nab@...ux-iscsi.org>,
	netdev@...r.kernel.org, Rusty Russell <rusty@...tcorp.com.au>,
	David Miller <davem@...emloft.net>,
	Alexey Kuznetsov <kuznet@....inr.ac.ru>
Subject: Re: [PATCH][RFC 23/23]: Support for zero-copy TCP transmit of user space data

Hi Vlad.

On Tue, Dec 30, 2008 at 08:37:00PM +0300, Vladislav Bolkhovitin (vst@...b.net) wrote:
> Although I agree that any additional allocation is something that
> should be avoided, *if possible*, you shouldn't overestimate the
> overhead of the sk_transaction_token allocation in the cases where it
> would be needed. First, sk_transaction_token is quite small, so a single
> page in the kmem cache would keep about 100 of them, hence the slow
> allocation path would be called only once per 100 objects. Second, in
> many cases ->sendpages() needs to allocate a new skb, so there is
> already at least one such allocation on the fast path.

Once per 100 objects? At millions of packets per second in extreme
cases this does not scale. Even the more common case of thousands of
ordinary packets per second with a 1.5k MTU will show the overhead
(especially on the freeing side).

Any additional overhead has to be avoided if possible, even if it looks
innocent.

BSD guys already learned this lesson with packet processing tags at
every layer.

> Actually, it doesn't look like the skb shared info destructor alone
> can solve the task we are solving, because we need to know not when an
> skb transmission finished, but when transmission of our *set of pages*
> finished. Hence, with the skb shared info destructor we would also need
> to invent some way to track the set of pages <-> set of skbs translation
> (you refer to it as combining a tag and a separate destructor), which
> would bring this solution to an entirely new level of complexity for no
> gain over the sk_transaction_token solution.

You really do not need to know when transmission is over, but when the
remote side acks it (or the connection is reset by a timeout). There is
no way to know when transmission is over without creating your own skbs
and submitting them in a way that bypasses the usual TCP/IP stack
machinery.

You do not need to know which skbs contain which pages. The system only
has to track the page pointers freed at skb destruction (actually shared
info destruction) time, no matter who owns those pages (since new pages
can be added to the skb and some of the old ones can be freed early).

This will effectively be the same token, but it does not mean that
everyone who needs a notification will have to perform an additional
allocation. Put two pointers there, a destructor and a token, and do
whatever you like if one of them is non-empty, but try to avoid unneeded
overhead when it is possible.
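
Roughly, the two-pointer idea could look like the user-space sketch
below. This is not kernel code: struct shinfo_sketch only stands in for
the skb shared info, and the names shinfo_release(), tx_tracker and
tracker_page_done() are made up for illustration. It also assumes the
destructor pointer is the one that is tested, with the token passed
through as opaque context.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAGS 4

/* Hypothetical stand-in for a kernel page. */
struct page { int id; };

/* Hypothetical stand-in for the skb shared info. */
struct shinfo_sketch {
	struct page *frags[MAX_FRAGS];
	size_t nr_frags;
	/* Both pointers stay NULL for ordinary traffic. */
	void (*destructor)(struct page *page, void *token);
	void *token;	/* opaque, owned by whoever set the destructor */
};

/* Called when the last reference to the shared info goes away. */
static void shinfo_release(struct shinfo_sketch *sh)
{
	assert(sh != NULL);

	/* Ordinary traffic pays for one pointer test, nothing more. */
	if (sh->destructor) {
		for (size_t i = 0; i < sh->nr_frags; i++)
			sh->destructor(sh->frags[i], sh->token);
	}
	sh->nr_frags = 0;
}

/* Example user: counts pages that have not been released yet. */
struct tx_tracker { unsigned int pages_in_flight; };

static void tracker_page_done(struct page *page, void *token)
{
	struct tx_tracker *t = token;

	(void)page;	/* a real user could look the page up here */
	t->pages_in_flight--;
}
```

The user embeds its own state (tx_tracker here) wherever it likes and
points the token at it, so nothing is allocated per packet, and it
learns about each page as it is released without knowing which skb
carried it.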

-- 
	Evgeniy Polyakov