lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20081230213559.GD20238@ioremap.net>
Date:	Wed, 31 Dec 2008 00:35:59 +0300
From:	Evgeniy Polyakov <zbr@...emap.net>
To:	Vladislav Bolkhovitin <vst@...b.net>
Cc:	Herbert Xu <herbert@...dor.apana.org.au>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	linux-scsi@...r.kernel.org,
	James Bottomley <James.Bottomley@...senPartnership.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	FUJITA Tomonori <fujita.tomonori@....ntt.co.jp>,
	Mike Christie <michaelc@...wisc.edu>,
	Jeff Garzik <jeff@...zik.org>,
	Boaz Harrosh <bharrosh@...asas.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, scst-devel@...ts.sourceforge.net,
	Bart Van Assche <bart.vanassche@...il.com>,
	"Nicholas A. Bellinger" <nab@...ux-iscsi.org>,
	netdev@...r.kernel.org, Rusty Russell <rusty@...tcorp.com.au>,
	David Miller <davem@...emloft.net>,
	Alexey Kuznetsov <kuznet@....inr.ac.ru>
Subject: Re: [PATCH][RFC 23/23]: Support for zero-copy TCP transmit of user space data

Hi Vlad.

On Tue, Dec 30, 2008 at 08:37:00PM +0300, Vladislav Bolkhovitin (vst@...b.net) wrote:
> Although I agree that any additional allocation is something, which 
> should be avoided, *if possible*. But you shouldn't overestimate the 
> overhead of the sk_transaction_token allocation in cases, when it would 
> be needed. At first, sk_transaction_token is quite small, so a single 
> page in the kmem cache would keep about 100 of them, hence the slow 
> allocation path would be called only once per 100 objects. Second, in 
> many cases ->sendpages() needs to allocate a new skb, so already there 
> is at least one such allocations on the fast path.

Once per 100 objects? With millions of packets per second at extreme
cases this does not scale. Even more common thousand of usual packets
per second with 1.5k mtu will show up (especially freeing actually).

Any additional overhead has to be avoided if possible, even if it looks
innocent.

BSD guys already learned this lesson with packet processing tags at
every layer.

> Actually, it doesn't look like the skb shared info destructor alone 
> can't solve the task we are solving, because we need to know not when an 
> skb transmittion finished, but when transmittion of our *set of pages* 
> finished. Hence, with skb shared info destructor we would need also to 
> invent some way to track set of pages <-> set of skbs translation (you 
> refer it as combining tag and separate destructor), which would bring 
> this solution on the entire new complexity level for no gain over the 
> sk_transaction_token solution.

You really do not need to know when transmission is over, but when remote
side acks it (or connection is reset by the timeout). There is no way to
know when transmission is over without creating own skbs and submitting
them avoiding usual tcp/ip stack machinery.

You do not need to know which skbs contain which pages, system only should
track page pointers freed at skb destruction (shared info destruction
actually) time, no matter who owns those pages (since new pages can be
added into the page and some of the old ones can be freed early).

This will be effectively the same token, but it does not mean that
everyone who needs notification will have to perform additional
allocation. Put two pointers: destructor and token and do whatever you
like if one of them is non-empty, but try to avoid unneded overhead when
it is possible.

-- 
	Evgeniy Polyakov
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ