netdev - Re: [PATCH v3] tcp: splice as many packets as possible at once

Message-ID: <20090122090442.GB11139@ff.dom.local>
Date:	Thu, 22 Jan 2009 09:04:42 +0000
From:	Jarek Poplawski <jarkao2@...il.com>
To:	David Miller <davem@...emloft.net>
Cc:	zbr@...emap.net, herbert@...dor.apana.org.au, w@....eu,
	dada1@...mosbay.com, ben@...s.com, mingo@...e.hu,
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
	jens.axboe@...cle.com
Subject: Re: [PATCH v3] tcp: splice as many packets as possible at once

On Tue, Jan 20, 2009 at 09:16:16AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@...il.com>
> Date: Tue, 20 Jan 2009 11:01:44 +0000
> 
> > On Tue, Jan 20, 2009 at 01:31:22PM +0300, Evgeniy Polyakov wrote:
> > > On Tue, Jan 20, 2009 at 10:20:53AM +0000, Jarek Poplawski (jarkao2@...il.com) wrote:
> > > > Good question! Alas I can't check this soon, but if it's really like
> > > > this, of course this needs some better idea and rework. (BTW, I'd like
> > > > to prevent here as much as possible some strange activities like 1
> > > > byte (payload) packets getting full pages without any accounting.)
> > > 
> > > I believe approach to meet all our goals is to have own network memory
> > > allocator, so that each skb could have its payload in the fragments, we
> > > would not suffer from the heavy fragmentation and power-of-two overhead
> > > for the larger MTUs, have a reserve for the OOM condition and generally
> > > do not depend on the main system behaviour.
> > 
> > 100% right! But I guess we need this current fix for -stable, and I'm
> > a bit worried about safety.
> 
> Jarek, we already have a page and offset you can use.
> 
> It's called sk_sndmsg_page but that is just the (current) name.
> Nothing prevents you from reusing it for your purposes here.

It seems the usage (refcounting) of sk_sndmsg_page isn't consistent.
I followed the tcp_sendmsg() way here, but I think I'll come back to
this question soon.

Thanks,
Jarek P.

------------> take 3

net: Optimize memory usage when splicing from sockets.

The recent fix for data corruption when splicing from sockets uses
memory very inefficiently, allocating a new page to copy each chunk of
the linear part of an skb. This patch reuses the same page until it is
(almost) full by caching the page in the sk_sndmsg_page field.

With changes from David S. Miller <davem@...emloft.net>

Signed-off-by: Jarek Poplawski <jarkao2@...il.com>

Tested-by: needed...
---

 net/core/skbuff.c |   45 +++++++++++++++++++++++++++++++++++----------
 1 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2e5f2ca..2e64c1b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1333,14 +1333,39 @@ static void sock_spd_release(struct splice_pipe_desc *spd, unsigned int i)
 	put_page(spd->pages[i]);
 }
 
-static inline struct page *linear_to_page(struct page *page, unsigned int len,
-					  unsigned int offset)
-{
-	struct page *p = alloc_pages(GFP_KERNEL, 0);
+static inline struct page *linear_to_page(struct page *page, unsigned int *len,
+					  unsigned int *offset,
+					  struct sk_buff *skb)
+{
+	struct sock *sk = skb->sk;
+	struct page *p = sk->sk_sndmsg_page;
+	unsigned int off;
+
+	if (!p) {
+new_page:
+		p = sk->sk_sndmsg_page = alloc_pages(sk->sk_allocation, 0);
+		if (!p)
+			return NULL;
 
-	if (!p)
-		return NULL;
-	memcpy(page_address(p) + offset, page_address(page) + offset, len);
+		off = sk->sk_sndmsg_off = 0;
+		/* hold one ref to this page until it's full */
+	} else {
+		unsigned int mlen;
+
+		off = sk->sk_sndmsg_off;
+		mlen = PAGE_SIZE - off;
+		if (mlen < 64 && mlen < *len) {
+			put_page(p);
+			goto new_page;
+		}
+
+		*len = min_t(unsigned int, *len, mlen);
+	}
+
+	memcpy(page_address(p) + off, page_address(page) + *offset, *len);
+	sk->sk_sndmsg_off += *len;
+	*offset = off;
+	get_page(p);
 
 	return p;
 }
@@ -1349,21 +1374,21 @@ static inline struct page *linear_to_page(struct page *page, unsigned int len,
  * Fill page/offset/length into spd, if it can hold more pages.
  */
 static inline int spd_fill_page(struct splice_pipe_desc *spd, struct page *page,
-				unsigned int len, unsigned int offset,
+				unsigned int *len, unsigned int offset,
 				struct sk_buff *skb, int linear)
 {
 	if (unlikely(spd->nr_pages == PIPE_BUFFERS))
 		return 1;
 
 	if (linear) {
-		page = linear_to_page(page, len, offset);
+		page = linear_to_page(page, len, &offset, skb);
 		if (!page)
 			return 1;
 	} else
 		get_page(page);
 
 	spd->pages[spd->nr_pages] = page;
-	spd->partial[spd->nr_pages].len = len;
+	spd->partial[spd->nr_pages].len = *len;
 	spd->partial[spd->nr_pages].offset = offset;
 	spd->nr_pages++;
 
@@ -1405,7 +1430,7 @@ static inline int __splice_segment(struct page *page, unsigned int poff,
 		/* the linear region may spread across several pages  */
 		flen = min_t(unsigned int, flen, PAGE_SIZE - poff);
 
-		if (spd_fill_page(spd, page, flen, poff, skb, linear))
+		if (spd_fill_page(spd, page, &flen, poff, skb, linear))
 			return 1;
 
 		__segment_seek(&page, &poff, &plen, flen);
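
For readers who want to try the page-reuse logic outside the kernel, here
is a rough user-space sketch of what linear_to_page() does in the patch
above. It is only an illustration under stated assumptions, not the kernel
code: struct sketch_page, struct sketch_cache, sketch_put_page() and
sketch_linear_to_page() are hypothetical names standing in for struct page,
the sk_sndmsg_page/sk_sndmsg_off pair, put_page() and linear_to_page();
the page size is assumed to be 4096 bytes; calloc() replaces alloc_pages().
Unlike the patch, the sketch also clamps the length on a fresh page.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */
#define SKETCH_MIN_TAIL  64u	/* same threshold as in the patch */

/* Hypothetical stand-in for struct page with a reference count. */
struct sketch_page {
	unsigned int refs;
	unsigned char data[SKETCH_PAGE_SIZE];
};

/* Hypothetical stand-in for sk_sndmsg_page and sk_sndmsg_off. */
struct sketch_cache {
	struct sketch_page *page;
	unsigned int off;
};

/* Drop one reference; free the page when the last one goes away. */
void sketch_put_page(struct sketch_page *p)
{
	if (--p->refs == 0)
		free(p);
}

/*
 * Copy up to *len bytes from src into the cached page.
 * On return, *len is the number of bytes actually copied and *offset
 * is where they start in the returned page. The caller gets its own
 * reference on the returned page. Returns NULL if allocation fails.
 */
struct sketch_page *sketch_linear_to_page(struct sketch_cache *c,
					  const unsigned char *src,
					  unsigned int *len,
					  unsigned int *offset)
{
	struct sketch_page *p = c->page;
	unsigned int room = p ? SKETCH_PAGE_SIZE - c->off : 0;

	/* Start a new page if there is none, or if the tail is tiny
	 * and the chunk would not fit into it anyway. */
	if (!p || (room < SKETCH_MIN_TAIL && room < *len)) {
		if (p)
			sketch_put_page(p);	/* drop the cache's reference */
		p = calloc(1, sizeof(*p));
		c->page = p;
		if (!p)
			return NULL;
		p->refs = 1;	/* held by the cache until the page is full */
		c->off = 0;
		room = SKETCH_PAGE_SIZE;
	}

	/* Never copy past the end of the page; the caller must resubmit
	 * whatever did not fit. */
	if (*len > room)
		*len = room;

	memcpy(p->data + c->off, src, *len);
	*offset = c->off;
	c->off += *len;
	p->refs++;		/* reference handed to the caller */

	return p;
}
```

Because *len may come back smaller than requested, a caller has to advance
by the returned length, which is why the patch changes spd_fill_page() to
take the length by pointer and lets __splice_segment() seek by the updated
flen.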