lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1340368430.4604.10280.camel@edumazet-glaptop>
Date:	Fri, 22 Jun 2012 14:33:50 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Alexander Duyck <alexander.duyck@...il.com>
Cc:	Alexander Duyck <alexander.h.duyck@...el.com>,
	netdev@...r.kernel.org, davem@...emloft.net,
	jeffrey.t.kirsher@...el.com
Subject: Re: [PATCH] net: Update netdev_alloc_frag to work more efficiently
 with TCP and GRO

On Thu, 2012-06-21 at 07:07 +0200, Eric Dumazet wrote:
> On Wed, 2012-06-20 at 21:07 -0700, Alexander Duyck wrote:
> > On 6/20/2012 6:21 AM, Eric Dumazet wrote:
> 
> > > +		nc->page = alloc_pages(GFP_ATOMIC | __GFP_COLD |
> > > +				       (NETDEV_FRAG_ORDER ? __GFP_COMP : 0),
> > > +				       NETDEV_FRAG_ORDER);
> 
> > I was wondering if you needed the check for NETDEV_FRAG_ORDER here.  
> >  From what I can tell setting __GFP_COMP for an order 0 page has no 
> > effect since it only seems to get checked in prep_new_page and that is 
> > after a check to verify if the page is order 0 or not.
> 
> Good point, it seems some net drivers could be changed to remove
> useless tests.
> 
> I'll post some performance data as well.

Here is the patch I tested here.

Using 32768 bytes allocations is actually nice for MTU=9000 traffic,
since we can fit 3 frames per 32KB instead of only 2 frames (using
kmalloc-16384 slab))

Also, I prefill page->_count with a high bias value, to avoid the
get_page() we did for each allocated frag.

In my profiles, the get_page() cost was dominant, because of false
sharing with skb consumers (as they might run on different cpus)

This way, when 32768 bytes are filled, we perform a single
atomic_sub_return() and can recycle the page if we find we are the last
user (this is what you did in your patch, when testing page->_count
being 1)

Note : If I used max(PAGE_SIZE, 32678) for MAX_NETDEV_FRAGSIZE,
gcc was not able to optimise get_order(MAX_NETDEV_FRAGSIZE), strange...


diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b21522..d31efa2 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -296,9 +296,18 @@ EXPORT_SYMBOL(build_skb);
 struct netdev_alloc_cache {
 	struct page *page;
 	unsigned int offset;
+	unsigned int pagecnt_bias;
 };
 static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);
 
+#if PAGE_SIZE > 32768
+#define MAX_NETDEV_FRAGSIZE	PAGE_SIZE
+#else
+#define MAX_NETDEV_FRAGSIZE	32768
+#endif
+
+#define NETDEV_PAGECNT_BIAS	(MAX_NETDEV_FRAGSIZE /		\
+				 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 /**
  * netdev_alloc_frag - allocate a page fragment
  * @fragsz: fragment size
@@ -316,18 +325,25 @@ void *netdev_alloc_frag(unsigned int fragsz)
 	nc = &__get_cpu_var(netdev_alloc_cache);
 	if (unlikely(!nc->page)) {
 refill:
-		nc->page = alloc_page(GFP_ATOMIC | __GFP_COLD);
+		nc->page = alloc_pages(GFP_ATOMIC | __GFP_COLD | __GFP_COMP,
+				       get_order(MAX_NETDEV_FRAGSIZE));
+		if (unlikely(!nc->page))
+			goto end;
+recycle:
+		atomic_set(&nc->page->_count, NETDEV_PAGECNT_BIAS);
+		nc->pagecnt_bias = NETDEV_PAGECNT_BIAS;
 		nc->offset = 0;
 	}
-	if (likely(nc->page)) {
-		if (nc->offset + fragsz > PAGE_SIZE) {
-			put_page(nc->page);
-			goto refill;
-		}
-		data = page_address(nc->page) + nc->offset;
-		nc->offset += fragsz;
-		get_page(nc->page);
+	if (nc->offset + fragsz > MAX_NETDEV_FRAGSIZE) {
+		if (!atomic_sub_return(nc->pagecnt_bias,
+				       &nc->page->_count))
+			goto recycle;
+		goto refill;
 	}
+	data = page_address(nc->page) + nc->offset;
+	nc->offset += fragsz;
+	nc->pagecnt_bias--; /* avoid get_page()/get_page() false sharing */
+end:
 	local_irq_restore(flags);
 	return data;
 }
@@ -353,7 +369,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 	unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD) +
 			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
-	if (fragsz <= PAGE_SIZE && !(gfp_mask & __GFP_WAIT)) {
+	if (fragsz <= MAX_NETDEV_FRAGSIZE && !(gfp_mask & __GFP_WAIT)) {
 		void *data = netdev_alloc_frag(fragsz);
 
 		if (likely(data)) {


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ