Date:	Thu, 10 Nov 2011 16:27:58 +0100
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	eilong@...adcom.com
Cc:	David Miller <davem@...emloft.net>,
	"bhutchings@...arflare.com" <bhutchings@...arflare.com>,
	"pstaszewski@...are.pl" <pstaszewski@...are.pl>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next] bnx2x: reduce skb truesize by 50%

On Thu, 2011-11-10 at 17:05 +0200, Eilon Greenstein wrote:
> On Wed, 2011-11-09 at 16:29 -0800, Eric Dumazet wrote:
> > On Wed, 2011-11-09 at 23:03 +0100, Eric Dumazet wrote:
> > 
> > > BTW, on my bnx2x adapter, even small UDP frames use more than PAGE_SIZE
> > > bytes:
> > > 
> > > skb->truesize=4352 len=26 (payload only)
> > > 
> > 
> > > I wonder if we shouldn't increase SK_MEM_QUANTUM a bit to avoid
> > > ping/pong...
> > > 
> > > -#define SK_MEM_QUANTUM ((int)PAGE_SIZE)
> > > +#define SK_MEM_QUANTUM ((int)PAGE_SIZE * 2)
> > > 
> > 
> > Following patch also helps a lot, even with only two cpus (one handling
> > device interrupts, one running the application thread)
> > 
> > [PATCH net-next] bnx2x: reduce skb truesize by ~50%
> > 
> > bnx2x uses the following formula to compute its rx_buf_sz:
> > 
> > dev->mtu + 2*L1_CACHE_BYTES + 14 + 8 + 8
> > 
> > Then the core network stack adds NET_SKB_PAD and
> > SKB_DATA_ALIGN(sizeof(struct skb_shared_info)).
> > 
> > The final allocated size for the skb head on x86_64 (L1_CACHE_BYTES = 64,
> > MTU = 1500) is 2112 bytes; SLUB/SLAB round this up to 4096 bytes.
> > 
> > Since skb truesize is then bigger than SK_MEM_QUANTUM, we get a lot of
> > false sharing because of mem_reclaim in the UDP stack.
> > 
> > One possible way to halve truesize is to lower the required size by
> > 64 bytes (2112 -> 2048 bytes).
> > 
> > This way, skb->truesize is lower than SK_MEM_QUANTUM and we get better
> > performance.
> > 
> > (760,000 pps on a single-threaded UDP rx benchmark, instead of 720,000 pps)
> > 
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
> > CC: Eilon Greenstein <eilong@...adcom.com>
> > ---
> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x.h |   11 ++++++++---
> >  1 file changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
> > index aec7212..ebbdc55 100644
> > --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
> > +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
> > @@ -1185,9 +1185,14 @@ struct bnx2x {
> >  #define ETH_MAX_PACKET_SIZE		1500
> >  #define ETH_MAX_JUMBO_PACKET_SIZE	9600
> >  
> > -	/* Max supported alignment is 256 (8 shift) */
> > -#define BNX2X_RX_ALIGN_SHIFT		((L1_CACHE_SHIFT < 8) ? \
> > -					 L1_CACHE_SHIFT : 8)
> > +/* Max supported alignment is 256 (8 shift)
> > + * It should ideally be min(L1_CACHE_SHIFT, 8)
> > + * Choosing 5 (32 bytes) permits to get skb heads of 2048 bytes
> > + * instead of 4096 bytes.
> > + * With SLUB/SLAB allocators, data will be cache line aligned anyway.
> > + */
> > +#define BNX2X_RX_ALIGN_SHIFT		5
> > +
> 
> Hi Eric,
> 
> This can seriously hurt PCI utilization, so in scenarios where PCI is
> the bottleneck you will see performance degradation. We are looking at
> alternatives to reduce the allocation, but it is taking a while. Please
> hold off on this patch.

What do you mean exactly?

This patch doesn't change skb->data alignment; it is still 64-byte
aligned (cqe_fp->placement_offset == 2). PCI utilization is the same.

Only SLOB could get a misalignment, but who uses SLOB for performance?
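
Just to be explicit about the allocator assumption (this is my own
throwaway check, not part of the patch): a trivial module can show what
kmalloc(2048) actually returns on a given SLAB/SLUB/SLOB config.

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/cache.h>

static int __init align_check_init(void)
{
	void *p = kmalloc(2048, GFP_KERNEL);

	if (!p)
		return -ENOMEM;
	/* print the pointer and its offset within a cache line */
	pr_info("kmalloc(2048) = %p, offset in cache line = %lu\n",
		p, (unsigned long)p & (L1_CACHE_BYTES - 1));
	kfree(p);
	return 0;
}

static void __exit align_check_exit(void)
{
}

module_init(align_check_init);
module_exit(align_check_exit);
MODULE_LICENSE("GPL");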

An alternative would be to check why the hardware needs 2*L1_CACHE_BYTES
of extra room for alignment... Normally it could be 1*L1_CACHE_BYTES?

 	/* FW use 2 Cache lines Alignment for start packet and size  */
-#define BNX2X_FW_RX_ALIGN              (2 << BNX2X_RX_ALIGN_SHIFT)
+#define BNX2X_FW_RX_ALIGN              (1 << BNX2X_RX_ALIGN_SHIFT)
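
To put numbers on it, a userspace back-of-the-envelope sketch: the
2112-byte figure is the one quoted above, and both the
BNX2X_RX_ALIGN_SHIFT = 5 change and this 1 << variant trim the same
64 bytes from the request.

#include <stdio.h>

/* Round a kmalloc() request up to the next power-of-two size class,
 * which matches the generic SLAB/SLUB caches in the 1-4KB range. */
static unsigned int kmalloc_roundup(unsigned int size)
{
	unsigned int c = 32;

	while (c < size)
		c <<= 1;
	return c;
}

int main(void)
{
	/* driver request: dev->mtu + 2 * alignment + 14 + 8 + 8 */
	unsigned int rx_buf_old = 1500 + 2 * 64 + 14 + 8 + 8;	/* 1658 */
	unsigned int rx_buf_new = 1500 + 2 * 32 + 14 + 8 + 8;	/* 1594 */
	/* NET_SKB_PAD plus the aligned skb_shared_info: per the figures
	 * quoted above, this brings the old request up to 2112 bytes */
	unsigned int overhead = 2112 - rx_buf_old;		/* 454 */

	printf("old: request %u -> slab object %u\n",
	       rx_buf_old + overhead,
	       kmalloc_roundup(rx_buf_old + overhead));	/* 2112 -> 4096 */
	printf("new: request %u -> slab object %u\n",
	       rx_buf_new + overhead,
	       kmalloc_roundup(rx_buf_new + overhead));	/* 2048 -> 2048 */
	/* With the head in the 2048-byte cache, truesize (head allocation
	 * plus struct sk_buff) stays below SK_MEM_QUANTUM == PAGE_SIZE. */
	return 0;
}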


