Date:	Wed, 05 May 2010 10:22:14 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	David Miller <davem@...emloft.net>
Cc:	netdev@...r.kernel.org, hadi@...erus.ca, therbert@...gle.com
Subject: Re: [PATCH net-next-2.6] net: __alloc_skb() speedup

On Wednesday, 05 May 2010 at 01:06 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@...il.com>
> Date: Tue, 04 May 2010 19:10:54 +0200
> 
> > With the following patch I can reach the maximum rate of my pktgen+udpsink
> > simulator:
> > - 'old' machine : dual quad core E5450  @3.00GHz
> > - 64 UDP rx flows (only differ by destination port)
> > - RPS enabled, NIC interrupts serviced on cpu0
> > - rps dispatched on 7 other cores. (~130.000 IPI per second)
> > - SLAB allocator (faster than SLUB in this workload)
> > - tg3 NIC
> > - 1.080.000 pps without a single drop at NIC level.
> > 
> > The idea is to add two prefetchw() calls in __alloc_skb(), one to prefetch
> > the first sk_buff cache line, the second to prefetch the shinfo part.
> > 
> > Also use a single memset() to initialize all skb_shared_info fields instead
> > of setting them one by one, reducing the number of instructions by using
> > long word moves.
> > 
> > All skb_shared_info fields before 'dataref' are cleared in 
> > __alloc_skb().
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
> 
> I'll apply this, nice work Eric.
> 
> But some caveats...
> 
> On several cpu types it is possible to "prefetch invalidate"
> cachelines.  PowerPC and sparc64 can both do it.  I'm pretty
> sure current gen x86 have SSE bits that can do this too.
> 
> In fact, the memset() for sparc64 is going to do these cacheline
> invalidates, making the prefetches on 'skb' in fact wasteful.
> It will just create spurious bus traffic.
> 

You mean memset() won't be inlined by the compiler to plain memory writes, but
will use the custom kernel memset() instead?
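
For reference, the pattern under discussion can be sketched in userspace C
roughly as below. This is only an illustration under assumptions: the struct
demo_shinfo and the function demo_shinfo_init() are hypothetical stand-ins
(only the 'dataref' field name comes from the patch description), and
__builtin_prefetch() is the GCC/Clang builtin, used here in place of the
kernel's prefetchw(). Whether the memset() becomes inline stores or a call
to an out-of-line routine depends on the compiler and the architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for skb_shared_info. Apart from 'dataref',
 * the fields are illustrative and do not mirror the kernel layout. */
struct demo_shinfo {
    unsigned short nr_frags;
    unsigned short gso_size;
    unsigned short gso_segs;
    unsigned short gso_type;
    void *frag_list;
    int dataref;        /* first field NOT covered by the memset() */
    long frags[4];      /* deliberately left untouched */
};

/* Hypothetical helper: prefetch for write, then clear every field
 * that precedes 'dataref' with a single memset(). */
static void demo_shinfo_init(struct demo_shinfo *shinfo)
{
    assert(shinfo != NULL);

    /* Second argument 1 means "prefetch for write" (GCC/Clang builtin). */
    __builtin_prefetch(shinfo, 1);

    /* The length is a compile-time constant, so the compiler is free
     * to expand this into plain stores instead of calling memset(). */
    memset(shinfo, 0, offsetof(struct demo_shinfo, dataref));

    shinfo->dataref = 1;
}
```

With a small constant length like this one, the call may well be expanded
inline, in which case an architecture-specific memset() with cacheline
tricks would not be reached at all; that is an assumption to verify by
looking at the generated code, not something this sketch demonstrates.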

