Date: Wed, 05 May 2010 01:06:58 -0700 (PDT)
From: David Miller <davem@...emloft.net>
To: eric.dumazet@...il.com
Cc: netdev@...r.kernel.org, hadi@...erus.ca, therbert@...gle.com
Subject: Re: [PATCH net-next-2.6] net: __alloc_skb() speedup

From: Eric Dumazet <eric.dumazet@...il.com>
Date: Tue, 04 May 2010 19:10:54 +0200

> With following patch I can reach maximum rate of my pktgen+udpsink
> simulator :
> - 'old' machine : dual quad core E5450 @3.00GHz
> - 64 UDP rx flows (only differ by destination port)
> - RPS enabled, NIC interrupts serviced on cpu0
> - rps dispatched on 7 other cores. (~130.000 IPI per second)
> - SLAB allocator (faster than SLUB in this workload)
> - tg3 NIC
> - 1.080.000 pps without a single drop at NIC level.
>
> Idea is to add two prefetchw() calls in __alloc_skb(), one to prefetch
> first sk_buff cache line, the second to prefetch the shinfo part.
>
> Also using one memset() to initialize all skb_shared_info fields instead
> of one by one to reduce number of instructions, using long word moves.
>
> All skb_shared_info fields before 'dataref' are cleared in
> __alloc_skb().
>
> Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>

I'll apply this, nice work Eric.

But some caveats...

On several cpu types it is possible to "prefetch invalidate"
cachelines. PowerPC and sparc64 can both do it. I'm pretty
sure current gen x86 has SSE bits that can do this too.

In fact, the memset() for sparc64 is going to do these cacheline
invalidates, making the prefetches on 'skb' wasteful.
They will just create spurious bus traffic.

The memset() for skb_shared_info is going to help universally,
I think.
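
For readers following along, here is a minimal userspace sketch of the two ideas in the quoted description: a write-prefetch hint followed by a single memset() that clears every field laid out before 'dataref'. It is not the patch itself. The struct shinfo_sketch layout and the shinfo_init() helper are hypothetical stand-ins for skb_shared_info and the relevant part of __alloc_skb(), and __builtin_prefetch(addr, 1) (the GCC/Clang write-prefetch hint) is assumed here in place of the kernel's prefetchw().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for skb_shared_info. The fields that must be
 * zeroed are grouped before 'dataref', so one memset() covers them all.
 */
struct shinfo_sketch {
	unsigned short nr_frags;
	unsigned short gso_size;
	unsigned short gso_segs;
	unsigned short gso_type;
	void *frag_list;
	int dataref;		/* set explicitly below, not zeroed */
	char frags[64];		/* deliberately left untouched */
};

/* Hypothetical helper mirroring the shinfo setup described above. */
static void shinfo_init(struct shinfo_sketch *sh)
{
	/* Hint that this cache line is about to be written (assumed
	 * userspace equivalent of prefetchw()). */
	__builtin_prefetch(sh, 1);

	/* Clear everything that precedes 'dataref' in one call. */
	memset(sh, 0, offsetof(struct shinfo_sketch, dataref));

	sh->dataref = 1;
}
```

Whether the prefetch hint helps depends on the architecture: where memset() itself invalidates or claims the cache line, as described above for sparc64, the hint only adds bus traffic, while the single memset() is useful regardless.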