Message-ID: <65634d660911240904y294ea6fj4cf2e4ac757e619b@mail.gmail.com>
Date: Tue, 24 Nov 2009 09:04:41 -0800
From: Tom Herbert <therbert@...gle.com>
To: Linux Netdev List <netdev@...r.kernel.org>
Subject: NUMA and multiQ interaction

This is a question about the expected interaction between NUMA and
receive multiqueue. Our test setup is a 16-core AMD system with 4
sockets, one NUMA node per socket, and a bnx2x NIC. The test runs
500 netperf RR streams with one-byte requests and responses, using
net-next-2.6.

The highest throughput we are seeing is with 4 queues (1 queue
processed per socket), giving 361862 tps at 67% CPU. 16 queues (1
queue per CPU) gives 226722 tps at 30.43% CPU.

However, with a modified kernel that does RX skb allocations from the
local node rather than the device's NUMA node, I'm getting 923422 tps
at 100% CPU. This is much higher tps and better CPU utilization than
when allocations come from the device's NUMA node. It appears that
cross-node allocations are causing a significant performance hit. For
a 2.5x performance improvement I'm kind of motivated to revert
netdev_alloc_skb() to the days when it did not pay attention to the
NUMA node :-)
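
For reference, the hack is roughly the following (a sketch, not the
exact patch; today __netdev_alloc_skb() derives the node from
dev->dev.parent):

/* net/core/skbuff.c -- sketch: ignore the device node so the
 * allocator falls back to the node local to the allocating CPU.
 */
struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
		unsigned int length, gfp_t gfp_mask)
{
	struct sk_buff *skb;

	/* was: __alloc_skb(..., dev_to_node(dev->dev.parent)) */
	skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask, 0, -1);
	if (likely(skb)) {
		skb_reserve(skb, NET_SKB_PAD);
		skb->dev = dev;
	}
	return skb;
}
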
What is the expected interaction here, and would these results be
typical? If so, would that warrant associating each RX queue with a
NUMA node, instead of just the device?

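To be concrete about the per-queue idea, I'm imagining something
along these lines (names are made up, purely illustrative):

/* Purely illustrative; none of these names exist today.  Each RX
 * queue would carry the node of the CPU(s) servicing it, and the
 * driver's refill path would allocate there instead of on the
 * device node.
 */
struct rx_queue_numa_hint {
	int numa_node;		/* node servicing this RX queue */
};

static inline struct sk_buff *
netdev_alloc_skb_on_node(struct net_device *dev, unsigned int length,
			 int node)
{
	struct sk_buff *skb;

	skb = __alloc_skb(length + NET_SKB_PAD, GFP_ATOMIC, 0, node);
	if (likely(skb)) {
		skb_reserve(skb, NET_SKB_PAD);
		skb->dev = dev;
	}
	return skb;
}
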
Thanks,
Tom