Message-ID: <4A859716.7040904@myri.com>
Date: Fri, 14 Aug 2009 12:55:50 -0400
From: Andrew Gallatin <gallatin@...i.com>
To: Bill Fink <billfink@...dspring.com>
CC: netdev@...r.kernel.org
Subject: Re: Receive side performance issue with multi-10-GigE and NUMA
Bill Fink wrote:
> Hi Drew,
>
> On Fri, 14 Aug 2009, Andrew Gallatin wrote:
>
>> Hi Bill,
>>
>> A few questions. I was looking at the manual for the
>> X8DAH+-F, and it claims to support both I/OAT and DCA.
>> Do you have either or both enabled?
>
> I did not explicitly set either one, and the manual indicates they
> are both enabled by default, which I also vaguely seem to recall
> was the way they were set. I'm not in the office today, so I
> can't physically check.
>
>> If yes, then
>> what happens if you disable ioatdma (by setting
>> net.ipv4.tcp_dma_copybreak=2147483647 with sysctl)?
>> How about if you disable myri10ge's use of dca (load driver
>> with myri10ge_dca=0).
>>
>> Do you see any changes?
>
> Good suggestions but unfortunately it didn't help (or hurt).
> It may have helped a little bit on the transmit side (I saw one
> test at 102 Gbps when the previous high I had seen was 101 Gbps),
> but the receive side was still at 55 Gbps.

Darn. But it shouldn't matter at all for the transmit side...

Speaking of the send side, have you tried using
netperf -tTCP_SENDFILE rather than nuttcp to make the
transmit side zero-copy?

> Would there be any difference between disabling I/OAT and DCA in
> the BIOS versus the myri10ge module parameter and sysctl setting?
> I can try any BIOS changes on Monday.

There should not be, no.

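For reference, here is a minimal sketch of the two software-side settings discussed above, written as a small Python helper that only builds the command lines and does not run them. The helper name disable_commands is hypothetical; the sysctl key, its value, and the myri10ge_dca parameter are the ones quoted earlier in this thread. The value 2147483647 is 2^31 - 1, so with that copybreak no socket read is ever large enough to be handed to the DMA engine. Actually applying these would need root, and the module parameter only takes effect when the driver is (re)loaded.

```python
import shlex

# 2147483647: the largest signed 32-bit value, as quoted in the thread.
INT_MAX = 2**31 - 1


def disable_commands(copybreak=INT_MAX, dca=0):
    """Return the two commands as argv lists. Nothing is executed here."""
    return [
        # Raise the ioatdma offload threshold so no read ever qualifies.
        ["sysctl", "-w", f"net.ipv4.tcp_dma_copybreak={copybreak}"],
        # Load myri10ge with DCA turned off (driver must not already be loaded).
        ["modprobe", "myri10ge", f"myri10ge_dca={dca}"],
    ]


if __name__ == "__main__":
    for argv in disable_commands():
        print(shlex.join(argv))
```

Printing the commands rather than running them keeps the sketch safe to try on any machine; paste the output into a root shell on the test box if it looks right.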
>> I'm worried about ioatdma because I've seen problems with it
>> before. At least on Linux, it tends to busywait for the DMA
>> to complete, which is actually slower than a memory copy in
>> most cases that I've seen.
>>
>> I'm worried about DCA because you've shown that the BIOS is buggy,
>> so the tag table could be wrong (resulting in bad prefetching hints).
>
> The new BIOS seems to be better at setting the NUMA node info.
>
>> I'm also worried about DCA because I've never had the chance to
>> use it on a 5520 based system, and there is always the chance
>> that we may be doing something wrong ourselves in the NIC firmware
>> (again resulting in bad prefetching hints). Bad prefetching hints
>> can cause cross-CPU chatter, and kill performance by wasting
>> memory bandwidth, and dirtying a cache on another CPU
>> for no reason.
>
> Is there any easy way to monitor active memory bandwidth usage?

There may be something in the chipset, and there may be CPU counters
(via oprofile), but I'm not aware of what they are. It might be
interesting to run just 1/2 of your test (all to, say, NUMA node
1), then bind some lmbench memory copy (bw_mem) processes to
NUMA node 0, and see whether the lmbench run slows down, and/or
slows down the ongoing network traffic.

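If lmbench is not handy, a rough stand-in for the experiment could look like the sketch below. It is not bw_mem: it is a plain Python memory-copy timer, and the function names (parse_cpulist, node_cpus, copy_bandwidth) are made up for illustration. It assumes a Linux box that exposes /sys/devices/system/node/nodeN/cpulist, and it only pins the process to the CPUs of one node; memory placement is left to the kernel's default first-touch behaviour rather than being bound explicitly. Run it once with the network idle and once with traffic flowing to the other node, and compare the two numbers.

```python
import os
import time


def parse_cpulist(text):
    """Parse a sysfs cpulist string such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node):
    """Read the CPUs that belong to one NUMA node (Linux sysfs layout assumed)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def copy_bandwidth(size=64 * 1024 * 1024, repeats=5):
    """Return the best observed memory-copy rate in bytes per second."""
    src = bytearray(b"\x01" * size)  # non-zero fill so the pages are really touched
    dst = bytearray(size)
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # one bulk copy of `size` bytes
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, size / elapsed)
    return best


if __name__ == "__main__":
    # Pin this process to NUMA node 0 before allocating the buffers.
    os.sched_setaffinity(0, node_cpus(0))
    print(f"{copy_bandwidth() / 1e9:.2f} GB/s")
```

Taking the best of several repeats filters out the first pass, which pays for page faults on the destination buffer.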
Drew