Message-Id: <20090818030718.aef199f4.billfink@mindspring.com>
Date:	Tue, 18 Aug 2009 03:07:18 -0400
From:	Bill Fink <billfink@...dspring.com>
To:	Jesse Barnes <jbarnes@...tuousgeek.org>
Cc:	"Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
	Neil Horman <nhorman@...driver.com>,
	Andrew Gallatin <gallatin@...i.com>,
	Brice Goglin <Brice.Goglin@...ia.fr>,
	Linux Network Developers <netdev@...r.kernel.org>,
	Yinghai Lu <yhlu.kernel@...il.com>
Subject: Re: Receive side performance issue with multi-10-GigE and NUMA

On Mon, 17 Aug 2009 09:53:02 -0700, Jesse Barnes wrote:

> On Fri, 14 Aug 2009 16:31:55 -0400
> Bill Fink <billfink@...dspring.com> wrote:
> 
> Hm, yeah it probably should have the full CPU mask...
> 
> > Also, is it just not possible on this type of Intel Xeon system to
> > properly associate the PCI devices with the nearest NUMA node?
> 
> All the PCI devices hang off the root complex, which is the same
> distance to each node of memory (at least that's my understanding for
> current platforms).

I admit to being confused then.  The basic system architecture
of the SuperMicro system is:

      Memory----CPU1----QPI----CPU2----Memory
                  |              |
                  |              |
                 QPI            QPI
                  |              |
                  |              |
                5520----QPI----5520
                ||||           ||||
                ||||           ||||
                ||||           ||||
                PCIe           PCIe

It doesn't appear that a given PCIe device is equidistant to the
two nodes of memory.  It's one QPI hop to the "local" (same side)
node, and two QPI hops to the "remote" (far side) node.  But then
I don't know what a root complex is, and how it fits into the
system architecture above.
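
One way to see which node the kernel itself associates with each PCI device is the numa_node attribute that Linux exposes in sysfs. The following is only a minimal sketch, assuming a Linux kernel that provides /sys/bus/pci/devices/<address>/numa_node; the function name pci_numa_nodes and its sysfs_root parameter are made up for illustration. A value of -1 means the kernel has no locality information for that device.

```python
from pathlib import Path


def pci_numa_nodes(sysfs_root="/sys/bus/pci/devices"):
    """Map each PCI address to the NUMA node the kernel reports for it.

    Returns an empty dict if the sysfs directory does not exist
    (for example on a non-Linux system). A node of -1 means "unknown".
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}
    nodes = {}
    for dev in sorted(root.iterdir()):
        attr = dev / "numa_node"
        if attr.is_file():
            nodes[dev.name] = int(attr.read_text().strip())
    return nodes


if __name__ == "__main__":
    for addr, node in pci_numa_nodes().items():
        print(addr, node)
```

If every device reports -1, the platform firmware is probably not describing PCI locality to the kernel at all, which is a separate question from whether the hardware paths are really equidistant.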

> > In any event, the patch didn't help (or hurt).  The transmit
> > performance remained at ~100 Gbps while the receive performance
> > remained at 55 Gbps.
> 
> Maybe the other Jesse has some ideas here.

Any and all ideas welcome.  I even considered the idea that instead
of transferring 9000 bytes of payload, it was actually transferring
the next higher power of 2, namely 16384 bytes, since bc told me that
9000/16384*100 was 54.9316, which is close to the 55 Gbps receive
rate as a percentage of the ~100 Gbps transmit rate.  But I tried a
test today with an MTU of 8000 and it didn't make any difference.
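
The arithmetic behind that guess can be sketched as follows. This is only an illustration of the hypothesis, not a model of what the NIC or chipset actually does; the helper names next_power_of_two and payload_efficiency are made up for the example.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (n must be positive)."""
    return 1 << (n - 1).bit_length()


def payload_efficiency(payload_bytes: int) -> float:
    """Percent of a power-of-two-sized transfer that is useful payload."""
    return 100.0 * payload_bytes / next_power_of_two(payload_bytes)


if __name__ == "__main__":
    for mtu in (9000, 8000):
        padded = next_power_of_two(mtu)
        print(f"MTU {mtu}: padded to {padded}, "
              f"efficiency {payload_efficiency(mtu):.4f}%")
```

Under this hypothesis an MTU of 8000 would be padded to only 8192 bytes, so the efficiency would rise to roughly 97.7% and receive performance should have improved noticeably. Since the MTU 8000 test made no difference, the padding explanation looks unlikely.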

BTW here's a diff of an "lspci -vvvxxxx" on the better receive side
performing Asus system (<) versus on the SuperMicro system (>) for
one of the Myricom 10-GigE interfaces:

[root@...ntest1 ~]# diff -bw /tmp/foo2 /tmp/foo3
1c1
< 06:00.0 Ethernet controller: MYRICOM Inc. Myri-10G Dual-Protocol NIC (rev 01)
---
> 04:00.0 Ethernet controller: MYRICOM Inc. Myri-10G Dual-Protocol NIC (rev 01)
3c3
<         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
---
>       Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+

I don't know what the ParErr- versus ParErr+ means.

5,9c5,9
<         Latency: 0, Cache Line Size: 64 bytes
<         Interrupt: pin A routed to IRQ 2277
<         Region 0: Memory at da000000 (64-bit, prefetchable) [size=16M]
<         Region 2: Memory at fa900000 (64-bit, non-prefetchable) [size=1M]
<         Expansion ROM at fa880000 [disabled] [size=512K]
---
>       Latency: 0, Cache Line Size: 256 bytes
>       Interrupt: pin A routed to IRQ 121
>       Region 0: Memory at f3000000 (64-bit, prefetchable) [size=16M]
>       Region 2: Memory at fa300000 (64-bit, non-prefetchable) [size=1M]
>       Expansion ROM at fa280000 [disabled] [size=512K]
11c11
<                 Address: 00000000fee0400c  Data: 4183
---
>               Address: 00000000fee00000  Data: 40cc
45c45
<         Capabilities: [1a8] Device Serial Number b6-be-46-ff-ff-dd-60-00
---
>       Capabilities: [1a8] Device Serial Number 88-be-46-ff-ff-dd-60-00

I don't see much difference other than a larger Cache Line Size
on the SuperMicro system.

47,48c47,48
< 00: c1 14 08 00 06 05 10 00 01 00 00 02 10 00 00 00
< 10: 0c 00 00 da 00 00 00 00 04 00 90 fa 00 00 00 00
---
> 00: c1 14 08 00 46 05 10 00 01 00 00 02 40 00 00 00
> 10: 0c 00 00 f3 00 00 00 00 04 00 30 fa 00 00 00 00
50,52c50,52
< 30: 00 00 88 fa 44 00 00 00 00 00 00 00 0b 01 00 00
< 40: 00 00 00 00 05 54 81 00 0c 40 e0 fe 00 00 00 00
< 50: 83 41 00 00 01 5c 03 00 00 20 00 64 10 a0 02 00
---
> 30: 00 00 28 fa 44 00 00 00 00 00 00 00 0e 01 00 00
> 40: 00 00 00 00 05 54 81 00 00 00 e0 fe 00 00 00 00
> 50: cc 40 00 00 01 5c 03 00 00 20 00 64 10 a0 02 00
73c73
< 1a0: 00 00 00 00 00 00 00 00 03 00 01 00 b6 be 46 ff
---
> 1a0: 00 00 00 00 00 00 00 00 03 00 01 00 88 be 46 ff
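
For reference, the Control flags and Cache Line Size that lspci prints can be recovered from the first row of each hex dump. The sketch below assumes the standard PCI configuration header layout: the 16-bit Command register sits at offset 0x04 (little-endian), with bit 6 being the Parity Error Response enable that lspci shows as ParErr, and the Cache Line Size register at offset 0x0c counts 32-bit words rather than bytes. The function names are made up for illustration.

```python
# Command register bits in the order lspci prints them.
COMMAND_BITS = [
    (0, "I/O"), (1, "Mem"), (2, "BusMaster"), (3, "SpecCycle"),
    (4, "MemWINV"), (5, "VGASnoop"), (6, "ParErr"), (7, "Stepping"),
    (8, "SERR"), (9, "FastB2B"), (10, "DisINTx"),
]


def parse_row(row: str) -> bytes:
    """Turn a hex row such as 'c1 14 08 00 ...' into raw bytes."""
    return bytes.fromhex(row)


def decode_command(cfg: bytes) -> str:
    """Render the Command register (offset 0x04) in lspci's +/- style."""
    command = int.from_bytes(cfg[4:6], "little")
    return " ".join(
        f"{name}{'+' if (command >> bit) & 1 else '-'}"
        for bit, name in COMMAND_BITS
    )


def cache_line_bytes(cfg: bytes) -> int:
    """Cache Line Size (offset 0x0c) is stored in 32-bit words."""
    return cfg[0x0C] * 4


# First config-space row from each side of the diff above.
asus = parse_row("c1 14 08 00 06 05 10 00 01 00 00 02 10 00 00 00")
supermicro = parse_row("c1 14 08 00 46 05 10 00 01 00 00 02 40 00 00 00")

if __name__ == "__main__":
    for name, cfg in (("Asus", asus), ("SuperMicro", supermicro)):
        print(name, decode_command(cfg), cache_line_bytes(cfg))
```

The only Command register difference is bit 6: the SuperMicro side has parity error response enabled, which on its own would not be expected to affect throughput. The Cache Line Size bytes 0x10 and 0x40 correspond to the 64 and 256 bytes reported by lspci.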

And here's part of the dmesg output on the Asus system:

myri10ge: Version 1.4.3-1.358
myri10ge 0000:06:00.0: PCI INT A -> GSI 35 (level, low) -> IRQ 35
myri10ge 0000:06:00.0: setting latency timer to 64
mtrr: type mismatch for da000000,1000000 old: write-back new: write-combining
firmware: requesting myri10ge_eth_z8e.dat
myri10ge 0000:06:00.0: Not enabling ECRC on non-root port 0000:05:02.0
firmware: requesting myri10ge_eth_z8e.dat
myri10ge 0000:06:00.0: MSI IRQ 2282, tx bndry 4096, fw myri10ge_eth_z8e.dat, WC Disabled

And on the SuperMicro system:

myri10ge: Version 1.4.4-1.401
  alloc irq_desc for 35 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
myri10ge 0000:04:00.0: PCI INT A -> GSI 35 (level, low) -> IRQ 35
myri10ge 0000:04:00.0: setting latency timer to 64
myri10ge 0000:04:00.0: firmware: requesting myri10ge_eth_z8e.dat
myri10ge 0000:04:00.0: Not enabling ECRC on non-root port 0000:03:02.0
myri10ge 0000:04:00.0: firmware: requesting myri10ge_eth_z8e.dat
  alloc irq_desc for 112 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
myri10ge 0000:04:00.0: irq 112 for MSI/MSI-X
myri10ge 0000:04:00.0: MSI IRQ 112, tx bndry 4096, fw myri10ge_eth_z8e.dat, WC Enabled
  alloc irq_desc for 24 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0

Interestingly, on the SuperMicro system "WC Enabled" is only indicated
for the first two 10-GigE interfaces, with "WC Disabled" for the other
ten.  The Asus system indicates "WC Disabled" on all the interfaces,
but also has that earlier bit about "old: write-back new:
write-combining", which doesn't appear on the SuperMicro system
(although that is using a slightly newer version of the myri10ge
driver).

						-Thanks

						-Bill