Message-ID: <507EDC5D.4070602@redhat.com>
Date: Wed, 17 Oct 2012 13:27:09 -0300
From: Marcelo Ricardo Leitner <mleitner@...hat.com>
To: netdev <netdev@...r.kernel.org>
CC: Or Gerlitz <ogerlitz@...lanox.com>,
Doug Ledford <dledford@...hat.com>
Subject: Question about Mellanox FW reporting (incorrect) port types
Hi there,
We have a customer who is having trouble bringing the 1st port up after
upgrading RHEL. You can mostly ignore the 6.2/6.3 versions; just think of
them as "old" and "new". The situation is:
- RHEL 6.2 works, with warnings: it brings both ports up as ETH, as
expected, but dmesg repeatedly shows:
mlx4_core 0000:05:00.0: Requested port type for port 1 is not supported on this HCA
- RHEL 6.3 doesn't: it brings only the 2nd port up.
The 1st one is tagged as IB, checked via /sys/.../mlx4_port1
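For a quick programmatic version of the same check, here is a minimal C
sketch. It assumes the mlx4_portN sysfs attribute holds a single word such
as "ib" or "eth", or "auto (...)" when auto-sensing is enabled; that format
is an assumption, the full path is supplied by the caller rather than filled
in here, and mlx4_read_port_type() is a hypothetical helper, not a driver or
library function:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: classify the contents of an mlx4_portN sysfs file.
 * Returns 1 for IB, 2 for ETH, 3 for auto, 0 if unknown or unreadable.
 * The numeric values mirror enum mlx4_port_type. */
int mlx4_read_port_type(const char *path)
{
	char buf[32] = "";
	FILE *f = fopen(path, "r");

	if (!f)
		return 0;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return 0;
	}
	fclose(f);

	/* Assumed format: "ib", "eth" or "auto (<type>)", newline-terminated. */
	if (strncmp(buf, "auto", 4) == 0)
		return 3;
	if (strncmp(buf, "eth", 3) == 0)
		return 2;
	if (strncmp(buf, "ib", 2) == 0)
		return 1;
	return 0;
}
```

On the 6.3 kernel, port 1 of this HCA would be expected to come back as 1
(IB), matching what sysfs showed.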
NIC:
05:00.0 Network controller: Mellanox Technologies MT26438 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE Virtualization+] (rev b0)
05:00.0 0280: 15b3:6746 (rev b0)
The issue is seen on 14 servers with different firmware revisions,
including at least 2.8.0 and 2.7.9294. We couldn't reproduce it with
2.7.9100.
To narrow it down, I added a debug message to mlx4_QUERY_DEV_CAP() in the 6.3 kernel:
	for (i = 1; i <= dev_cap->num_ports; ++i) {
		err = mlx4_cmd_box(dev, 0, mailbox->dma, i, 0,
				   MLX4_CMD_QUERY_PORT,
				   MLX4_CMD_TIME_CLASS_B,
				   !mlx4_is_slave(dev));
		if (err)
			goto out;

		/* bits 0-1: supported types, bit 3: suggested type,
		 * bit 4: default sense */
		MLX4_GET(field, outbox, QUERY_PORT_SUPPORTED_TYPE_OFFSET);
		dev_cap->supported_port_types[i] = field & 3;
		dev_cap->suggested_type[i] = (field >> 3) & 1;
		dev_cap->default_sense[i] = (field >> 4) & 1;
		...
		/* debug message added for this investigation */
		mlx4_dbg(dev, "Port %d type flags: %x %x %x\n", i,
			 dev_cap->supported_port_types[i],
			 dev_cap->suggested_type[i],
			 dev_cap->default_sense[i]);
	}
This gave us:
[ 12.368187] mlx4_core 0000:05:00.0: Port 1 type flags: 1 0 0
[ 12.378232] mlx4_core 0000:05:00.0: Port 2 type flags: 2 0 0
These values map to:
enum mlx4_port_type {
	MLX4_PORT_TYPE_NONE	= 0,
	MLX4_PORT_TYPE_IB	= 1,
	MLX4_PORT_TYPE_ETH	= 2,
	MLX4_PORT_TYPE_AUTO	= 3
};
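To make the decoding explicit, here is a small standalone C sketch that
applies the same bit positions as the mlx4_QUERY_DEV_CAP() excerpt above to
a single QUERY_PORT byte. The struct and function names are hypothetical,
chosen only for illustration; the raw bytes were not logged, so any input
value is just an example consistent with the "type flags" output:

```c
#include <stdint.h>

/* Mirrors enum mlx4_port_type as quoted above. */
enum mlx4_port_type {
	MLX4_PORT_TYPE_NONE = 0,
	MLX4_PORT_TYPE_IB   = 1,
	MLX4_PORT_TYPE_ETH  = 2,
	MLX4_PORT_TYPE_AUTO = 3
};

/* Hypothetical decoded view of the "supported type" byte. */
struct port_type_flags {
	int supported;     /* bits 0-1: mask of IB (1) / ETH (2) */
	int suggested;     /* bit 3 */
	int default_sense; /* bit 4 */
};

/* Same shifts and masks as the driver excerpt. */
struct port_type_flags decode_port_type_field(uint8_t field)
{
	struct port_type_flags f;

	f.supported = field & 3;
	f.suggested = (field >> 3) & 1;
	f.default_sense = (field >> 4) & 1;
	return f;
}

/* Human-readable meaning of the supported-types mask. */
const char *port_type_name(int type)
{
	switch (type) {
	case MLX4_PORT_TYPE_IB:   return "IB only";
	case MLX4_PORT_TYPE_ETH:  return "ETH only";
	case MLX4_PORT_TYPE_AUTO: return "IB or ETH";
	default:                  return "none";
	}
}
```

Any byte whose low two bits are 01, as port 1's "1 0 0" implies, decodes to
"IB only"; a port that could run either protocol would need both bits set
(3, MLX4_PORT_TYPE_AUTO).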
So it actually seems that the new driver is behaving as expected: port 1
reports IB as its only supported type and port 2 reports only ETH, and the
driver honors what the firmware says.
Then I checked why the previous driver worked. It seems to me (based only
on code review this time) that it was because of the forced sense below,
which was removed in 6.3 when it integrated this commit:
commit 8d0fc7b61191c9433a4f738987b89e1d962eb637
Author: Yevgeny Petrilin <yevgenyp@...lanox.co.il>
Date: Mon Dec 19 04:00:34 2011 +0000
mlx4_core: Changing link sensing logic
which has this hunk:
@@ -1329,12 +1353,6 @@ static int mlx4_setup_hca(struct mlx4_dev *dev)
 	if (!mlx4_is_slave(dev)) {
 		for (port = 1; port <= dev->caps.num_ports; port++) {
-			if (!mlx4_is_mfunc(dev)) {
-				enum mlx4_port_type port_type = 0;
-				mlx4_SENSE_PORT(dev, port, &port_type);
-				if (port_type)
-					dev->caps.port_type[port] = port_type;
-			}
 			ib_port_default_caps = 0;
 			err = mlx4_get_port_ib_caps(dev, port,
This code allowed the port type to be changed to ETH: it ran after the
capability query (QUERY_DEV_CAP) and did not check supported_port_types
before setting dev->caps.port_type[port].
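To make the difference concrete, here is a small C model of the two
behaviours. It is a hypothetical sketch, not driver code: old_port_type()
follows the removed hunk (any non-zero sensed type wins), while
mask_aware_port_type() is an assumed variant that only accepts a sensed type
listed in the firmware's supported mask. It does not model the "Sense
allowed" gate that the 6.3 log shows as 0 for both ports:

```c
/* Hypothetical model, not mlx4 code. Values follow enum mlx4_port_type:
 * 1 = IB, 2 = ETH, 3 = AUTO; a sensed_type of 0 means nothing was sensed. */

/* Old flow, as read from the removed hunk: a non-zero sensed type
 * overwrites the port type without consulting the supported mask. */
int old_port_type(int initial_type, int sensed_type)
{
	if (sensed_type)
		return sensed_type;
	return initial_type;
}

/* Assumed mask-aware variant: the sensed type is used only when every
 * bit it sets is present in the firmware-reported supported mask. */
int mask_aware_port_type(int initial_type, int supported_mask,
			 int sensed_type)
{
	if (sensed_type && (sensed_type & supported_mask) == sensed_type)
		return sensed_type;
	return initial_type;
}
```

For port 1 on this HCA (initial type 1, supported mask 1), if the sense
reported ETH, old_port_type(1, 2) returns 2 (ETH), consistent with what 6.2
did, while mask_aware_port_type(1, 1, 2) keeps 1 (IB).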
So my questions are: is it possible for the firmware to report a wrong
port type like that? Is it configurable by the sysadmin somehow (via a FW
update, ...)? Can we flip that byte, or is it a manufacturing issue?
Is any other info needed? I can't try the upstream driver, but I can
cherry-pick changes if needed or recommended.
dmesg snippet for 6.3 with my debug messages:
[ 10.573469] mlx4_core 0000:05:00.0: PCI INT A -> GSI 26 (level, low) -> IRQ 26
[ 10.573509] mlx4_core 0000:05:00.0: setting latency timer to 64
[ 11.593401] mlx4_core 0000:05:00.0: FW version 2.8.000 (cmd intf rev 3), max commands 16
[ 11.606423] mlx4_core 0000:05:00.0: Catastrophic error buffer at 0x1f020, size 0x10, BAR 0
[ 11.619459] mlx4_core 0000:05:00.0: Communication vector bar:2 offset:0x800
[ 11.631071] mlx4_core 0000:05:00.0: FW size 385 KB
[ 11.640232] mlx4_core 0000:05:00.0: Clear int @ 1000, BAR 2
[ 11.651984] mlx4_core 0000:05:00.0: Mapped 26 chunks/6168 KB for FW.
[ 12.355826] mlx4_core 0000:05:00.0: BlueFlame available (reg size 512, regs/page 8)
[ 12.368187] mlx4_core 0000:05:00.0: Port 1 type flags: 1 0 0
[ 12.378232] mlx4_core 0000:05:00.0: Port 2 type flags: 2 0 0
[ 12.388158] mlx4_core 0000:05:00.0: Base MM extensions: flags 00000cc0, rsvd L_Key 00000500
[ 12.401071] mlx4_core 0000:05:00.0: Max ICM size 4294967296 MB
[ 12.411183] mlx4_core 0000:05:00.0: Max QPs: 16777216, reserved QPs: 64, entry size: 256
[ 12.423786] mlx4_core 0000:05:00.0: Max SRQs: 16777216, reserved SRQs: 64, entry size: 128
[ 12.436568] mlx4_core 0000:05:00.0: Max CQs: 16777216, reserved CQs: 128, entry size: 128
[ 12.449241] mlx4_core 0000:05:00.0: Max EQs: 512, reserved EQs: 8, entry size: 128
[ 12.461221] mlx4_core 0000:05:00.0: reserved MPTs: 16, reserved MTTs: 16
[ 12.472270] mlx4_core 0000:05:00.0: Max PDs: 8388608, reserved PDs: 4, reserved UARs: 2
[ 12.484711] mlx4_core 0000:05:00.0: Max QP/MCG: 8388608, reserved MGMs: 0
[ 12.495786] mlx4_core 0000:05:00.0: Max CQEs: 4194304, max WQEs: 16384, max SRQ WQEs: 16384
[ 12.508587] mlx4_core 0000:05:00.0: Local CA ACK delay: 15, max MTU: 4096, port width cap: 3
[ 12.521485] mlx4_core 0000:05:00.0: Max SQ desc size: 1008, max SQ S/G: 62
[ 12.532639] mlx4_core 0000:05:00.0: Max RQ desc size: 512, max RQ S/G: 32
[ 12.543651] mlx4_core 0000:05:00.0: Max GSO size: 131072
[ 12.552996] mlx4_core 0000:05:00.0: Max counters: 256
[ 12.561998] mlx4_core 0000:05:00.0: DEV_CAP flags:
[ 12.570660] mlx4_core 0000:05:00.0: RC transport
[ 12.570661] mlx4_core 0000:05:00.0: UC transport
[ 12.570662] mlx4_core 0000:05:00.0: UD transport
[ 12.570662] mlx4_core 0000:05:00.0: XRC transport
[ 12.570663] mlx4_core 0000:05:00.0: FCoIB support
[ 12.570664] mlx4_core 0000:05:00.0: SRQ support
[ 12.570665] mlx4_core 0000:05:00.0: IPoIB checksum offload
[ 12.570666] mlx4_core 0000:05:00.0: P_Key violation counter
[ 12.570667] mlx4_core 0000:05:00.0: Q_Key violation counter
[ 12.570667] mlx4_core 0000:05:00.0: DPDP
[ 12.570668] mlx4_core 0000:05:00.0: Big LSO headers
[ 12.570669] mlx4_core 0000:05:00.0: APM support
[ 12.570670] mlx4_core 0000:05:00.0: Atomic ops support
[ 12.570671] mlx4_core 0000:05:00.0: Address vector port checking support
[ 12.570672] mlx4_core 0000:05:00.0: UD multicast support
[ 12.570672] mlx4_core 0000:05:00.0: Router support
[ 12.570673] mlx4_core 0000:05:00.0: IBoE support
[ 12.570674] mlx4_core 0000:05:00.0: Unicast loopback support
[ 12.570675] mlx4_core 0000:05:00.0: Wake On LAN support
[ 12.570676] mlx4_core 0000:05:00.0: UDP RSS support
[ 12.570676] mlx4_core 0000:05:00.0: Unicast VEP steering support
[ 12.570677] mlx4_core 0000:05:00.0: Multicast VEP steering support
[ 12.570678] mlx4_core 0000:05:00.0: Counters support
[ 12.570680] mlx4_core 0000:05:00.0: Initial port 1 type: 1, port_type_array[0]=0   <-- (this is also one of my debug messages)
[ 12.570681] mlx4_core 0000:05:00.0: Sense allowed for port 1: 0
[ 12.570682] mlx4_core 0000:05:00.0: Initial port 2 type: 2, port_type_array[1]=0
[ 12.570683] mlx4_core 0000:05:00.0: Sense allowed for port 2: 0
[ 12.570686] mlx4_core 0000:05:00.0: profile[ 0] ( CMPT): 2^26 entries @ 0x 0, size 0x 100000000
[ 12.570687] mlx4_core 0000:05:00.0: profile[ 1] (RDMARC): 2^22 entries @ 0x 100000000, size 0x 8000000
[ 12.570689] mlx4_core 0000:05:00.0: profile[ 2] ( QP): 2^18 entries @ 0x 108000000, size 0x 4000000
[ 12.570690] mlx4_core 0000:05:00.0: profile[ 3] ( MTT): 2^23 entries @ 0x 10c000000, size 0x 4000000
[ 12.570691] mlx4_core 0000:05:00.0: profile[ 4] ( DMPT): 2^19 entries @ 0x 110000000, size 0x 2000000
[ 12.570693] mlx4_core 0000:05:00.0: profile[ 5] ( ALTC): 2^18 entries @ 0x 112000000, size 0x 1000000
[ 12.570694] mlx4_core 0000:05:00.0: profile[ 6] ( SRQ): 2^16 entries @ 0x 113000000, size 0x 800000
[ 12.570696] mlx4_core 0000:05:00.0: profile[ 7] ( CQ): 2^16 entries @ 0x 113800000, size 0x 800000
[ 12.570697] mlx4_core 0000:05:00.0: profile[ 8] ( MCG): 2^13 entries @ 0x 114000000, size 0x 800000
[ 12.570699] mlx4_core 0000:05:00.0: profile[ 9] ( AUXC): 2^18 entries @ 0x 114800000, size 0x 40000
[ 12.570701] mlx4_core 0000:05:00.0: profile[10] ( EQ): 2^06 entries @ 0x 114840000, size 0x 2000
[ 12.570702] mlx4_core 0000:05:00.0: HCA context memory: reserving 4530440 KB
[ 12.570722] mlx4_core 0000:05:00.0: 4530440 KB of HCA context requires 8936 KB aux memory.
[ 12.599185] mlx4_core 0000:05:00.0: Mapped 38 chunks/8936 KB for ICM aux.
[ 12.600516] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 0 for ICM.
[ 12.601811] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 40000000 for ICM.
[ 12.603105] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 80000000 for ICM.
[ 12.603139] mlx4_core 0000:05:00.0: Mapped 1 chunks/4 KB at c0000000 for ICM.
[ 12.603192] mlx4_core 0000:05:00.0: Mapped 1 chunks/8 KB at 114840000 for ICM.
[ 12.604464] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 10c000000 for ICM.
[ 12.605772] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 110000000 for ICM.
[ 12.607047] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 108000000 for ICM.
[ 12.608324] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114800000 for ICM.
[ 12.609600] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 112000000 for ICM.
[ 12.610875] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 100000000 for ICM.
[ 12.612146] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 113800000 for ICM.
[ 12.613419] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 113000000 for ICM.
[ 12.614693] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114000000 for ICM.
[ 12.615966] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114040000 for ICM.
[ 12.617240] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114080000 for ICM.
[ 12.618512] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 1140c0000 for ICM.
[ 12.619787] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114100000 for ICM.
[ 12.621061] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114140000 for ICM.
[ 12.622334] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114180000 for ICM.
[ 12.623603] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 1141c0000 for ICM.
[ 12.624880] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114200000 for ICM.
[ 12.626154] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114240000 for ICM.
[ 12.627426] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114280000 for ICM.
[ 12.628699] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 1142c0000 for ICM.
[ 12.629974] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114300000 for ICM.
[ 12.631247] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114340000 for ICM.
[ 12.632521] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114380000 for ICM.
[ 12.633793] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 1143c0000 for ICM.
[ 12.635069] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114400000 for ICM.
[ 12.636342] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114440000 for ICM.
[ 12.637616] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114480000 for ICM.
[ 12.638890] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 1144c0000 for ICM.
[ 12.640162] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114500000 for ICM.
[ 12.641435] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114540000 for ICM.
[ 12.642714] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114580000 for ICM.
[ 12.643989] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 1145c0000 for ICM.
[ 12.645265] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114600000 for ICM.
[ 12.646536] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114640000 for ICM.
[ 12.647807] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114680000 for ICM.
[ 12.649082] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 1146c0000 for ICM.
[ 12.650354] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114700000 for ICM.
[ 12.651628] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114740000 for ICM.
[ 12.652902] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 114780000 for ICM.
[ 12.654177] mlx4_core 0000:05:00.0: Mapped 1 chunks/256 KB at 1147c0000 for ICM.
... irq allocs ...
[ 13.222583] mlx4_core 0000:05:00.0: irq 128 for MSI/MSI-X
[ 13.602288] mlx4_core 0000:05:00.0: NOP command IRQ test passed
[ 13.653457] mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.0 (Dec 2011)
[ 13.662601] mlx4_en 0000:05:00.0: Activating port:2
[ 13.669411] mlx4_en: 0000:05:00.0: Port 2: Using 8 TX rings
[ 13.676497] mlx4_en: 0000:05:00.0: Port 2: Using 8 RX rings
[ 13.683772] mlx4_en: 0000:05:00.0: Port 2: Initializing port
[ 13.731168] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
Previous kernel (I don't have this one with debug messages):
mlx4_core 0000:05:00.0: irq 105 for MSI/MSI-X
mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.4.1 (March 2011)
mlx4_en 0000:05:00.0: Activating port:1
mlx4_en: 0000:05:00.0: Port 1: Using 8 TX rings
mlx4_en: 0000:05:00.0: Port 1: Using 8 RX rings
mlx4_en: 0000:05:00.0: Port 1: Initializing port
mlx4_en 0000:05:00.0: Activating port:2
mlx4_en: 0000:05:00.0: Port 2: Using 8 TX rings
mlx4_en: 0000:05:00.0: Port 2: Using 8 RX rings
mlx4_en: 0000:05:00.0: Port 2: Initializing port
Same host, same NIC, just rebooted.
Thanks,
Marcelo.