Message-ID: <f49c80dc-6138-4073-b873-97f729817790@linux.dev>
Date: Mon, 6 May 2024 14:27:45 +0200
From: Zhu Yanjun <zyjzyj2000@...il.com>
To: shaozhengchao <shaozhengchao@...wei.com>,
Zhu Yanjun <zyjzyj2000@...il.com>, saeedm@...dia.com, tariqt@...dia.com,
borisp@...dia.com, shayd@...dia.com, msanalla@...dia.com,
Rahul Rameshbabu <rrameshbabu@...dia.com>, weizhang@...dia.com,
kliteyn@...dia.com, erezsh@...dia.com, igozlan@...dia.com
Cc: netdev <netdev@...r.kernel.org>, linux-rdma@...r.kernel.org
Subject: Re: [question] when bonding with a CX5 network card that supports RoCE
On 06.05.24 13:33, shaozhengchao wrote:
>
> Hi Yanjun:
> Thank you for your reply. Are there any other restrictions on using
> RoCE on the CX5?
https://docs.nvidia.com/networking/display/mlnxofedv571020
The above link can answer all your questions ^_^
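For example (just a quick illustration from my side, the exact output
depends on your setup), once RoCE LAG is active you should see a single
bond RDMA device bound to bond0 with the iproute2 "rdma" tool, and
ibv_devinfo should list only that device:

[root@...alhost ~]# rdma link show
link mlx5_bond_0/1 state ACTIVE physical_state LINK_UP netdev bond0
[root@...alhost ~]# ibv_devinfo -l
1 HCA found:
        mlx5_bond_0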
Enjoy it.
Zhu Yanjun
>
> Zhengchao Shao
>
> On 2024/5/6 18:58, Zhu Yanjun wrote:
>>
>> On 06.05.24 12:45, shaozhengchao wrote:
>>> Hi Yanjun:
>>> The following is the output of the "cat /proc/net/bonding/bond0" command:
>>
>> If I remember correctly, this looks like an RDMA LAG and bonding
>> problem.
>>
>> Not sure whether it is a known problem. Please contact your local
>> support.
>>
>> Zhu Yanjun
>>
>>> [root@...alhost ~]# cat /proc/net/bonding/bond0
>>> Ethernet Channel Bonding Driver: v5.10.0+
>>>
>>> Bonding Mode: IEEE 802.3ad Dynamic link aggregation
>>> Transmit Hash Policy: layer2 (0)
>>> MII Status: up
>>> MII Polling Interval (ms): 100
>>> Up Delay (ms): 0
>>> Down Delay (ms): 0
>>> Peer Notification Delay (ms): 0
>>>
>>> 802.3ad info
>>> LACP rate: slow
>>> Min links: 0
>>> Aggregator selection policy (ad_select): stable
>>> System priority: 65535
>>> System MAC address: f4:1d:6b:6f:3b:97
>>> Active Aggregator Info:
>>> Aggregator ID: 2
>>> Number of ports: 1
>>> Actor Key: 23
>>> Partner Key: 1
>>> Partner Mac Address: 00:00:00:00:00:00
>>>
>>> Slave Interface: enp145s0f0
>>> MII Status: up
>>> Speed: 40000 Mbps
>>> Duplex: full
>>> Link Failure Count: 1
>>> Permanent HW addr: f4:1d:6b:6f:3b:97
>>> Slave queue ID: 0
>>> Aggregator ID: 1
>>> Actor Churn State: churned
>>> Partner Churn State: churned
>>> Actor Churned Count: 1
>>> Partner Churned Count: 2
>>> details actor lacp pdu:
>>> system priority: 65535
>>> system mac address: f4:1d:6b:6f:3b:97
>>> port key: 23
>>> port priority: 255
>>> port number: 1
>>> port state: 69
>>> details partner lacp pdu:
>>> system priority: 65535
>>> system mac address: 00:00:00:00:00:00
>>> oper key: 1
>>> port priority: 255
>>> port number: 1
>>> port state: 1
>>>
>>> Slave Interface: enp145s0f1
>>> MII Status: up
>>> Speed: 40000 Mbps
>>> Duplex: full
>>> Link Failure Count: 0
>>> Permanent HW addr: f4:1d:6b:6f:3b:98
>>> Slave queue ID: 0
>>> Aggregator ID: 2
>>> Actor Churn State: none
>>> Partner Churn State: churned
>>> Actor Churned Count: 0
>>> Partner Churned Count: 1
>>> details actor lacp pdu:
>>> system priority: 65535
>>> system mac address: f4:1d:6b:6f:3b:97
>>> port key: 23
>>> port priority: 255
>>> port number: 2
>>> port state: 77
>>> details partner lacp pdu:
>>> system priority: 65535
>>> system mac address: 00:00:00:00:00:00
>>> oper key: 1
>>> port priority: 255
>>> port number: 1
>>> port state: 1
>>>
>>> Thank you
>>> Zhengchao Shao
>>>
>>>
>>> On 2024/5/6 16:26, Zhu Yanjun wrote:
>>>> On 06.05.24 06:46, shaozhengchao wrote:
>>>>>
>>>>> When using the 5.10 kernel, I can find two IB devices using the
>>>>> ibv_devinfo command.
>>>>> ----------------------------------
>>>>> [root@...alhost ~]# lspci
>>>>> 91:00.0 Ethernet controller: Mellanox Technologies MT27800 Family
>>>>> [ConnectX-5]
>>>>> 91:00.1 Ethernet controller: Mellanox Technologies MT27800 Family
>>>>> ----------------------------------
>>>>> [root@...alhost ~]# ibv_devinfo
>>>>> hca_id: mlx5_0
>>>>> transport: InfiniBand (0)
>>>>> fw_ver: 16.31.1014
>>>>> node_guid: f41d:6b03:006f:4743
>>>>> sys_image_guid: f41d:6b03:006f:4743
>>>>> vendor_id: 0x02c9
>>>>> vendor_part_id: 4119
>>>>> hw_ver: 0x0
>>>>> board_id: HUA0000000004
>>>>> phys_port_cnt: 1
>>>>> port: 1
>>>>> state: PORT_ACTIVE (4)
>>>>> max_mtu: 4096 (5)
>>>>> active_mtu: 1024 (3)
>>>>> sm_lid: 0
>>>>> port_lid: 0
>>>>> port_lmc: 0x00
>>>>> link_layer: Ethernet
>>>>>
>>>>> hca_id: mlx5_1
>>>>> transport: InfiniBand (0)
>>>>> fw_ver: 16.31.1014
>>>>> node_guid: f41d:6b03:006f:4744
>>>>> sys_image_guid: f41d:6b03:006f:4743
>>>>> vendor_id: 0x02c9
>>>>> vendor_part_id: 4119
>>>>> hw_ver: 0x0
>>>>> board_id: HUA0000000004
>>>>> phys_port_cnt: 1
>>>>> port: 1
>>>>> state: PORT_ACTIVE (4)
>>>>> max_mtu: 4096 (5)
>>>>> active_mtu: 1024 (3)
>>>>> sm_lid: 0
>>>>> port_lid: 0
>>>>> port_lmc: 0x00
>>>>> link_layer: Ethernet
>>>>> ----------------------------------
>>>>> But after the two network ports are bonded, only one IB device is
>>>>> available, and only PF0 can be used.
>>>>> [root@...alhost shaozhengchao]# ibv_devinfo
>>>>> hca_id: mlx5_bond_0
>>>>> transport: InfiniBand (0)
>>>>> fw_ver: 16.31.1014
>>>>> node_guid: f41d:6b03:006f:4743
>>>>> sys_image_guid: f41d:6b03:006f:4743
>>>>> vendor_id: 0x02c9
>>>>> vendor_part_id: 4119
>>>>> hw_ver: 0x0
>>>>> board_id: HUA0000000004
>>>>> phys_port_cnt: 1
>>>>> port: 1
>>>>> state: PORT_ACTIVE (4)
>>>>> max_mtu: 4096 (5)
>>>>> active_mtu: 1024 (3)
>>>>> sm_lid: 0
>>>>> port_lid: 0
>>>>> port_lmc: 0x00
>>>>> link_layer: Ethernet
>>>>>
>>>>> The current Linux mainline driver behaves the same way.
>>>>>
>>>>> I found the comment ("If bonded, we do not add an IB device for PF1.")
>>>>> in the mlx5_lag_intf_add function of the 5.10 branch driver code.
>>>>
>>>> Not sure whether RDMA LAG is enabled here. /proc/net/bonding will
>>>> normally provide more details.
>>>>
>>>> Zhu Yanjun
>>>>
>>>>> This indicates that when the same NIC is used, only PF0 supports
>>>>> bonding?
>>>>> Are there any other constraints when enabling bonding with the CX5?
>>>>>
>>>>> Thank you
>>>>> Zhengchao Shao
>>>>
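P.S. Regarding the bond output quoted above: as far as I remember, the
mlx5 driver only switches to RoCE LAG (the single mlx5_bond_0 device)
when the bond contains exactly the two ports of the same HCA and uses an
active-backup, balance-xor or 802.3ad mode; the MLNX_OFED documentation
linked above has the authoritative list of conditions. A quick way to
double-check the bond configuration (illustrative, adjust the bond name
to your setup):

[root@...alhost ~]# cat /sys/class/net/bond0/bonding/mode
802.3ad 4
[root@...alhost ~]# cat /sys/class/net/bond0/bonding/slaves
enp145s0f0 enp145s0f1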