Message-ID: <d345b292-e5a1-a428-f5e1-74a6c0c390d9@huawei.com>
Date: Mon, 6 May 2024 19:33:00 +0800
From: shaozhengchao <shaozhengchao@...wei.com>
To: Zhu Yanjun <zyjzyj2000@...il.com>, <saeedm@...dia.com>,
<tariqt@...dia.com>, <borisp@...dia.com>, <shayd@...dia.com>,
<msanalla@...dia.com>, Rahul Rameshbabu <rrameshbabu@...dia.com>,
<weizhang@...dia.com>, <kliteyn@...dia.com>, <erezsh@...dia.com>,
<igozlan@...dia.com>
CC: netdev <netdev@...r.kernel.org>, <linux-rdma@...r.kernel.org>
Subject: Re: [question] when bonding with CX5 network card that supports RoCE
Hi Yanjun:
Thank you for your reply. Are there any other restrictions on using
RoCE on the CX5?
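
In case it helps to narrow this down, below is the small check I am
using on my side to see which verbs devices remain visible once the
bond is up. It is only a sketch built on standard libibverbs calls
(ibv_get_device_list()/ibv_get_device_guid()); nothing in it is
specific to the mlx5 driver or taken from it.

/* list_rdma_devs.c - print every verbs device and its node GUID, so the
 * before/after-bonding device list can be diffed from a script.
 * Build: gcc list_rdma_devs.c -o list_rdma_devs -libverbs
 */
#include <stdio.h>
#include <inttypes.h>
#include <endian.h>
#include <infiniband/verbs.h>

int main(void)
{
	int num = 0;
	struct ibv_device **devs = ibv_get_device_list(&num);

	if (!devs) {
		perror("ibv_get_device_list");
		return 1;
	}

	printf("%d verbs device(s) found\n", num);
	for (int i = 0; i < num; i++) {
		/* the GUID is returned in network byte order */
		uint64_t guid = be64toh(ibv_get_device_guid(devs[i]));

		printf("  %-16s node_guid 0x%016" PRIx64 "\n",
		       ibv_get_device_name(devs[i]), guid);
	}

	ibv_free_device_list(devs);
	return 0;
}

Before bonding this reports mlx5_0 and mlx5_1; after the bond comes up
it reports only mlx5_bond_0, matching the ibv_devinfo output further
down in this thread.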
Zhengchao Shao
On 2024/5/6 18:58, Zhu Yanjun wrote:
>
> On 06.05.24 12:45, shaozhengchao wrote:
>> Hi Yanjun:
>> The following is the output of the cat /proc/net/bonding/bond0 command:
>
> If I remember correctly, this looks like an RDMA LAG and bonding
> problem.
>
> I am not sure whether it is a known problem. Please contact your local
> support.
>
> Zhu Yanjun
>
>> [root@...alhost ~]# cat /proc/net/bonding/bond0
>> Ethernet Channel Bonding Driver: v5.10.0+
>>
>> Bonding Mode: IEEE 802.3ad Dynamic link aggregation
>> Transmit Hash Policy: layer2 (0)
>> MII Status: up
>> MII Polling Interval (ms): 100
>> Up Delay (ms): 0
>> Down Delay (ms): 0
>> Peer Notification Delay (ms): 0
>>
>> 802.3ad info
>> LACP rate: slow
>> Min links: 0
>> Aggregator selection policy (ad_select): stable
>> System priority: 65535
>> System MAC address: f4:1d:6b:6f:3b:97
>> Active Aggregator Info:
>> Aggregator ID: 2
>> Number of ports: 1
>> Actor Key: 23
>> Partner Key: 1
>> Partner Mac Address: 00:00:00:00:00:00
>>
>> Slave Interface: enp145s0f0
>> MII Status: up
>> Speed: 40000 Mbps
>> Duplex: full
>> Link Failure Count: 1
>> Permanent HW addr: f4:1d:6b:6f:3b:97
>> Slave queue ID: 0
>> Aggregator ID: 1
>> Actor Churn State: churned
>> Partner Churn State: churned
>> Actor Churned Count: 1
>> Partner Churned Count: 2
>> details actor lacp pdu:
>> system priority: 65535
>> system mac address: f4:1d:6b:6f:3b:97
>> port key: 23
>> port priority: 255
>> port number: 1
>> port state: 69
>> details partner lacp pdu:
>> system priority: 65535
>> system mac address: 00:00:00:00:00:00
>> oper key: 1
>> port priority: 255
>> port number: 1
>> port state: 1
>>
>> Slave Interface: enp145s0f1
>> MII Status: up
>> Speed: 40000 Mbps
>> Duplex: full
>> Link Failure Count: 0
>> Permanent HW addr: f4:1d:6b:6f:3b:98
>> Slave queue ID: 0
>> Aggregator ID: 2
>> Actor Churn State: none
>> Partner Churn State: churned
>> Actor Churned Count: 0
>> Partner Churned Count: 1
>> details actor lacp pdu:
>> system priority: 65535
>> system mac address: f4:1d:6b:6f:3b:97
>> port key: 23
>> port priority: 255
>> port number: 2
>> port state: 77
>> details partner lacp pdu:
>> system priority: 65535
>> system mac address: 00:00:00:00:00:00
>> oper key: 1
>> port priority: 255
>> port number: 1
>> port state: 1
>>
>> Thank you
>> Zhengchao Shao
>>
>>
>> On 2024/5/6 16:26, Zhu Yanjun wrote:
>>> On 06.05.24 06:46, shaozhengchao wrote:
>>>>
>>>> When using the 5.10 kernel, I can find two IB devices using the
>>>> ibv_devinfo command.
>>>> ----------------------------------
>>>> [root@...alhost ~]# lspci
>>>> 91:00.0 Ethernet controller: Mellanox Technologies MT27800 Family
>>>> [ConnectX-5]
>>>> 91:00.1 Ethernet controller: Mellanox Technologies MT27800 Family
>>>> ----------------------------------
>>>> [root@...alhost ~]# ibv_devinfo
>>>> hca_id: mlx5_0
>>>> transport: InfiniBand (0)
>>>> fw_ver: 16.31.1014
>>>> node_guid: f41d:6b03:006f:4743
>>>> sys_image_guid: f41d:6b03:006f:4743
>>>> vendor_id: 0x02c9
>>>> vendor_part_id: 4119
>>>> hw_ver: 0x0
>>>> board_id: HUA0000000004
>>>> phys_port_cnt: 1
>>>> port: 1
>>>> state: PORT_ACTIVE (4)
>>>> max_mtu: 4096 (5)
>>>> active_mtu: 1024 (3)
>>>> sm_lid: 0
>>>> port_lid: 0
>>>> port_lmc: 0x00
>>>> link_layer: Ethernet
>>>>
>>>> hca_id: mlx5_1
>>>> transport: InfiniBand (0)
>>>> fw_ver: 16.31.1014
>>>> node_guid: f41d:6b03:006f:4744
>>>> sys_image_guid: f41d:6b03:006f:4743
>>>> vendor_id: 0x02c9
>>>> vendor_part_id: 4119
>>>> hw_ver: 0x0
>>>> board_id: HUA0000000004
>>>> phys_port_cnt: 1
>>>> port: 1
>>>> state: PORT_ACTIVE (4)
>>>> max_mtu: 4096 (5)
>>>> active_mtu: 1024 (3)
>>>> sm_lid: 0
>>>> port_lid: 0
>>>> port_lmc: 0x00
>>>> link_layer: Ethernet
>>>> ----------------------------------
>>>> But after the two network ports are bonded, only one IB device is
>>>> available, and only PF0 can be used.
>>>> [root@...alhost shaozhengchao]# ibv_devinfo
>>>> hca_id: mlx5_bond_0
>>>> transport: InfiniBand (0)
>>>> fw_ver: 16.31.1014
>>>> node_guid: f41d:6b03:006f:4743
>>>> sys_image_guid: f41d:6b03:006f:4743
>>>> vendor_id: 0x02c9
>>>> vendor_part_id: 4119
>>>> hw_ver: 0x0
>>>> board_id: HUA0000000004
>>>> phys_port_cnt: 1
>>>> port: 1
>>>> state: PORT_ACTIVE (4)
>>>> max_mtu: 4096 (5)
>>>> active_mtu: 1024 (3)
>>>> sm_lid: 0
>>>> port_lid: 0
>>>> port_lmc: 0x00
>>>> link_layer: Ethernet
>>>>
>>>> The current Linux mainline driver behaves the same way.
>>>>
>>>> I found the comment ("If bonded, we do not add an IB device for PF1.")
>>>> in the mlx5_lag_intf_add function of the 5.10 branch driver code.
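>>>>
>>>> To see what the single remaining device actually exposes, I also dump
>>>> its GID table with the sketch below. It is only an illustration built
>>>> on standard libibverbs calls (ibv_query_port()/ibv_query_gid()); it
>>>> assumes device index 0 and port 1 and has nothing to do with the
>>>> driver code itself:
>>>>
>>>> /* dump_gids.c - print the non-empty GID table entries of the first
>>>>  * verbs device, port 1.
>>>>  * Build: gcc dump_gids.c -o dump_gids -libverbs
>>>>  */
>>>> #include <stdio.h>
>>>> #include <string.h>
>>>> #include <infiniband/verbs.h>
>>>>
>>>> int main(void)
>>>> {
>>>> 	int num = 0;
>>>> 	struct ibv_device **devs = ibv_get_device_list(&num);
>>>>
>>>> 	if (!devs || num == 0) {
>>>> 		fprintf(stderr, "no verbs devices found\n");
>>>> 		return 1;
>>>> 	}
>>>>
>>>> 	struct ibv_context *ctx = ibv_open_device(devs[0]);
>>>> 	if (!ctx) {
>>>> 		perror("ibv_open_device");
>>>> 		return 1;
>>>> 	}
>>>>
>>>> 	struct ibv_port_attr pattr;
>>>> 	if (ibv_query_port(ctx, 1, &pattr)) {
>>>> 		perror("ibv_query_port");
>>>> 		return 1;
>>>> 	}
>>>>
>>>> 	for (int i = 0; i < pattr.gid_tbl_len; i++) {
>>>> 		union ibv_gid gid;
>>>> 		static const union ibv_gid zero;
>>>>
>>>> 		/* skip entries that are unreadable or all zero */
>>>> 		if (ibv_query_gid(ctx, 1, i, &gid) ||
>>>> 		    !memcmp(&gid, &zero, sizeof(gid)))
>>>> 			continue;
>>>>
>>>> 		printf("gid[%3d]: ", i);
>>>> 		for (int b = 0; b < 16; b++)
>>>> 			printf("%02x%s", gid.raw[b],
>>>> 			       (b % 2 && b != 15) ? ":" : "");
>>>> 		printf("\n");
>>>> 	}
>>>>
>>>> 	ibv_close_device(ctx);
>>>> 	ibv_free_device_list(devs);
>>>> 	return 0;
>>>> }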
>>>
>>> I am not sure whether RDMA LAG is enabled here. /proc/net/bonding
>>> normally provides more details.
>>>
>>> Zhu Yanjun
>>>
>>>> Does this indicate that, when the same NIC is used, only PF0 supports
>>>> bonding?
>>>> Are there any other constraints when enabling bonding with CX5?
>>>>
>>>> Thank you
>>>> Zhengchao Shao
>>>