lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <38c8a10a-339f-402e-836b-baf38994c7b2@linux.alibaba.com>
Date: Tue, 21 May 2024 18:52:26 +0800
From: Guangguan Wang <guangguan.wang@...ux.alibaba.com>
To: Wenjia Zhang <wenjia@...ux.ibm.com>, jaka@...ux.ibm.com,
 kgraul@...ux.ibm.com
Cc: linux-s390@...r.kernel.org, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: some questions about restrictions in SMC-R v2's implementation



On 2024/5/17 15:41, Wenjia Zhang wrote:
> 
> 
> On 07.05.24 07:54, Guangdong Wang wrote:
>> Hi, Wenjia and Jan,
>>
>> When testing SMC-R v2, I found some scenarios where SMC-R v2 should be worked, but due to some restrictions in SMC-R v2's implementation,
>> fallback happened. I want to know why these restrictions exist and what would happen if these restrictions were removed.
>>
> 
> Hi Guangguan and Wen,
> 
> please see my answer below.
>> The first is in the function smc_ib_determine_gid_rcu, where restricts the subnet matching between smcrv2->saddr and the RDMA related netdev.
>> ...
>>
> The purpose of the restriction is to simplify the IP routing topology allowing IP routing to use the destination host's subnet route. Because each host must also have a valid IP route to the peer’s RoCE IP address to create RC QP. If the IP route used is the same IP Route as the associated TCP/IP connection, the reuse of the IP routing topology could be achieved. I think it is what the following sentence means in the doc https://www.ibm.com/support/pages/system/files/inline-files/IBM%20Shared%20Memory%20Communications%20Version%202_2.pdf
> 
> "
> For HA, multiple RoCE adapters should be provisioned along with multiple equal cost IP routes to the peer host (i.e., reusing the TCP/IP routing topology).
> "
> And the "Figure 19. SMC-Rv2 with RoCEv2 Connectivity" in the doc also mentions the restriction.
> 
> The SMCRv2 on linux is indeed implemented with this purpose. Please see the function smc_ib_modify_qp_rtr(). During the first contact processing, the Mac address of the next hop IP address for the IP route is resolved by performing e.g. ARP and used to create the RoCEv2 RC QP. If the route is not usable for the RoCE IP address to reach the peer's RoCE IP address i.e. without this restriction, the UDP/IP packets would not be transported in a right way.
> 

Hi, Wenjia

Thanks for the answer. 

I am clear about the restriction of subnet matching.

> BTW, the fallback would still happen without the restriction. Because at the end of the CLC handshake(TCP/IP traffic), the first link will be created by sending and receiving LLC confirm message (SMCRv2 traffic). If one peer can just send but not receive the LLC confirm message, he will send CLC decline message with the reason "Time Out".
> 
> Now let's have a look at your examples above. Both of your RDMA related device have another IP route as the TCP/IP connection, so that the reuse of the IP routing topology is not possible.
> 
> Any thought still?
> 
>> The other is in the function smc_connect_rdma_v2_prepare, where restricts the symmetric configuration of routing between client and server. codes here:
>> ...
>> In my testing environment, server's ip is 192.168.0.3/24, client's ip 192.168.0.4/24, regarding how many netdev in server or client. Server has special
>> route setting due to some other reasons, which results in indirect route from 192.168.0.3/24 to 192.168.0.4/24. Thus, when CLC handshake, client will
>> get fce->v2_direct==false, but client has no special routing setting and will find direct route from 192.168.0.4/24 to 192.168.0.3/24. Due to the above
>> symmetric configuration of routing restriction, we got fallback connection, rsn is 0x030f0000. But I think SMC-R should work in this scenario.
>> And more, why check the symmetric configuration of routing only when server is indirect route?
>>
> That is to check if the IP routing topology is the same on both sides. Then I'd like to ask why you use asymmetric routing for your connection? From the perspective of Networking set up, does it make any sense that the peers communicate with each other with different IP routing topology?

I have looked into the configuration of my testing environment's routing table and found that the configuration can be optimized.
And the sketch in the attachment used to describe the topology and route configuration of my testing environment.
After optimizing the route setting, the fallback disappear.

But why check the symmetric configuration of routing only when server is indirect route is still not clear.


Thanks,
Guangguan Wang
Download attachment "route configurations.jpg" of type "image/jpeg" (189551 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ