Message-ID: <68580479-c66e-41e3-b869-b9f98e348f01@linux.ibm.com>
Date: Wed, 18 Oct 2023 21:43:40 +0200
From: Wenjia Zhang <wenjia@...ux.ibm.com>
To: Wen Gu <guwen@...ux.alibaba.com>,
Niklas Schnelle <schnelle@...ux.ibm.com>, kgraul@...ux.ibm.com,
jaka@...ux.ibm.com, davem@...emloft.net, edumazet@...gle.com,
kuba@...nel.org, pabeni@...hat.com
Cc: wintera@...ux.ibm.com, gbayer@...ux.ibm.com, pasic@...ux.ibm.com,
alibuda@...ux.alibaba.com, tonylu@...ux.alibaba.com,
dust.li@...ux.alibaba.com, linux-s390@...r.kernel.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v4 00/18] net/smc: implement virtual ISM
extension and loopback-ism
On 17.10.23 05:49, Wen Gu wrote:
>
>
> On 2023/10/8 15:19, Wen Gu wrote:
>>
>>
>> On 2023/10/5 16:21, Niklas Schnelle wrote:
>>
>>>
>>> Hi Wen Gu,
>>>
>>> I've been trying out your series with iperf3, qperf, and uperf on
>>> s390x. I'm using network namespaces with a ConnectX VF from the same
>>> card in each namespace for the initial TCP/IP connection, i.e.
>>> initially it goes out to a real NIC even if that can switch
>>> internally. All of these look great for streaming workloads, both in
>>> terms of performance and stability. With a Connect-Request-Response
>>> workload and uperf, however, I've run into issues. The test
>>> configuration I use is as follows:
>>>
>>> Client Command:
>>>
>>> # host=$ip_server ip netns exec client smc_run uperf -m tcp_crr.xml
>>>
>>> Server Command:
>>>
>>> # ip netns exec server smc_run uperf -s &> /dev/null
>>>
>>> Uperf tcp_crr.xml:
>>>
>>> <?xml version="1.0"?>
>>> <profile name="TCP_CRR">
>>>   <group nthreads="12">
>>>     <transaction duration="120">
>>>       <flowop type="connect"
>>>               options="remotehost=$host protocol=tcp" />
>>>       <flowop type="write" options="size=200"/>
>>>       <flowop type="read" options="size=1000"/>
>>>       <flowop type="disconnect" />
>>>     </transaction>
>>>   </group>
>>> </profile>
>>>
>>> The workload first runs fine, but after about 4 GB of data has been
>>> transferred it fails with "Connection refused" and "Connection reset
>>> by peer" errors. The failure is not permanent, however: after
>>> restarting both the uperf server and client, the streaming workloads
>>> run fine again. So I suspect something gets stuck in either the
>>> client or the server sockets. The same workload runs fine with plain
>>> TCP/IP, of course.
>>>
>>> Thanks,
>>> Niklas
>>>
>>>
>>
>> Hi Niklas,
>>
>> Thank you very much for the test. With the test example you provided,
>> I've reproduced the issue in my VM. Moreover, the test sometimes
>> complains with 'Error saying goodbye with <ip>'.
>>
>> I'll figure out what's going on here.
>>
>> Thanks!
>> Wen Gu
>
> I think there is a common issue for both SMC-R and SMC-D. I can also
> reproduce 'connection reset by peer' and 'Error saying goodbye with <ip>'
> when using SMC-R under the same test conditions. They occur at the end of
> the test.
>
> When the uperf test time ends, some signals are sent. At this point there
> are usually still some SMC connections in the middle of the CLC handshake.
> I catch some -EINTR (-4) on the client side and -ECONNRESET (-104) on the
> server side returned from smc_clc_wait_msg (correspondingly, the handshake
> error counters also increase), and TCP RST packets are sent to terminate
> the CLC TCP connection (clcsock).
>
> I am not sure whether this should be considered by design or a bug in SMC.
> From the application's perspective, the connection reset behavior only
> happens when using SMC.
>
> @Wenjia, could you please take a look at this?
>
> Thanks,
> Wen Gu
Hi Wen,

Do you mean a bug in smc_clc_wait_msg()?
If yes, I cannot see any problem in smc_clc_wait_msg() itself. From your
description, it looks to me as if the server receives the CLC_PROPOSAL
message but finds nothing in it, while the client keeps waiting for the
CLC_ACCEPT message from the server until the wait loop is broken out.
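
To make the suspected sequence a bit more concrete, here is a minimal,
purely illustrative sketch (not the actual code in net/smc/smc_clc.c;
the function name and the simplifications are mine) of the receive
pattern you describe: the side blocked in the CLC wait is interrupted
by uperf's end-of-test signal and returns -EINTR, and aborting the
handshake then tears down the clcsock, so the peer's pending CLC wait
fails with -ECONNRESET.

#include <linux/net.h>
#include <linux/sched/signal.h>
#include <linux/socket.h>
#include <linux/uio.h>

/* clc_wait_sketch(): illustrative only, not smc_clc_wait_msg() itself. */
static int clc_wait_sketch(struct socket *clcsock, void *buf, size_t buflen)
{
        struct kvec vec = { .iov_base = buf, .iov_len = buflen };
        struct msghdr msg = { .msg_flags = MSG_WAITALL };
        int rc;

        /* Blocks until a CLC message (e.g. CLC_ACCEPT) arrives, the peer
         * resets the connection, or a signal is delivered to the task.
         */
        rc = kernel_recvmsg(clcsock, &msg, &vec, 1, buflen, msg.msg_flags);
        if (rc < 0 && signal_pending(current))
                return -EINTR;  /* handshake aborted; releasing the clcsock
                                 * sends a RST, and the peer's CLC wait then
                                 * returns -ECONNRESET */
        return rc;
}

If that matches what you observe, the question is really whether aborting
an in-flight CLC handshake on a signal should surface to the application
as a connection reset, rather than a problem inside smc_clc_wait_msg()
itself.
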
Thanks,
Wenjia