Date: Tue, 17 Oct 2023 11:49:04 +0800
From: Wen Gu <guwen@...ux.alibaba.com>
To: Niklas Schnelle <schnelle@...ux.ibm.com>, kgraul@...ux.ibm.com,
 wenjia@...ux.ibm.com, jaka@...ux.ibm.com, davem@...emloft.net,
 edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com
Cc: wintera@...ux.ibm.com, gbayer@...ux.ibm.com, pasic@...ux.ibm.com,
 alibuda@...ux.alibaba.com, tonylu@...ux.alibaba.com,
 dust.li@...ux.alibaba.com, linux-s390@...r.kernel.org,
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v4 00/18] net/smc: implement virtual ISM
 extension and loopback-ism



On 2023/10/8 15:19, Wen Gu wrote:
> 
> 
> On 2023/10/5 16:21, Niklas Schnelle wrote:
> 
>>
>> Hi Wen Gu,
>>
>> I've been trying out your series with iperf3, qperf, and uperf on
>> s390x. I'm using network namespaces with a ConnectX VF from the same
>> card in each namespace for the initial TCP/IP connection i.e. initially
>> it goes out to a real NIC even if that can switch internally. All of
>> these look great for streaming workloads both in terms of performance
>> and stability. With a Connect-Request-Response workload and uperf
>> however I've run into issues. The test configuration I use is as
>> follows:
>>
>> Client Command:
>>
>> # host=$ip_server ip netns exec client smc_run uperf -m tcp_crr.xml
>>
>> Server Command:
>>
>> # ip netns exec server smc_run uperf -s &> /dev/null
>>
>> Uperf tcp_crr.xml:
>>
>> <?xml version="1.0"?>
>> <profile name="TCP_CRR">
>>          <group nthreads="12">
>>                  <transaction duration="120">
>>                          <flowop type="connect" options="remotehost=$host protocol=tcp" />
>>                          <flowop type="write" options="size=200"/>
>>                          <flowop type="read" options="size=1000"/>
>>                          <flowop type="disconnect" />
>>                  </transaction>
>>          </group>
>> </profile>
>>
>> The workload first runs fine but then after about 4 GB of data
>> transferred fails with "Connection refused" and "Connection reset by
>> peer" errors. The failure is not permanent however and re-running
>> the streaming workloads run fine again (with both uperf server and
>> client restarted). So I suspect something gets stuck in either the
>> client or server sockets. The same workload runs fine with TCP/IP of
>> course.
>>
>> Thanks,
>> Niklas
>>
>>
> 
> Hi Niklas,
> 
> Thank you very much for the test. With the test example you provided, I've
> reproduced the issue in my VM. And moreover, sometimes the test complains
> with 'Error saying goodbye with <ip>'
> 
> I'll figure out what's going on here.
> 
> Thanks!
> Wen Gu

I think this is an issue common to both SMC-R and SMC-D. I also reproduced
'connection reset by peer' and 'Error saying goodbye with <ip>' when using
SMC-R under the same test conditions. Both occur at the end of the test.

When the uperf test time ends, some signals are sent. At that point there
are usually some SMC connections still doing the CLC handshake. I caught
some -EINTR (-4) on the client and -ECONNRESET (-104) on the server returned
from smc_clc_wait_msg() (the handshake error counts increase correspondingly),
and TCP RST packets were sent to terminate the CLC TCP connection (clcsock).
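
For anyone reading the raw return values, here is a minimal Python sketch
that maps negative kernel-style return codes to their errno names. It assumes
a Linux host, where EINTR is 4 and ECONNRESET is 104. The helper name
errno_name is hypothetical and is not part of SMC or uperf.

```python
import errno
import os


def errno_name(ret):
    """Map a negative kernel-style return code to (name, message).

    Hypothetical helper for illustration only.
    """
    code = -ret if ret < 0 else ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# The two values seen from smc_clc_wait_msg in the test above,
# written symbolically so the sketch does not hard-code numbers.
for ret in (-errno.EINTR, -errno.ECONNRESET):
    name, msg = errno_name(ret)
    print(f"{ret}: {name} ({msg})")
```

On Linux the loop should report -4 as EINTR and -104 as ECONNRESET; other
platforms number these errors differently.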

I am not sure whether this should be considered by-design behavior or a bug
in SMC. From an application perspective, the connection reset only happens
when using SMC.
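
To make the application-side view concrete, below is a rough Python sketch of
one connect/write/read/disconnect cycle with the sizes from the tcp_crr.xml
profile (200-byte write, 1000-byte read), showing where "Connection refused"
and "Connection reset by peer" would surface. It is not uperf's code: it runs
over plain TCP on loopback, and the names serve_once and crr_once are
hypothetical. Whether it reproduces the problem under smc_run is untested.

```python
import socket
import threading


def serve_once(listener):
    """Accept one connection, read the 200-byte request, reply with 1000 bytes."""
    conn, _ = listener.accept()
    with conn:
        remaining = 200
        while remaining:
            chunk = conn.recv(remaining)
            if not chunk:
                return  # peer closed early
            remaining -= len(chunk)
        conn.sendall(b"r" * 1000)


def crr_once(host, port):
    """Run one connect/write/read/disconnect cycle and label the outcome."""
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            sock.sendall(b"w" * 200)
            received = 0
            while received < 1000:
                chunk = sock.recv(1000 - received)
                if not chunk:
                    return "short read"
                received += len(chunk)
        return "ok"
    except ConnectionRefusedError:
        return "refused"  # what the application sees as "Connection refused"
    except ConnectionResetError:
        return "reset"    # what the application sees as "Connection reset by peer"


# Loopback-only demo: port 0 lets the OS pick a free port.
listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]
worker = threading.Thread(target=serve_once, args=(listener,), daemon=True)
worker.start()
result = crr_once("127.0.0.1", port)
worker.join(timeout=5)
listener.close()
print(result)
```

Counting the "refused" and "reset" results over many cycles, with and without
SMC, would be one way to compare the two transports from the application side.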

@Wenjia, could you please take a look at this?

Thanks,
Wen Gu
