Date: Thu, 25 Jan 2024 12:59:41 +0800
From: Wen Gu <guwen@...ux.alibaba.com>
To: Alexandra Winter <wintera@...ux.ibm.com>, wenjia@...ux.ibm.com,
 hca@...ux.ibm.com, gor@...ux.ibm.com, agordeev@...ux.ibm.com,
 davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
 pabeni@...hat.com, jaka@...ux.ibm.com,
 Matthew Rosato <mjrosato@...ux.ibm.com>
Cc: Linux regressions mailing list <regressions@...ts.linux.dev>,
 borntraeger@...ux.ibm.com, svens@...ux.ibm.com, alibuda@...ux.alibaba.com,
 tonylu@...ux.alibaba.com, raspl@...ux.ibm.com, schnelle@...ux.ibm.com,
 guangguan.wang@...ux.alibaba.com, linux-s390@...r.kernel.org,
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
 Halil Pasic <pasic@...ux.ibm.com>
Subject: Re: [REGRESSION] v6.8 SMC-D issues



On 2024/1/24 22:29, Alexandra Winter wrote:
> Hello Wen Gu,
> 
> our colleague Matthew reported that SMC-D is failing in certain scenarios on
> kernel v6.8 (thx Matt!). He bisected it to
> b40584d ("net/smc: compatible with 128-bits extended GID of virtual ISM device")
> I think the root cause could also be somewhere else in the SMC-Dv2.1 patchset.
> 
> I was able to reproduce the issue on a 6.8.0-rc1 kernel.
> I tested iperf over smc-d with:
> smc_run iperf3 -s
> smc_run iperf3 -c <IP@>
> 
> 1) Doing an iperf in a single system using 127.0.0.1 as IP@
> (System A=iperf client=iperf server)
> 2) Doing iperf to a remote system (System A=client; System B=iperf server)
> 
> The second iperf fails with an error message like:
> "iperf3: error - unable to receive cookie at server: Bad file descriptor" on the server.
> 
> If I do first 2) (iperf to remote) and then 1) (iperf to local), then the
> iperf to local fails.
> 
> I can run multiple iperfs to the first server without problems.
> 
> I ran it on a debug server with KASAN, but got no reports in the log file.
> 
> I will try to debug further, but wanted to let you all know.
> 
> Kind regards
> Alexandra
> 
> Reported-by: Matthew Rosato <mjrosato@...ux.ibm.com>
> 

Hi Alexandra and Matthew,

Thank you very much for the detailed description.

I tried to reproduce this with loopback-ism, cutting some checks so that the remote-system
handshake could be done. After a while of debugging I found an elementary mistake of mine in
b40584d ("net/smc: compatible with 128-bits extended GID of virtual ISM device").

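The operator precedence in smcd_lgr_match() is not as intended. Since && binds more
tightly than ?:, the return expression is parsed as

  (gid match && smcd match && smc_ism_is_virtual(smcismdev)) ? (gid_ext match) : 1

so whenever the peer GID or the device does not match, it returns 1 ('true'). It will
therefore always return 'true' in the remote-system case.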

  static bool smcd_lgr_match(struct smc_link_group *lgr,
-                          struct smcd_dev *smcismdev, u64 peer_gid)
+                          struct smcd_dev *smcismdev,
+                          struct smcd_gid *peer_gid)
  {
-       return lgr->peer_gid == peer_gid && lgr->smcd == smcismdev;
+       return lgr->peer_gid.gid == peer_gid->gid && lgr->smcd == smcismdev &&
+               smc_ism_is_virtual(smcismdev) ?
+               (lgr->peer_gid.gid_ext == peer_gid->gid_ext) : 1;
  }
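To make the precedence problem concrete, here is a minimal standalone sketch (plain C, not kernel code) that puts the b40584d expression and the parenthesized one side by side. The structs and smc_ism_is_virtual() below are simplified, hypothetical stand-ins that model only the fields the function reads, and the names smcd_lgr_match_buggy(), smcd_lgr_match_fixed() and demo_mismatched_peer() are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins: only the fields that
 * smcd_lgr_match() reads are modeled, not the kernel definitions. */
struct smcd_gid {
	uint64_t gid;
	uint64_t gid_ext;
};

struct smcd_dev {
	bool is_virtual; /* assumption: models what the helper reports */
};

struct smc_link_group {
	struct smcd_gid peer_gid;
	struct smcd_dev *smcd;
};

/* Stand-in for the kernel helper of the same name. */
static bool smc_ism_is_virtual(const struct smcd_dev *smcd)
{
	return smcd->is_virtual;
}

/* Expression as in b40584d. Parsed as (gid && dev && virtual) ? ext : 1,
 * so any GID or device mismatch yields 1, i.e. "match". */
static bool smcd_lgr_match_buggy(const struct smc_link_group *lgr,
				 const struct smcd_dev *smcismdev,
				 const struct smcd_gid *peer_gid)
{
	return lgr->peer_gid.gid == peer_gid->gid && lgr->smcd == smcismdev &&
		smc_ism_is_virtual(smcismdev) ?
		(lgr->peer_gid.gid_ext == peer_gid->gid_ext) : 1;
}

/* Same expression with the parentheses from the proposed fix. */
static bool smcd_lgr_match_fixed(const struct smc_link_group *lgr,
				 const struct smcd_dev *smcismdev,
				 const struct smcd_gid *peer_gid)
{
	return lgr->peer_gid.gid == peer_gid->gid && lgr->smcd == smcismdev &&
		(smc_ism_is_virtual(smcismdev) ?
		 (lgr->peer_gid.gid_ext == peer_gid->gid_ext) : 1);
}

/* Example: a link group exists for peer GID 0xA; a lookup for peer
 * GID 0xB on the same non-virtual device should not match it. */
static void demo_mismatched_peer(void)
{
	struct smcd_dev ism = { .is_virtual = false };
	struct smc_link_group lgr = { .peer_gid = { .gid = 0xA }, .smcd = &ism };
	struct smcd_gid peer_b = { .gid = 0xB };

	assert(smcd_lgr_match_buggy(&lgr, &ism, &peer_b));  /* wrongly matches */
	assert(!smcd_lgr_match_fixed(&lgr, &ism, &peer_b)); /* correctly rejected */
}
```

In this model, a lookup for a different peer reuses the existing link group with the unparenthesized form, which would fit the pattern in the report: the first iperf target works, and the second, different peer fails regardless of order.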

Could you please try again with the patch below, to see whether this is the root cause?
Really sorry for the inconvenience.

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index da6a8d9c81ea..c6a6ba56c9e3 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1896,8 +1896,8 @@ static bool smcd_lgr_match(struct smc_link_group *lgr,
                            struct smcd_gid *peer_gid)
  {
         return lgr->peer_gid.gid == peer_gid->gid && lgr->smcd == smcismdev &&
-               smc_ism_is_virtual(smcismdev) ?
-               (lgr->peer_gid.gid_ext == peer_gid->gid_ext) : 1;
+               (smc_ism_is_virtual(smcismdev) ?
+                (lgr->peer_gid.gid_ext == peer_gid->gid_ext) : 1);
  }
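Because the return expression only combines four boolean comparisons, the fix can also be checked exhaustively. The sketch below (plain C, not kernel code) evaluates the original and the parenthesized forms over all 16 input combinations and compares both against the rule the patch above appears to express: GID and device must match, and gid_ext is compared only for virtual ISM devices. That "intended" rule is inferred from the patch, an assumption rather than something taken from the SMC documentation, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* g: peer GID matches, d: same smcd device,
 * v: device is virtual, e: extended GID matches. */
static bool match_buggy(bool g, bool d, bool v, bool e)
{
	return g && d && v ? e : 1; /* b40584d form */
}

static bool match_fixed(bool g, bool d, bool v, bool e)
{
	return g && d && (v ? e : 1); /* parenthesized form */
}

/* Assumed intended rule, inferred from the fix. */
static bool match_intended(bool g, bool d, bool v, bool e)
{
	return g && d && (!v || e);
}

/* Returns how many of the 16 combinations the buggy form gets wrong;
 * asserts that the fixed form agrees with the intended rule on all. */
static int count_buggy_mismatches(void)
{
	int wrong = 0;

	for (int bits = 0; bits < 16; bits++) {
		bool g = bits & 1, d = bits & 2, v = bits & 4, e = bits & 8;

		assert(match_fixed(g, d, v, e) == match_intended(g, d, v, e));
		if (match_buggy(g, d, v, e) != match_intended(g, d, v, e))
			wrong++;
	}
	return wrong;
}
```

In this check the parenthesized form agrees with the rule in all 16 cases, while the original form disagrees in 12 of them: every combination where the GID or the device does not match is reported as a match.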


Thanks,
Wen Gu
