[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <530afe45-ba6b-4970-a71c-1f1255f5fca9@linux.alibaba.com>
Date: Thu, 25 Jan 2024 12:59:41 +0800
From: Wen Gu <guwen@...ux.alibaba.com>
To: Alexandra Winter <wintera@...ux.ibm.com>, wenjia@...ux.ibm.com,
hca@...ux.ibm.com, gor@...ux.ibm.com, agordeev@...ux.ibm.com,
davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, jaka@...ux.ibm.com,
Matthew Rosato <mjrosato@...ux.ibm.com>
Cc: Linux regressions mailing list <regressions@...ts.linux.dev>,
borntraeger@...ux.ibm.com, svens@...ux.ibm.com, alibuda@...ux.alibaba.com,
tonylu@...ux.alibaba.com, raspl@...ux.ibm.com, schnelle@...ux.ibm.com,
guangguan.wang@...ux.alibaba.com, linux-s390@...r.kernel.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
Halil Pasic <pasic@...ux.ibm.com>
Subject: Re: [REGRESSION] v6.8 SMC-D issues
On 2024/1/24 22:29, Alexandra Winter wrote:
> Hello Wen Gu,
>
> our colleague Matthew reported that SMC-D is failing in certain scenarios on
> kernel v6.8 (thx Matt!). He bisected it to
> b40584d ("net/smc: compatible with 128-bits extended GID of virtual ISM device")
> I think the root cause could also be somewhere else in the SMC-Dv2.1 patchset.
>
> I was able to reproduce the issue on a 6.8.0-rc1 kernel.
> I tested iperf over smc-d with:
> smc_run iperf3 -s
> smc_run iperf3 -c <IP@>
>
> 1) Doing an iperf in a single system using 127.0.0.1 as IP@
> (System A=iperf client=iperf server)
> 2) Doing iperf to a remote system (System A=client; System B=iperf server)
>
> The second iperf fails with an error message like:
> "iperf3: error - unable to receive cookie at server: Bad file descriptor" on the server"
>
> If I do first 2) (iperf to remote) and then 1) (iperf to local), then the
> iperf to local fails.
>
> I can do multiple iperf to the first server without problems.
>
> I ran it on a debug server with KASAN, but got no reports in the Logfile.
>
> I will try to debug further, but wanted to let you all know.
>
> Kind regards
> Alexandra
>
> Reported-by: Matthew Rosato <mjrosato@...ux.ibm.com>
>
Hi Alexandra and Matthew,
Thank you very much for detailed description.
I tried to reproduce this with loopback-ism, cut some checks so that the remote-system
handshake can be done. After a while debug I found an elementary mistake of mine in
b40584d ("net/smc: compatible with 128-bits extended GID of virtual ISM device")..
The operator order in smcd_lgr_match() is not as expected. It will always return
'true' in remote-system case.
static bool smcd_lgr_match(struct smc_link_group *lgr,
- struct smcd_dev *smcismdev, u64 peer_gid)
+ struct smcd_dev *smcismdev,
+ struct smcd_gid *peer_gid)
{
- return lgr->peer_gid == peer_gid && lgr->smcd == smcismdev;
+ return lgr->peer_gid.gid == peer_gid->gid && lgr->smcd == smcismdev &&
+ smc_ism_is_virtual(smcismdev) ?
+ (lgr->peer_gid.gid_ext == peer_gid->gid_ext) : 1;
}
Could you please try again with this patch? to see if this is the root cause.
Really sorry for the inconvenience.
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index da6a8d9c81ea..c6a6ba56c9e3 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1896,8 +1896,8 @@ static bool smcd_lgr_match(struct smc_link_group *lgr,
struct smcd_gid *peer_gid)
{
return lgr->peer_gid.gid == peer_gid->gid && lgr->smcd == smcismdev &&
- smc_ism_is_virtual(smcismdev) ?
- (lgr->peer_gid.gid_ext == peer_gid->gid_ext) : 1;
+ (smc_ism_is_virtual(smcismdev) ?
+ (lgr->peer_gid.gid_ext == peer_gid->gid_ext) : 1);
}
Thanks,
Wen Gu
Powered by blists - more mailing lists