[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f309d525-7e12-ee81-8d59-ad07f94f9e9d@linux.ibm.com>
Date: Tue, 30 May 2023 22:34:29 +0200
From: Wenjia Zhang <wenjia@...ux.ibm.com>
To: Wen Gu <guwen@...ux.alibaba.com>, kgraul@...ux.ibm.com,
jaka@...ux.ibm.com, davem@...emloft.net, edumazet@...gle.com,
kuba@...nel.org, pabeni@...hat.com
Cc: linux-s390@...r.kernel.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net 2/2] net/smc: Don't use RMBs not mapped to new link in
SMCRv2 ADD LINK
On 27.05.23 17:20, Wen Gu wrote:
>
>
> On 2023/5/27 18:22, Wenjia Zhang wrote:
>>
>> I'm wondering if this crash is introduced by the first fix patch you
>> wrote.
>>
>> Thanks,
>> Wenjia
>
> Hi Wenjia,
>
> No, the crash can be reproduced without my two patches by the following
> steps:
>
> 1. Each side activates only one RNIC firstly and set the default
> sndbuf/RMB sizes to more
> than 16KB, such as 64KB, through sysctl net.smc.{wmem | rmem}.
> (The reason why initial sndbufs/RMBs size needs to be larger than
> 16KB will be explained later)
>
> 2. Use SMCRv2 in any test, just to create a link group that has some
> alloced RMBs.
>
> Example of step #1 #2:
>
> [server]
> smcr ueid add 1234
> sysctl net.smc.rmem=65536
> sysctl net.smc.wmem=65536
> smc_run sockperf sr --tcp
>
> [client]
> smcr ueid add 1234
> sysctl net.smc.rmem=65536
> sysctl net.smc.wmem=65536
> smc_run sockperf pp --tcp -i <server ip> -t <time>
>
>
> 3. Change the default sndbuf/RMB sizes, make sure they are larger than
> initial size above,
> such as 256KB.
>
> 4. Then rerun the test, and there will be some bigger RMBs alloced. And
> when the test is
> running, activate the second alternate RNIC of each side. It will
> trigger to add a new
> link and do what I described in the second patch's commit log, that
> only map the in-use
> 256KB RMBs to new link but try to access the unused 64KB RMBs'
> invalid mr[new_link->lnk_idx].
>
> Example of step #3 #4:
>
> [server]
> sysctl net.smc.rmem=262144
> sysctl net.smc.wmem=262144
> smc_run sockperf sr --tcp
>
> [client]
> sysctl net.smc.rmem=262144
> sysctl net.smc.wmem=262144
> smc_run sockperf pp --tcp -i <server ip> -t <time>
>
> When the sockperf is running:
>
> [server/client]
> ip link set dev <2nd RNIC> up # activate the second alternate
> RNIC, then crash occurs.
>
>
> At the beginning, I only found the crash in the second patch. But when I
> try to fix it,
> I found the issue descibed in the first patch.
>
> In first patch, if I understand correctly, smc_llc_get_first_rmb() is
> aimed to get the first
> RMB in lgr->rmb[*]. If so, It should start from lgr->rmbs[0] instead of
> lgr->rmbs[1], right?
>
> Then back to the reason needs to be explained in step #1. Because of the
> issue mentioned
> above in smc_llc_get_first_rmb(), if we set the initial sndbuf/RMB sizes
> to 16KB, these 16KB
> RMBs (in lgr->rmbs[0]) alloced in step #2 will happen not to be accessed
> in step #4, so the
> potential crash is hided.
>
> So, the crash is not introduced by the first fix. Instead, it is the
> first issue that may hide
> the second issue(crash) in special cases.
>
> I am a little curious why you think the first fix patch caused the
> second crash? Is
> something wrong in the first fix patch?
>
> Thanks for your review!
>
> Regards,
> Wen Gu
Hi Wen,
Sorry for the late answer because of the public holiday here!
I really like the test scenario, thank you for the elaboration and the
fixes!
They look good to me.
Why I asked that was that the first patch looked very reasonable, but I
was wondering why I didn't meet any problem with that before ;-) and if
it would trigger some problem during processing the SMCRv1 ADD Link
Continuation Messages. After checking the code again, I don't think
there would be any problem with the patch, because in the case of
processing the SMCRv1 ADD Link Continuation Messages, it's about the
same RMB.
Hi @Paolo, I would appreciate it if you could give us more time to
review and test the patches. Because we have to make sure that they can
work on our platform (s390) without problem, not only on x86.
Thanks
Wenjia
Powered by blists - more mailing lists