lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 27 May 2023 23:20:22 +0800
From: Wen Gu <guwen@...ux.alibaba.com>
To: Wenjia Zhang <wenjia@...ux.ibm.com>, kgraul@...ux.ibm.com,
 jaka@...ux.ibm.com, davem@...emloft.net, edumazet@...gle.com,
 kuba@...nel.org, pabeni@...hat.com
Cc: linux-s390@...r.kernel.org, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH net 2/2] net/smc: Don't use RMBs not mapped to new link in
 SMCRv2 ADD LINK



On 2023/5/27 18:22, Wenjia Zhang wrote:
> 
> I'm wondering if this crash is introduced by the first fix patch you wrote.
> 
> Thanks,
> Wenjia

Hi Wenjia,

No, the crash can be reproduced without my two patches by the following steps:

1. Each side activates only one RNIC firstly and set the default sndbuf/RMB sizes to more
    than 16KB, such as 64KB, through sysctl net.smc.{wmem | rmem}.
    (The reason why initial sndbufs/RMBs size needs to be larger than 16KB will be explained later)

2. Use SMCRv2 in any test, just to create a link group that has some alloced RMBs.

    Example of step #1 #2:

    [server]
    smcr ueid add 1234
    sysctl net.smc.rmem=65536
    sysctl net.smc.wmem=65536
    smc_run sockperf sr --tcp

    [client]
    smcr ueid add 1234
    sysctl net.smc.rmem=65536
    sysctl net.smc.wmem=65536
    smc_run sockperf pp --tcp -i <server ip> -t <time>


3. Change the default sndbuf/RMB sizes, make sure they are larger than initial size above,
    such as 256KB.

4. Then rerun the test, and there will be some bigger RMBs alloced. And when the test is
    running, activate the second alternate RNIC of each side. It will trigger to add a new
    link and do what I described in the second patch's commit log, that only map the in-use
    256KB RMBs to new link but try to access the unused 64KB RMBs' invalid mr[new_link->lnk_idx].

    Example of step #3 #4:

    [server]
    sysctl net.smc.rmem=262144
    sysctl net.smc.wmem=262144
    smc_run sockperf sr --tcp

    [client]
    sysctl net.smc.rmem=262144
    sysctl net.smc.wmem=262144
    smc_run sockperf pp --tcp -i <server ip> -t <time>

    When the sockperf is running:

    [server/client]
    ip link set dev <2nd RNIC> up	# activate the second alternate RNIC, then crash occurs.


At the beginning, I only found the crash in the second patch. But when I try to fix it,
I found the issue descibed in the first patch.

In first patch, if I understand correctly, smc_llc_get_first_rmb() is aimed to get the first
RMB in lgr->rmb[*]. If so, It should start from lgr->rmbs[0] instead of lgr->rmbs[1], right?

Then back to the reason needs to be explained in step #1. Because of the issue mentioned
above in smc_llc_get_first_rmb(), if we set the initial sndbuf/RMB sizes to 16KB, these 16KB
RMBs (in lgr->rmbs[0]) alloced in step #2 will happen not to be accessed in step #4, so the
potential crash is hided.

So, the crash is not introduced by the first fix. Instead, it is the first issue that may hide
the second issue(crash) in special cases.

I am a little curious why you think the first fix patch caused the second crash? Is
something wrong in the first fix patch?

Thanks for your review!

Regards,
Wen Gu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ