lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZGW1shjKZFCn28LW@TONYMAC-ALIBABA.local>
Date: Thu, 18 May 2023 13:20:50 +0800
From: Tony Lu <tonylu@...ux.alibaba.com>
To: Wen Gu <guwen@...ux.alibaba.com>
Cc: kgraul@...ux.ibm.com, wenjia@...ux.ibm.com, jaka@...ux.ibm.com,
	davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
	pabeni@...hat.com, linux-s390@...r.kernel.org,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
	liuyacan@...p.netease.com
Subject: Re: [PATCH net] net/smc: Reset connection when trying to use SMCRv2
 fails.

On Thu, May 18, 2023 at 01:14:55PM +0800, Wen Gu wrote:
> We found a crash when using SMCRv2 with 2 Mellanox ConnectX-4. It
> can be reproduced by:
> 
> - smc_run nginx
> - smc_run wrk -t 32 -c 500 -d 30 http://<ip>:<port>
> 
>  BUG: kernel NULL pointer dereference, address: 0000000000000014
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x0000) - not-present page
>  PGD 8000000108713067 P4D 8000000108713067 PUD 151127067 PMD 0
>  Oops: 0000 [#1] PREEMPT SMP PTI
>  CPU: 4 PID: 2441 Comm: kworker/4:249 Kdump: loaded Tainted: G        W   E      6.4.0-rc1+ #42
>  Workqueue: smc_hs_wq smc_listen_work [smc]
>  RIP: 0010:smc_clc_send_confirm_accept+0x284/0x580 [smc]
>  RSP: 0018:ffffb8294b2d7c78 EFLAGS: 00010a06
>  RAX: ffff8f1873238880 RBX: ffffb8294b2d7dc8 RCX: 0000000000000000
>  RDX: 00000000000000b4 RSI: 0000000000000001 RDI: 0000000000b40c00
>  RBP: ffffb8294b2d7db8 R08: ffff8f1815c5860c R09: 0000000000000000
>  R10: 0000000000000400 R11: 0000000000000000 R12: ffff8f1846f56180
>  R13: ffff8f1815c5860c R14: 0000000000000001 R15: 0000000000000001
>  FS:  0000000000000000(0000) GS:ffff8f1aefd00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 0000000000000014 CR3: 00000001027a0001 CR4: 00000000003706e0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  Call Trace:
>   <TASK>
>   ? mlx5_ib_map_mr_sg+0xa1/0xd0 [mlx5_ib]
>   ? smcr_buf_map_link+0x24b/0x290 [smc]
>   ? __smc_buf_create+0x4ee/0x9b0 [smc]
>   smc_clc_send_accept+0x4c/0xb0 [smc]
>   smc_listen_work+0x346/0x650 [smc]
>   ? __schedule+0x279/0x820
>   process_one_work+0x1e5/0x3f0
>   worker_thread+0x4d/0x2f0
>   ? __pfx_worker_thread+0x10/0x10
>   kthread+0xe5/0x120
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork+0x2c/0x50
>   </TASK>
> 
> During the CLC handshake, server sequentially tries available SMCRv2
> and SMCRv1 devices in smc_listen_work().
> 
> If an SMCRv2 device is found. SMCv2 based link group and link will be
> assigned to the connection. Then assumed that some buffer assignment
> errors happen later in the CLC handshake, such as RMB registration
> failure, server will give up SMCRv2 and try SMCRv1 device instead. But
> the resources assigned to the connection won't be reset.
> 
> When server tries SMCRv1 device, the connection creation process will
> be executed again. Since conn->lnk has been assigned when trying SMCRv2,
> it will not be set to the correct SMCRv1 link in
> smcr_lgr_conn_assign_link(). So in such situation, conn->lgr points to
> correct SMCRv1 link group but conn->lnk points to the SMCRv2 link
> mistakenly.
> 
> Then in smc_clc_send_confirm_accept(), conn->rmb_desc->mr[link->link_idx]
> will be accessed. Since the link->link_idx is not correct, the related
> MR may not have been initialized, so crash happens.
> 
>  | Try SMCRv2 device first
>  |     |-> conn->lgr:	assign existed SMCRv2 link group;
>  |     |-> conn->link:	assign existed SMCRv2 link (link_idx may be 1 in SMC_LGR_SYMMETRIC);
>  |     |-> sndbuf & RMB creation fails, quit;
>  |
>  | Try SMCRv1 device then
>  |     |-> conn->lgr:	create SMCRv1 link group and assign;
>  |     |-> conn->link:	keep SMCRv2 link mistakenly;
>  |     |-> sndbuf & RMB creation succeed, only RMB->mr[link_idx = 0]
>  |         initialized.
>  |
>  | Then smc_clc_send_confirm_accept() accesses
>  | conn->rmb_desc->mr[conn->link->link_idx, which is 1], then crash.
>  v
> 
> This patch tries to fix this by cleaning conn->lnk before assigning
> link. In addition, it is better to reset the connection and clean the
> resources assigned if trying SMCRv2 failed in buffer creation or
> registration.
> 
> Fixes: e49300a6bf62 ("net/smc: add listen processing for SMC-Rv2")
> Link: https://lore.kernel.org/r/20220523055056.2078994-1-liuyacan@corp.netease.com/
> Signed-off-by: Wen Gu <guwen@...ux.alibaba.com>

LGTM, thanks for your detailed analysis.

Reviewed-by: Tony Lu <tonylu@...ux.alibaba.com>

> ---
>  net/smc/af_smc.c   | 9 +++++++--
>  net/smc/smc_core.c | 1 +
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
> index 50c38b6..538e9c6 100644
> --- a/net/smc/af_smc.c
> +++ b/net/smc/af_smc.c
> @@ -2000,8 +2000,10 @@ static int smc_listen_rdma_init(struct smc_sock *new_smc,
>  		return rc;
>  
>  	/* create send buffer and rmb */
> -	if (smc_buf_create(new_smc, false))
> +	if (smc_buf_create(new_smc, false)) {
> +		smc_conn_abort(new_smc, ini->first_contact_local);
>  		return SMC_CLC_DECL_MEM;
> +	}
>  
>  	return 0;
>  }
> @@ -2217,8 +2219,11 @@ static void smc_find_rdma_v2_device_serv(struct smc_sock *new_smc,
>  	smcr_version = ini->smcr_version;
>  	ini->smcr_version = SMC_V2;
>  	rc = smc_listen_rdma_init(new_smc, ini);
> -	if (!rc)
> +	if (!rc) {
>  		rc = smc_listen_rdma_reg(new_smc, ini->first_contact_local);
> +		if (rc)
> +			smc_conn_abort(new_smc, ini->first_contact_local);
> +	}
>  	if (!rc)
>  		return;
>  	ini->smcr_version = smcr_version;
> diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
> index 4543567..3f465fa 100644
> --- a/net/smc/smc_core.c
> +++ b/net/smc/smc_core.c
> @@ -127,6 +127,7 @@ static int smcr_lgr_conn_assign_link(struct smc_connection *conn, bool first)
>  	int i, j;
>  
>  	/* do link balancing */
> +	conn->lnk = NULL;	/* reset conn->lnk first */
>  	for (i = 0; i < SMC_LINKS_PER_LGR_MAX; i++) {
>  		struct smc_link *lnk = &conn->lgr->lnk[i];
>  
> -- 
> 1.8.3.1

Powered by blists - more mailing lists