Message-ID: <06a87a92-6cce-4a63-99d0-463a1d035478@linux.alibaba.com>
Date: Wed, 24 Sep 2025 11:13:05 +0800
From: Guangguan Wang <guangguan.wang@...ux.alibaba.com>
To: Halil Pasic <pasic@...ux.ibm.com>, Dust Li <dust.li@...ux.alibaba.com>
Cc: Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Simon Horman <horms@...nel.org>, "D. Wythe" <alibuda@...ux.alibaba.com>,
 Sidraya Jayagond <sidraya@...ux.ibm.com>, Wenjia Zhang
 <wenjia@...ux.ibm.com>, Mahanta Jambigi <mjambigi@...ux.ibm.com>,
 Tony Lu <tonylu@...ux.alibaba.com>, Wen Gu <guwen@...ux.alibaba.com>,
 netdev@...r.kernel.org, linux-doc@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-rdma@...r.kernel.org,
 linux-s390@...r.kernel.org
Subject: Re: [PATCH net-next v2 1/2] net/smc: make wr buffer count
 configurable



On 2025/9/19 22:55, Halil Pasic wrote:
> On Tue, 9 Sep 2025 12:18:50 +0200
> Halil Pasic <pasic@...ux.ibm.com> wrote:
> 
> 
> Can maybe Wen Gu and  Guangguan Wang chime in. From what I read
> link->wr_rx_buflen can be either SMC_WR_BUF_SIZE that is 48 in which
> case it does not matter, or SMC_WR_BUF_V2_SIZE that is 8192, if
> !smc_link_shared_v2_rxbuf(lnk) i.e. max_recv_sge == 1. So we talk
> about roughly a factor of 170 here. For a large pref_recv_wr the
> back of logic is still there to save us but I really would not say that
> this is how this is intended to work.
> 

Hi Halil,

I think the root cause of the problem this patchset tries to solve is a mismatch
between SMC_WR_BUF_CNT and the max_conns per lgr (which is 255). Furthermore,
I believe that 255 is not an optimal value for max_conns per lgr: too few
connections waste memory, and too many connections lead to I/O queuing within
a single QP (every WR posted via post_send to a single QP is initiated and completed in sequence).

We actually identified this problem long ago. In the Alibaba Cloud Linux distribution, we have
changed SMC_WR_BUF_CNT to 64 and reduced max_conns per lgr to 32 (for SMC-R V2.1). This
configuration has worked well under various workloads for a long time.
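
To put rough numbers on the buffer counts and sizes discussed in this thread, here is a minimal sketch. It assumes one array of receive WR buffers whose size is simply count times buffer length, and it ignores any other per-WR structures. The helpers wr_rx_array_bytes() and wr_buf_v2_factor() are hypothetical names for illustration only, not kernel functions.

```c
#include <stddef.h>

/* Buffer sizes as quoted in this thread. */
#define SMC_WR_BUF_SIZE     48u    /* V1 receive WR buffer size, bytes */
#define SMC_WR_BUF_V2_SIZE  8192u  /* V2 size when max_recv_sge == 1 */

/* Hypothetical helper: bytes needed for one array of cnt buffers. */
static size_t wr_rx_array_bytes(size_t cnt, size_t buflen)
{
	return cnt * buflen;
}

/* Hypothetical helper: integer ratio between the V2 and V1 buffer sizes. */
static size_t wr_buf_v2_factor(void)
{
	return SMC_WR_BUF_V2_SIZE / SMC_WR_BUF_SIZE;
}
```

Under these assumptions, 64 buffers come to 3 KiB at 48 bytes and 512 KiB at 8192 bytes, while 256 buffers come to 12 KiB and 2 MiB respectively, which matches the "roughly a factor of 170" above.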

SMC-R V2.1 already supports negotiation of the max_conns per lgr. Simply changing the value of
the macro SMC_CONN_PER_LGR_PREFER influences the negotiation result. But SMC-R V1.0 and SMC-R
V2.0 do not support negotiation of the max_conns per lgr.
I think it is better to reduce SMC_CONN_PER_LGR_PREFER for SMC-R V2.1. But for SMC-R V1.0 and
SMC-R V2.0, I do not have any good ideas.

> Maybe not supporting V2 on devices with max_recv_sge is a better choice,
> assuming that a maximal V2 LLC msg needs to fit each and every receive
> WR buffer. Which seems to be the case based on 27ef6a9981fe ("net/smc:
> support SMC-R V2 for rdma devices with max_recv_sge equals to 1").
>

For RDMA devices whose max_recv_sge is 1, as mentioned in the commit log of the related patch,
it is better to support them than to fall back with SMC_CLC_DECL_INTERR, as the SMC_CLC_DECL_INTERR
fallback is not a fast fallback and may heavily reduce the efficiency of connection setup
on both the server and client side.

 
> For me the best course of action seems to be to send a V3 using
> link->wr_rx_buflen. I'm really not that knowledgeable about RDMA or
> the SMC-R protocol, but I'm happy to be part of the discussion on this
> matter.
> 
> Regards,
> Halil

And a tiny suggestion regarding the risk you mentioned in the commit log
("Addressing this by simply bumping SMC_WR_BUF_CNT to 256 was deemed
risky, because the large-ish physically continuous allocation could fail
and lead to TCP fall-backs."). Physically non-contiguous allocation (vmalloc/vzalloc, etc.) is
also possible for wr buffers. SMC-R snd_buf and rmb already support physically non-contiguous
memory when sysctl_smcr_buf_type is set to SMCR_VIRT_CONT_BUFS or SMCR_MIXED_BUFS.
They can serve as an example of using physically non-contiguous memory.
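
To illustrate the idea, here is a small userspace model of how such a buffer-type policy might pick an allocation strategy. This is a sketch, not kernel code. The enum names only mirror the sysctl values mentioned above, pick_wr_buf_alloc() is a hypothetical helper, and the "try contiguous first, then fall back to virtual" behaviour for the mixed mode is an assumption about how a wr buffer variant could behave. The model also assumes that a virtually contiguous allocation always succeeds.

```c
#include <stdbool.h>

/* Names mirror the sysctl values above; numeric values are illustrative. */
enum buf_type { PHYS_CONT_BUFS, VIRT_CONT_BUFS, MIXED_BUFS };

enum alloc_result { ALLOC_FAILED, ALLOC_PHYS, ALLOC_VIRT };

/*
 * Hypothetical policy. phys_ok says whether a physically contiguous
 * allocation of the requested size would succeed right now.
 */
static enum alloc_result pick_wr_buf_alloc(enum buf_type type, bool phys_ok)
{
	switch (type) {
	case PHYS_CONT_BUFS:
		/* No fallback: a failure here is the TCP fall-back case. */
		return phys_ok ? ALLOC_PHYS : ALLOC_FAILED;
	case MIXED_BUFS:
		/* Prefer contiguous memory, fall back to virtual memory. */
		return phys_ok ? ALLOC_PHYS : ALLOC_VIRT;
	case VIRT_CONT_BUFS:
		return ALLOC_VIRT;
	}
	return ALLOC_FAILED;
}
```

In this model only the purely physical mode can fail when memory is fragmented; the other two modes still end up with usable buffers.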

Regards,
Guangguan Wang

