Message-ID: <20250924115010.38d2f3cb.pasic@linux.ibm.com>
Date: Wed, 24 Sep 2025 11:50:10 +0200
From: Halil Pasic <pasic@...ux.ibm.com>
To: Guangguan Wang <guangguan.wang@...ux.alibaba.com>
Cc: Dust Li <dust.li@...ux.alibaba.com>, Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
        "D. Wythe" <alibuda@...ux.alibaba.com>,
        Sidraya Jayagond <sidraya@...ux.ibm.com>,
        Wenjia Zhang <wenjia@...ux.ibm.com>,
        Mahanta Jambigi <mjambigi@...ux.ibm.com>,
        Tony Lu <tonylu@...ux.alibaba.com>, Wen Gu <guwen@...ux.alibaba.com>,
        netdev@...r.kernel.org, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-rdma@...r.kernel.org,
        linux-s390@...r.kernel.org, Halil Pasic <pasic@...ux.ibm.com>
Subject: Re: [PATCH net-next v2 1/2] net/smc: make wr buffer count configurable

On Wed, 24 Sep 2025 11:13:05 +0800
Guangguan Wang <guangguan.wang@...ux.alibaba.com> wrote:

> On 2025/9/19 22:55, Halil Pasic wrote:
> > On Tue, 9 Sep 2025 12:18:50 +0200
> > Halil Pasic <pasic@...ux.ibm.com> wrote:
> > 
> > 
> > Maybe Wen Gu and Guangguan Wang can chime in. From what I read,
> > link->wr_rx_buflen can be either SMC_WR_BUF_SIZE, which is 48, in which
> > case it does not matter, or SMC_WR_BUF_V2_SIZE, which is 8192, if
> > !smc_link_shared_v2_rxbuf(lnk), i.e. max_recv_sge == 1. So we are talking
> > about roughly a factor of 170 here. For a large pref_recv_wr the
> > back-off logic is still there to save us, but I really would not say that
> > this is how it is intended to work.
> >   
> 
> Hi Halil,
> 
> I think the root cause of the problem this patchset tries to solve is a mismatch
> between SMC_WR_BUF_CNT and the max_conns per lgr (which is 255). Furthermore,
> I believe that 255 is not an optimal value for max_conns per lgr, as too
> few connections lead to a waste of memory and too many connections lead to I/O queuing
> within a single QP (every WR posted via post_send to a single QP is initiated and completed in sequence).
> 
> We actually identified this problem long ago. In the Alibaba Cloud Linux distribution, we have
> changed SMC_WR_BUF_CNT to 64 and reduced max_conns per lgr to 32 (for SMC-R V2.1). This
> configuration has worked well under various workloads for a long time.
> 
> SMC-R V2.1 already supports negotiation of the max_conns per lgr. Simply changing the value of
> the macro SMC_CONN_PER_LGR_PREFER influences the negotiation result. But SMC-R V1.0 and SMC-R
> V2.0 do not support negotiation of the max_conns per lgr.
> I think it is better to reduce SMC_CONN_PER_LGR_PREFER for SMC-R V2.1. But for SMC-R V1.0 and
> SMC-R V2.0, I do not have any good idea.
> 

I agree, the number of WR buffers and the max number of connections per
lgr can and should be tuned in concert.
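
To put rough numbers on this, here is a back-of-the-envelope sketch in
plain userspace C. The buffer sizes (48 and 8192) and the counts discussed
(64, 256) are the ones quoted in this thread; the helper names are made up
for illustration, and the calculation ignores any extra multiplier the
kernel may apply to the receive side.

```c
#include <stddef.h>

/* Sizes as quoted in this thread. */
#define WR_BUF_SIZE     48u    /* SMC_WR_BUF_SIZE */
#define WR_BUF_V2_SIZE  8192u  /* SMC_WR_BUF_V2_SIZE */

/* Hypothetical helper: bytes needed for cnt buffers of buflen bytes each. */
size_t wr_buf_bytes(size_t cnt, size_t buflen)
{
	return cnt * buflen;
}

/* Hypothetical helper: size ratio of a V2 buffer to a V1 buffer (integer division). */
size_t wr_buf_v2_factor(void)
{
	return WR_BUF_V2_SIZE / WR_BUF_SIZE;
}
```

With these assumptions, wr_buf_bytes(256, WR_BUF_V2_SIZE) comes to 2 MiB,
versus 12 KiB for wr_buf_bytes(256, WR_BUF_SIZE), and wr_buf_v2_factor()
gives the factor of roughly 170 mentioned above.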

> > Maybe not supporting V2 on devices with max_recv_sge == 1 is a better choice,
> > assuming that a maximal V2 LLC msg needs to fit each and every receive
> > WR buffer. Which seems to be the case based on 27ef6a9981fe ("net/smc:
> > support SMC-R V2 for rdma devices with max_recv_sge equals to 1").
> >  
> 
> For RDMA devices whose max_recv_sge is 1, as mentioned in the commit log of the related patch,
> it is better to support them than to do an SMC_CLC_DECL_INTERR fallback, as an SMC_CLC_DECL_INTERR
> fallback is not a fast fallback and may heavily impact the efficiency of the connection process
> on both the server and the client side.

I mean another possible mitigation of the problem could be the following:
if there is a device in the mix with max_recv_sge < 2, then don't propose/
accept SMC-R V2.
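
A minimal sketch of that check, in plain userspace C. The struct and
function names are hypothetical and only illustrate the idea; this is not
the actual net/smc code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device attribute of interest. */
struct rdma_dev_caps {
	unsigned int max_recv_sge;
};

/*
 * Return true if SMC-R V2 may be proposed/accepted: every device
 * in the mix must support at least 2 receive SGEs.
 * An empty device list trivially returns true.
 */
bool may_use_smcr_v2(const struct rdma_dev_caps *devs, size_t ndevs)
{
	for (size_t i = 0; i < ndevs; i++) {
		if (devs[i].max_recv_sge < 2)
			return false;
	}
	return true;
}
```

A single device with max_recv_sge == 1 in the array is enough to make this
return false, which would then translate to sticking with V1.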

Do you know how prevalent and relevant RDMA devices with max_recv_sge < 2
are, and how likely it is that somebody would want to use SMC-R with
such devices?

> 
>  
> > For me the best course of action seems to be to send a V3 using
> > link->wr_rx_buflen. I'm really not that knowledgeable about RDMA or
> > the SMC-R protocol, but I'm happy to be part of the discussion on this
> > matter.
> > 
> > Regards,
> > Halil  
>
> And a tiny suggestion for the risk you mentioned in the commit log
> ("Addressing this by simply bumping SMC_WR_BUF_CNT to 256 was deemed
> risky, because the large-ish physically continuous allocation could fail
> and lead to TCP fall-backs."). Non-physically contiguous allocation (vmalloc/vzalloc etc.) could
> also be supported for wr buffers. The SMC-R snd_buf and rmb already support non-physically
> contiguous memory when sysctl_smcr_buf_type is set to SMCR_VIRT_CONT_BUFS or SMCR_MIXED_BUFS.
> They can serve as an example of using non-physically contiguous memory.
> 

I think we can put this on the list of possible enhancements. I would
prefer not to add this to the scope of this series. But I would be happy to
see this happen. I don't know if somebody from Alibaba, or maybe
Mahanta or Sid, would like to pick this up as an enhancement on top.

Thank you very much for your comments!

Regards,
Halil 
