Message-ID: <aLr4-V8V1ZWGMrOj@linux.alibaba.com>
Date: Fri, 5 Sep 2025 22:51:37 +0800
From: Dust Li <dust.li@...ux.alibaba.com>
To: Halil Pasic <pasic@...ux.ibm.com>
Cc: Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Simon Horman <horms@...nel.org>,
	"D. Wythe" <alibuda@...ux.alibaba.com>,
	Sidraya Jayagond <sidraya@...ux.ibm.com>,
	Wenjia Zhang <wenjia@...ux.ibm.com>,
	Mahanta Jambigi <mjambigi@...ux.ibm.com>,
	Tony Lu <tonylu@...ux.alibaba.com>,
	Wen Gu <guwen@...ux.alibaba.com>, netdev@...r.kernel.org,
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-rdma@...r.kernel.org, linux-s390@...r.kernel.org
Subject: Re: [PATCH net-next 1/2] net/smc: make wr buffer count configurable

On 2025-09-05 22:22:48, Dust Li wrote:
>On 2025-09-05 14:01:35, Halil Pasic wrote:
>>On Fri, 5 Sep 2025 11:00:59 +0200
>>Halil Pasic <pasic@...ux.ibm.com> wrote:
>>
>>> > 1. What if the two sides have different max_send_wr/max_recv_wr configurations?
>>> > IIUC, for example, if the client sets max_send_wr to 64 but the server sets
>>> > max_recv_wr to 16, the client might overflow the server's QP receive
>>> > queue, potentially causing an RNR (Receiver Not Ready) error.
>>>
>>> I don't think the 16 is spec'd anywhere, and if the client and the server
>>> need to agree on the same value, it should either be spec'd or a
>>> protocol mechanism for negotiating it needs to exist. So what is your
>>> take on this as an SMC maintainer?
>
>Right — I didn't realize that either until I saw this patch today :)
>But since the implementation's been set to 16 since day one, bumping it
>up might break things.
>
>>>
>>> I think we have tested heterogeneous setups and didn't see any grave
>>> issues. But please let me follow up on this. Maybe the other
>>> maintainers can chime in as well.
>
>I'd be glad to hear from others.
>
>>
>>Did some research and some thinking. Are you concerned about a
>>performance regression for e.g. 64 -> 16 compared to 16 -> 16? According
>>to my current understanding, an RNR must not lead to a catastrophic
>>failure; the RDMA/IB stack is supposed to handle it.
>
>No, it's not just a performance regression.
>If we get an RNR when going from 64 -> 16, the whole link group gets
>torn down — and all SMC connections inside it break.
>So from the user’s point of view, connections will just randomly drop
>out of nowhere.
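
For context, each side sizes its own QP queues at creation time, so nothing
in the protocol forces the two peers to agree. The relevant setup is roughly
the following (paraphrased from memory of smc_ib_create_queue_pair() in
net/smc/smc_ib.c, so the exact fields and multipliers may be off):

	struct ib_qp_init_attr qp_attr = {
		.send_cq = lnk->smcibdev->roce_cq_send,
		.recv_cq = lnk->smcibdev->roce_cq_recv,
		.cap = {
			/* sized purely from the local WR buffer count
			 * (SMC_WR_BUF_CNT = 16 today, which is what this
			 * patch makes configurable); the peer never learns
			 * what we picked
			 */
			.max_send_wr = SMC_WR_BUF_CNT * 3,
			.max_recv_wr = SMC_WR_BUF_CNT * 3,
			.max_send_sge = SMC_IB_MAX_SEND_SGE,
			.max_recv_sge = 1,
		},
		.sq_sig_type = IB_SIGNAL_REQ_WR,
		.qp_type = IB_QPT_RC,
	};

	lnk->roce_qp = ib_create_qp(lnk->roce_pd, &qp_attr);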

I double-checked the code and noticed we set qp_attr.rnr_retry =
SMC_QP_RNR_RETRY = 7, which means "infinite retries."
So on an RNR NAK the QP will just keep retrying instead of going into an
error state; we won't actually hit a fatal RNR failure that tears the link
group down. That said, yes, it is still only a performance regression.
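
For reference, this is the spot I was looking at, roughly (again quoting from
memory of net/smc/smc_ib.c, so exact fields and flags may differ slightly):

#define SMC_QP_TIMEOUT		15	/* 4096 * 2 ** timeout usec */
#define SMC_QP_RETRY_CNT	7	/* 7: infinite */
#define SMC_QP_RNR_RETRY	7	/* 7: infinite */

int smc_ib_modify_qp_rts(struct smc_link *lnk)
{
	struct ib_qp_attr qp_attr;

	memset(&qp_attr, 0, sizeof(qp_attr));
	qp_attr.qp_state = IB_QPS_RTS;
	qp_attr.timeout = SMC_QP_TIMEOUT;	/* local ack timeout */
	qp_attr.retry_cnt = SMC_QP_RETRY_CNT;	/* transport retries */
	qp_attr.rnr_retry = SMC_QP_RNR_RETRY;	/* 7 = retry RNR NAKs forever,
						 * so no fatal RNR completion
						 */
	qp_attr.sq_psn = lnk->psn_initial;
	qp_attr.max_rd_atomic = 1;
	return ib_modify_qp(lnk->roce_qp, &qp_attr,
			    IB_QP_STATE | IB_QP_TIMEOUT | IB_QP_RETRY_CNT |
			    IB_QP_SQ_PSN | IB_QP_RNR_RETRY |
			    IB_QP_MAX_QP_RD_ATOMIC);
}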

So in this case, I would regard it as acceptable. We can go with this.

Best regards,
Dust

