Message-ID: <72c54534-f4dc-4b81-bb0e-239a1ce6e1d4@talpey.com>
Date: Wed, 8 Jan 2025 11:38:52 -0500
From: Tom Talpey <tom@...pey.com>
To: He X <xw897002528@...il.com>
Cc: linux-cifs@...r.kernel.org, linux-kernel@...r.kernel.org,
 Namjae Jeon <linkinjeon@...nel.org>
Subject: Re: [PATCH 1/2] ksmbd: fix possibly wrong init value for RDMA buffer
 size

On 1/8/2025 10:03 AM, He X wrote:
>  > Ok, that's important and perhaps this needs more digging. What was
> your setup? Was it an iWARP connection, for example?
> 
> Direct connection between two mlx5_ib, ROCE network.
> 
>  > If IRD/ORD is the problem, you'll see connections break when write-heavy
> workloads are present. Is that what you mean by "did not work"?
> 
> Yes. It only disconnects when copying large files from clients (cifs) to
> ksmbd. I do see some retries in the logs, but it is not able to recover.
> 
> I have cleared my testing logs, so I cannot paste them here.

Ok. The interesting item would be the work request completion status
that preceded the connection failure, or the async error upcall event
from the rdma driver if that triggered first. Both client and server
logs are needed. It can be a higher-level issue too: there were some
signing issues related to the fscache changes, and these might be in
kernel 6.12. I tested mostly successfully with them at SDC in
September, anyway.

There may well be something else going on - RoCE can be very tricky
to set up since it depends on link layer flow control. You're not
using RoCEv2?

BTW the code does have some strange-looking defaults between client
and server IRD/ORD queue depths. The server defaults to 8 ORD, while
the client defaults to 32 IRD. This is odd, but not in itself fatal.
After all, other implementations (e.g. Windows) have their own defaults
too. The negotiation at both RDMA and SMB Direct should align them.
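
To make the alignment concrete, here is a minimal sketch in plain C. It
is not the ksmbd or cifs code; the function name negotiated_read_depth()
is made up for illustration, and the model assumes only that the side
issuing RDMA Reads must not keep more of them outstanding than the peer
is prepared to accept.

```c
/*
 * Hypothetical model of IRD/ORD alignment, for illustration only.
 * None of these names come from ksmbd, cifs, or the RDMA core.
 */

/*
 * local_ord: RDMA Reads this side would like to keep outstanding.
 * peer_ird:  incoming RDMA Reads the peer is willing to service.
 *
 * The usable depth is the smaller of the two, so mismatched defaults
 * simply settle on the lower value.
 */
static unsigned int negotiated_read_depth(unsigned int local_ord,
                                          unsigned int peer_ird)
{
    return local_ord < peer_ird ? local_ord : peer_ird;
}
```

With the defaults mentioned above, negotiated_read_depth(8, 32) gives 8:
the server is limited by its own ORD and the client's larger IRD simply
goes unused. A peer offering an IRD of 0 yields 0, which is the
direction SMB3 leaves unused on the server side.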

>  > Again "many"?
> 
> I mean the quote `In practice, many RDMA providers set the rd_atom and 
> rd_init_atom to the same value`.
> 
>> Other protocols may make different choices. Not this one.
> 
> Got it. I'll do some more tests to see if I can find out the problem.
> Thanks for your patience!

Great, looking forward to that.

Tom.

> 
> On Wed, Jan 8, 2025 at 21:58, Tom Talpey <tom@...pey.com> wrote:
> 
>     On 1/7/2025 10:19 PM, He X wrote:
>      > Thanks for your review!
>      >
>      > By man page, I mean rdma_xxx man pages like
>      > https://linux.die.net/man/3/rdma_connect. I do mean ORD
>      > or IRD, just bad wording.
> 
>     Ok, that's the user verb API, we're in the kernel here. Some things are
>     similar, but not all.
> 
>      > In short, RDMA on my setup did not work. While I am digging around, I
> 
>     Ok, that's important and perhaps this needs more digging. What was
>     your setup? Was it an iWARP connection, for example? The iWARP protocol
>     is stricter than IB for IRD, because it does not support "retry" when
>     there are insufficient resources. This is a Good Thing, by the way,
>     it avoids silly tail latencies. But it can cause sloppy upper layer
>     code to break.
> 
>     If IRD/ORD is the problem, you'll see connections break when write-heavy
>     workloads are present. Is that what you mean by "did not work"?
> 
>      > noticed that `initiator_depth` is generally set to `min(xxx,
>      > max_qp_init_rd_atom)` in the kernel source code. I was not aware
>      > that ksmbd direct does not use IRD. And many clients set them to
>      > the same value.
> 
>     Again "many"? Please be specific. Clients implement protocols, and
>     protocols have differing requirements. An SMB3 client should advertise
>     an ORD == 0, and should offer at least a small IRD > 0.
> 
>     An SMB3 server will do the converse - an IRD == 0 at all times, and an
>     ORD > 0 in response to the client's offered IRD. The resulting limits
>     are exchanged in the SMB Direct negotiation packets. The IRD==0 is what
>     you see in the very next line after your change:
> 
>       >> conn_param.responder_resources = 0;
> 
>     Other protocols may make different choices. Not this one.
> 
>     Tom.
> 
> 
>      >
>      > FYI, here is the original discussion on github:
>      > https://github.com/namjaejeon/ksmbd/issues/497
>      >
>      > On Wed, Jan 8, 2025 at 05:04, Tom Talpey <tom@...pey.com> wrote:
>      >
>      >     On 1/5/2025 10:39 PM, He Wang wrote:
>      >      > Field `initiator_depth` is for incoming request.
>      >      >
>      >      > According to the man page, `max_qp_rd_atom` is the maximum
>      >      > number of outstanding packets, and `max_qp_init_rd_atom` is
>      >      > the maximum depth of incoming requests.
>      >
>      >     I do not believe this is correct, what "man page" are you
>      >     referring to? The commit message is definitely wrong. Neither
>      >     value is referring to generic "maximum packets" nor "incoming
>      >     requests".
>      >
>      >     The max_qp_rd_atom is the "ORD" or outgoing read/atomic request
>      >     depth. The ksmbd server uses this to control RDMA Read requests
>      >     to fetch data from the client for certain SMB3_WRITE operations.
>      >     (SMB Direct does not use atomics)
>      >
>      >     The max_qp_init_rd_atom is the "IRD" or incoming read/atomic
>      >     request depth. The SMB3 protocol does not allow clients to
>      >     request data from servers via RDMA Read. This is absolutely by
>      >     design, and the server therefore does not use this value.
>      >
>      >     In practice, many RDMA providers set the rd_atom and rd_init_atom
>      >     to the same value, but this change would appear to break SMB
>      >     Direct write functionality when operating over providers that
>      >     do not.
>      >
>      >     So, NAK.
>      >
>      >     Namjae, you should revert your upstream commit.
>      >
>      >     Tom.
>      >
>      >      >
>      >      > Signed-off-by: He Wang <xw897002528@...il.com>
>      >      > ---
>      >      >   fs/smb/server/transport_rdma.c | 2 +-
>      >      >   1 file changed, 1 insertion(+), 1 deletion(-)
>      >      >
>      >      > diff --git a/fs/smb/server/transport_rdma.c b/fs/smb/server/transport_rdma.c
>      >      > index 0ef3c9f0b..c6dbbbb32 100644
>      >      > --- a/fs/smb/server/transport_rdma.c
>      >      > +++ b/fs/smb/server/transport_rdma.c
>      >      > @@ -1640,7 +1640,7 @@ static int smb_direct_accept_client(struct smb_direct_transport *t)
>      >      >       int ret;
>      >      >
>      >      >       memset(&conn_param, 0, sizeof(conn_param));
>      >      > -     conn_param.initiator_depth = min_t(u8, t->cm_id->device->attrs.max_qp_rd_atom,
>      >      > +     conn_param.initiator_depth = min_t(u8, t->cm_id->device->attrs.max_qp_init_rd_atom,
>      >      >                                          SMB_DIRECT_CM_INITIATOR_DEPTH);
>      >      >       conn_param.responder_resources = 0;
>      >      >
>      >
>      >
>      >
>      > --
>      > Best regards,
>      > xhe
> 
> 
> 
> -- 
> Best regards,
> xhe

