Message-ID: <a50bd6bd-5cae-49d9-8bd8-172f88035d18@nvidia.com>
Date: Fri, 6 Sep 2024 15:17:50 +0300
From: Edward Srouji <edwards@...dia.com>
To: Zhu Yanjun <yanjun.zhu@...ux.dev>, Leon Romanovsky <leon@...nel.org>,
 Jason Gunthorpe <jgg@...dia.com>
Cc: Leon Romanovsky <leonro@...dia.com>, linux-kernel@...r.kernel.org,
 linux-rdma@...r.kernel.org, netdev@...r.kernel.org,
 Saeed Mahameed <saeedm@...dia.com>, Tariq Toukan <tariqt@...dia.com>,
 Yishai Hadas <yishaih@...dia.com>
Subject: Re: [PATCH rdma-next 0/2] Introduce mlx5 data direct placement (DDP)

On 9/6/2024 8:02 AM, Zhu Yanjun wrote:
> On 2024/9/5 20:23, Edward Srouji wrote:
>>
>> On 9/4/2024 2:53 PM, Zhu Yanjun wrote:
>>> On 2024/9/4 16:27, Edward Srouji wrote:
>>>>
>>>> On 9/4/2024 9:02 AM, Zhu Yanjun wrote:
>>>>> On 2024/9/3 19:37, Leon Romanovsky wrote:
>>>>>> From: Leon Romanovsky <leonro@...dia.com>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This series from Edward introduces mlx5 data direct placement (DDP)
>>>>>> feature.
>>>>>>
>>>>>> This feature allows WRs on the receiver side of the QP to be consumed
>>>>>> out of order, permitting the sender side to transmit messages without
>>>>>> guaranteeing arrival order on the receiver side.
>>>>>>
>>>>>> When enabled, the completion ordering of WRs remains in-order,
>>>>>> regardless of the Receive WRs consumption order.
>>>>>>
>>>>>> RDMA Read and RDMA Atomic operations on the responder side continue to
>>>>>> be executed in-order, while the ordering of data placement for RDMA
>>>>>> Write and Send operations is not guaranteed.
>>>>>
>>>>> It is an interesting feature. If I got this feature correctly, this
>>>>> feature permits the user to consume the data out of order for RDMA
>>>>> Write and Send operations.
>>>>> But its completion ordering is still in order.
>>>>>
>>>> Correct.
>>>>> In what scenarios can this feature be applied, and what benefits
>>>>> does it bring?
>>>>>
>>>>> I am just curious about this. Normally the users will consume the data
>>>>> in order. In what scenario will the user consume the data out of
>>>>> order?
>>>>>
>>>> One of the main benefits of this feature is achieving higher bandwidth
>>>> (BW) by allowing responders to receive packets out of order (OOO).
>>>>
>>>> For example, this can be utilized in devices that support multi-plane
>>>> functionality, as introduced in the "Multi-plane support for mlx5"
>>>> series [1]. When mlx5 multi-plane is supported, a single logical mlx5
>>>> port aggregates multiple physical plane ports. In this scenario, the
>>>> requester can "spray" packets across the multiple physical plane ports
>>>> without guaranteeing packet order, either on the wire or on the
>>>> receiver (responder) side.
>>>>
>>>> With this approach, no barriers or fences are required to ensure
>>>> in-order packet reception, which optimizes the data path for
>>>> performance. This can result in better BW, theoretically achieving
>>>> line-rate performance equivalent to the sum of the maximum BW of all
>>>> physical plane ports, with only one QP.
>>>
>>> Thanks a lot for your quick reply. Without ensuring in-order packet
>>> reception, this does optimize the data path for performance.
>>>
>>> I agree with you.
>>>
>>> But how does the receiver get the correct packets from the out-of-order
>>> packets efficiently?
>>>
>>> Is the method implemented in software or hardware?
>>
>>
>> The packets have a new field that is used by the HW to understand the
>> correct message order (similar to PSN).
>>
>> Once the packets arrive OOO at the receiver side, the data is scattered
>> directly (hence the DDP - "Direct Data Placement" name) by the HW.
>>
>> So the efficiency is achieved by the HW, as it also saves the required
>> context and metadata so it can deliver the correct completion to the
>> user (in-order) once we have some WQEs that can be considered an
>> "in-order window" and be delivered to the user.
>>
>> The SW/Applications may receive OOO WR_IDs though (because the first CQE
>> may have consumed Recv WQE of any index on the receiver side), and it's
>> their responsibility to handle it from this point, if it's required.
>
> Got it. It seems that all the functionalities are implemented in HW. The
> SW only receives OOO WR_IDs. Thanks a lot. Perhaps it is helpful to RDMA
> LAG devices. It should enhance the performance^_^
>
> BTW, do you have any performance data with this feature?

Not yet. We tested it functionality wise for now.
But we should be able to measure its performance soon :).

>
> Best Regards,
> Zhu Yanjun
>
>>
>>>
>>> I am just interested in this feature and want to know more about this.
>>>
>>> Thanks,
>>>
>>> Zhu Yanjun
>>>
>>>>
>>>> [1] https://lore.kernel.org/lkml/cover.1718553901.git.leon@kernel.org/
>>>>> Thanks,
>>>>> Zhu Yanjun
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Edward Srouji (2):
>>>>>>   net/mlx5: Introduce data placement ordering bits
>>>>>>   RDMA/mlx5: Support OOO RX WQE consumption
>>>>>>
>>>>>>  drivers/infiniband/hw/mlx5/main.c    |  8 +++++
>>>>>>  drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
>>>>>>  drivers/infiniband/hw/mlx5/qp.c      | 51 +++++++++++++++++++++++++---
>>>>>>  include/linux/mlx5/mlx5_ifc.h        | 24 +++++++++----
>>>>>>  include/uapi/rdma/mlx5-abi.h         |  5 +++
>>>>>>  5 files changed, 78 insertions(+), 11 deletions(-)
>>>>>>
>>>>>
>>> --
>>> Best Regards,
>>> Yanjun.Zhu
>>>
>
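
For readers following the thread, the behavior described above (data placed as messages arrive, completions released only for an "in-order window", WR_IDs therefore seen out of order by the application) can be illustrated with a small toy model. This is only a sketch under stated assumptions and not the mlx5 hardware or driver logic: the class OooReceiveQueue, the method on_arrival, and the integer "seq" field standing in for the new ordering field are all hypothetical names, and the model assumes each arriving message simply consumes the next posted Receive WQE.

```python
from collections import deque


class OooReceiveQueue:
    """Toy model of a receive queue with out-of-order WQE consumption.

    All names here are hypothetical; this is not the mlx5 implementation.
    """

    def __init__(self, wr_ids):
        # Receive WQEs posted by the application, in posting order.
        self._posted = deque(wr_ids)
        # Sequence number of the next message whose completion may be delivered.
        self._next_seq = 0
        # Messages already placed but not yet completed: seq -> (wr_id, payload).
        self._placed = {}

    def on_arrival(self, seq, payload):
        """Handle one arriving message and return the completions it releases."""
        # Assumption: the arriving message consumes the next free WQE,
        # whatever its sequence number is, and its data is placed right away.
        wr_id = self._posted.popleft()
        self._placed[seq] = (wr_id, payload)

        # Deliver completions only for the contiguous in-order window.
        completions = []
        while self._next_seq in self._placed:
            done_wr_id, done_payload = self._placed.pop(self._next_seq)
            completions.append((self._next_seq, done_wr_id, done_payload))
            self._next_seq += 1
        return completions


def run_demo():
    """Post four WQEs, then receive messages 0..3 in the order 2, 0, 3, 1."""
    rq = OooReceiveQueue(wr_ids=[100, 101, 102, 103])
    delivered = []
    for seq in (2, 0, 3, 1):
        delivered.extend(rq.on_arrival(seq, f"msg{seq}"))
    return delivered


if __name__ == "__main__":
    for seq, wr_id, payload in run_demo():
        print(seq, wr_id, payload)
```

In this model the completions come out in message order (0, 1, 2, 3), while the WR_IDs attached to them follow arrival order (101, 103, 100, 102). That mismatch is what the application has to handle if it relies on WR_ID ordering.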