Message-ID: <7cab425e-9d50-a28d-bb4c-21209ba78285@fujitsu.com>
Date: Tue, 25 Jan 2022 10:32:30 +0000
From: "lizhijian@...itsu.com" <lizhijian@...itsu.com>
To: Zhu Yanjun <zyjzyj2000@...il.com>,
"lizhijian@...itsu.com" <lizhijian@...itsu.com>
CC: RDMA mailing list <linux-rdma@...r.kernel.org>,
Jason Gunthorpe <jgg@...pe.ca>,
"aharonl@...dia.com" <aharonl@...dia.com>,
Leon Romanovsky <leon@...nel.org>,
"tom@...pey.com" <tom@...pey.com>,
"tomasz.gromadzki@...el.com" <tomasz.gromadzki@...el.com>,
LKML <linux-kernel@...r.kernel.org>,
"mbloch@...dia.com" <mbloch@...dia.com>,
"liangwenpeng@...wei.com" <liangwenpeng@...wei.com>,
"yangx.jy@...itsu.com" <yangx.jy@...itsu.com>,
"y-goto@...itsu.com" <y-goto@...itsu.com>,
Bob Pearson <rpearsonhpe@...il.com>,
"dan.j.williams@...el.com" <dan.j.williams@...el.com>,
"yangx.jy@...itsu.com" <yangx.jy@...itsu.com>
Subject: Re: [RFC PATCH v2 0/9] RDMA/rxe: Add RDMA FLUSH operation
On 25/01/2022 16:57, Zhu Yanjun wrote:
> On Tue, Jan 25, 2022 at 4:45 PM Li Zhijian <lizhijian@...fujitsu.com> wrote:
>> Hey folks,
>>
>> I want to thank all of you for the kind feedback on my previous RFC.
>> Recently, I have tried my best to make updates based on your comments.
>> Not all comments have been addressed yet, for various reasons, but I
>> still wish to post this new version to start a new discussion.
>>
>> Outstanding issues:
>> - iova_to_addr() without any kmap/kmap_local_page flows might not always
>> work. # existing issue.
>> - Should the responder reply with an error to the requester when the
>> requester asks for a persistence placement type on DRAM?
>> -------
>>
>> These patches implement a *NEW* RDMA opcode, "RDMA FLUSH".
>> In IB SPEC 1.5[1][2], two new opcodes, ATOMIC WRITE and RDMA FLUSH, were
>> added in the MEMORY PLACEMENT EXTENSIONS section.
>>
>> FLUSH is used by the requesting node to achieve guarantees on the data
>> placement within the memory subsystem of preceding accesses to a
>> single memory region, such as those performed by RDMA WRITE, Atomics
>> and ATOMIC WRITE requests.
>>
>> The operation indicates the range of the destination node's virtual
>> address space to which the guarantees should apply. This range must be
>> contiguous in the virtual space of the memory key, but it is not
>> necessarily a contiguous range of physical memory.
>>
>> FLUSH packets carry a FLUSH Extended Transport Header (FETH, see below)
>> to specify the placement type and the selectivity level of the operation,
>> and an RDMA Extended Transport Header (RETH, see the base document's RETH
>> definition) to specify the R_Key, VA and Length associated with the
>> request. These headers follow the BTH in RC, the RDETH in RD and the
>> XRCETH in XRC.
> Thanks. Would you like to add some test cases to the latest rdma-core
> for this RDMA FLUSH operation?
Of course, they are on the way. Actually, I already have a WIP PR for that:
https://github.com/linux-rdma/rdma-core/pull/1119
But some of the work cannot start until we have a more stable proposal and APIs.
Thanks
Zhijian
>
> Thanks a lot.
> Zhu Yanjun
>
>> RC FLUSH:
>> +----+------+------+
>> |BTH | FETH | RETH |
>> +----+------+------+
>>
>> RD FLUSH:
>> +----+------+------+------+
>> |BTH | RDETH| FETH | RETH |
>> +----+------+------+------+
>>
>> XRC FLUSH:
>> +----+-------+------+------+
>> |BTH | XRCETH| FETH | RETH |
>> +----+-------+------+------+
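For illustration, the three layouts above translate into fixed byte offsets from the start of the BTH. The sketch below is a hypothetical helper, not code from the patches; it assumes the IBTA header sizes BTH = 12, RDETH = 4, XRCETH = 4 and RETH = 16 bytes, plus a 4-byte FETH matching the 32-bit FETH diagram further down. The enum flush_service and all function names are made up for this example.

```c
#include <stddef.h>

/* Header sizes in bytes. BTH/RDETH/XRCETH/RETH follow the IBTA base
 * spec; the 4-byte FETH follows the 32-bit FETH layout in this RFC. */
enum {
	BTH_BYTES    = 12,
	RDETH_BYTES  = 4,
	XRCETH_BYTES = 4,
	FETH_BYTES   = 4,
	RETH_BYTES   = 16,
};

/* Hypothetical service selector used only in this illustration. */
enum flush_service { FLUSH_RC, FLUSH_RD, FLUSH_XRC };

/* Offset of the FETH, counted from the first byte of the BTH. */
size_t feth_offset(enum flush_service svc)
{
	switch (svc) {
	case FLUSH_RD:
		return BTH_BYTES + RDETH_BYTES;   /* BTH | RDETH | FETH */
	case FLUSH_XRC:
		return BTH_BYTES + XRCETH_BYTES;  /* BTH | XRCETH | FETH */
	case FLUSH_RC:
	default:
		return BTH_BYTES;                 /* BTH | FETH */
	}
}

/* In all three layouts the RETH immediately follows the FETH. */
size_t reth_offset(enum flush_service svc)
{
	return feth_offset(svc) + FETH_BYTES;
}

/* Total transport header length; FLUSH carries no payload (oA19-2). */
size_t flush_hdr_len(enum flush_service svc)
{
	return reth_offset(svc) + RETH_BYTES;
}
```

With these sizes, an RC FLUSH request carries 32 bytes of transport headers and an RD or XRC FLUSH carries 36, excluding network-layer headers and the ICRC.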
>>
>> Currently, we introduce RC and RD services only, since XRC has not been
>> implemented by rxe yet.
>> NOTE: only the RC service has been tested so far. Since no other HCA
>> implements FLUSH yet, the FLUSH operation can only be tested between
>> two SoftRoCE/rxe devices.
>>
>> The corresponding rdma-core changes and a FLUSH example are available at:
>> https://github.com/zhijianli88/rdma-core/tree/rfc
>> The kernel source is available at:
>> https://github.com/zhijianli88/linux/tree/rdma-flush
>>
>> - We introduce an is_pmem attribute for MRs (memory regions)
>> - We introduce FLUSH placement type attributes for the HCA
>> - We introduce FLUSH access flags that users can register MRs with
>> The figure below shows the valid access flags users can register with:
>> +--------------------------+-------------------+-------------+
>> | HCA attributes           |      register access flags      |
>> | and                      +-------------------+-------------+
>> | MR attribute (is_pmem)   | global visibility | persistence |
>> +--------------------------+-------------------+-------------+
>> | global visibility (DRAM) |         O         |      X      |
>> +--------------------------+-------------------+-------------+
>> | global visibility (PMEM) |         O         |      X      |
>> +--------------------------+-------------------+-------------+
>> | persistence (DRAM)       |         X         |      X      |
>> +--------------------------+-------------------+-------------+
>> | persistence (PMEM)       |         X         |      O      |
>> +--------------------------+-------------------+-------------+
>> O: the access flag may be registered
>> X: the access flag may not be registered
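One way to read the registration table: global visibility only requires the HCA capability, while persistence requires both the HCA capability and a pmem-backed MR. The minimal sketch below encodes that rule, assuming each table row names the single FLUSH capability the HCA supports. The HCA_FLUSH_* and ACC_FLUSH_* macros and flush_access_allowed() are hypothetical names, not the ones introduced by the patches.

```c
#include <stdbool.h>

/* Hypothetical HCA capability bits (illustrative names only). */
#define HCA_FLUSH_GLOBAL_VISIBILITY  (1u << 0)
#define HCA_FLUSH_PERSISTENT         (1u << 1)

/* Hypothetical MR access-flag bits (illustrative names only). */
#define ACC_FLUSH_GLOBAL_VISIBILITY  (1u << 0)
#define ACC_FLUSH_PERSISTENT         (1u << 1)

/* Returns true if an MR may be registered with @access, given the HCA's
 * FLUSH capabilities and whether the MR is backed by pmem. */
bool flush_access_allowed(unsigned int hca_caps, bool is_pmem,
			  unsigned int access)
{
	/* Global visibility only needs HCA support; DRAM or PMEM is fine. */
	if ((access & ACC_FLUSH_GLOBAL_VISIBILITY) &&
	    !(hca_caps & HCA_FLUSH_GLOBAL_VISIBILITY))
		return false;

	/* Persistence needs HCA support and a pmem-backed MR. */
	if ((access & ACC_FLUSH_PERSISTENT) &&
	    (!(hca_caps & HCA_FLUSH_PERSISTENT) || !is_pmem))
		return false;

	return true;
}
```

Checking each table row against this function gives the same O/X pattern, including the "persistence (DRAM)" row where both flags are rejected.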
>>
>> In order to make placement guarantees, we currently reject requests for
>> a persistent flush to a non-pmem MR.
>> The responder checks the placement types requested by the remote side
>> against the access flags registered on the MR.
>> +------------------------+-------------------+-------------+
>> |                        |        registered flags         |
>> | remote requested types +-------------------+-------------+
>> |                        | global visibility | persistence |
>> +------------------------+-------------------+-------------+
>> | global visibility      |         O         |      X      |
>> +------------------------+-------------------+-------------+
>> | persistence            |         X         |      O      |
>> +------------------------+-------------------+-------------+
>> O: the placement type may be requested
>> X: the placement type is rejected
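The responder check in the table above amounts to a subset test: every placement type bit set in the request must correspond to a flag registered on the MR. A hedged sketch follows, using the PLT bit positions quoted from the spec (bit 0 = Global Visibility, bit 1 = Persistence) and hypothetical ACC_FLUSH_* flag names; it is not the responder code from the patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* FETH Placement Type bits, as defined in the PLT description. */
#define FLUSH_PLT_GLOBAL_VISIBILITY  (1u << 0)
#define FLUSH_PLT_PERSISTENCE        (1u << 1)

/* Hypothetical MR access-flag bits (illustrative names only). */
#define ACC_FLUSH_GLOBAL_VISIBILITY  (1u << 0)
#define ACC_FLUSH_PERSISTENT         (1u << 1)

/* Map the MR's registered FLUSH access flags onto PLT bits. */
static uint8_t mr_allowed_plt(unsigned int mr_access)
{
	uint8_t plt = 0;

	if (mr_access & ACC_FLUSH_GLOBAL_VISIBILITY)
		plt |= FLUSH_PLT_GLOBAL_VISIBILITY;
	if (mr_access & ACC_FLUSH_PERSISTENT)
		plt |= FLUSH_PLT_PERSISTENCE;
	return plt;
}

/* Accept the request only if every requested placement type bit has a
 * matching registered access flag. */
bool flush_plt_permitted(uint8_t requested_plt, unsigned int mr_access)
{
	return (requested_plt & ~mr_allowed_plt(mr_access)) == 0;
}
```

Because multiple PLT bits may be set, a request for both global visibility and persistence passes only if the MR was registered with both flags.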
>>
>> Below are some details about the FLUSH transport packet:
>>
>> A FLUSH message consists of a single FLUSH request packet, and a
>> successful FLUSH is acknowledged with a zero-length RDMA READ response.
>>
>> oA19-2: FLUSH shall be single packet message and shall have no payload.
>> oA19-5: FLUSH BTH shall hold the Opcode = 0x1C
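Assuming 0x1C is the 5-bit operation code that gets combined with the 3-bit IBTA transport-type prefix in the upper bits of the BTH opcode (RC = 000b, RD = 010b, XRC = 101b), the full opcode per service can be computed as below. This is an interpretation for illustration; the OPC_* macros and flush_opcode() are hypothetical names.

```c
#include <stdint.h>

/* IBTA transport-type prefixes (upper 3 bits of the BTH opcode). */
#define OPC_RC     0x00  /* 000b */
#define OPC_RD     0x40  /* 010b */
#define OPC_XRC    0xA0  /* 101b */

/* FLUSH operation code from oA19-5 (assumed to be the lower 5 bits). */
#define OPC_FLUSH  0x1C

/* Combine a transport prefix with the FLUSH operation code. */
uint8_t flush_opcode(uint8_t transport)
{
	return (uint8_t)(transport | OPC_FLUSH);
}
```

Under this assumption, RC FLUSH is 0x1C, RD FLUSH is 0x5C and XRC FLUSH would be 0xBC.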
>>
>> FLUSH Extended Transport Header(FETH)
>> +-----+-----------+------------------------+----------------------+
>> |Bits | 31-6 | 5-4 | 3-0 |
>> +-----+-----------+------------------------+----------------------+
>> | | Reserved | Selectivity Level(SEL) | Placement Type(PLT) |
>> +-----+-----------+------------------------+----------------------+
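A minimal sketch of packing and unpacking the FETH fields in host byte order, following the bit layout in the diagram above (bits 3-0 PLT, bits 5-4 SEL, bits 31-6 reserved). feth_pack(), feth_sel() and feth_plt() are hypothetical helpers; on the wire the 32-bit word would be big-endian, like other IBA transport headers.

```c
#include <stdint.h>

/* FETH field layout: bits 3-0 PLT, bits 5-4 SEL, bits 31-6 reserved. */
#define FETH_PLT_MASK   0x0000000Fu
#define FETH_SEL_SHIFT  4
#define FETH_SEL_MASK   0x00000030u

/* Build the host-order FETH word; reserved bits are left as zero.
 * Convert to big-endian (e.g. htonl()) before placing it in a packet. */
uint32_t feth_pack(uint8_t sel, uint8_t plt)
{
	return ((uint32_t)(sel & 0x3) << FETH_SEL_SHIFT) |
	       ((uint32_t)plt & FETH_PLT_MASK);
}

/* Extract the Selectivity Level (bits 5-4). */
uint8_t feth_sel(uint32_t feth)
{
	return (uint8_t)((feth & FETH_SEL_MASK) >> FETH_SEL_SHIFT);
}

/* Extract the Placement Type (bits 3-0). */
uint8_t feth_plt(uint32_t feth)
{
	return (uint8_t)(feth & FETH_PLT_MASK);
}
```

For example, a whole-MR (SEL = b'01) flush requesting both global visibility and persistence (PLT = 0x3) packs to 0x13.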
>>
>> Selectivity Level (SEL) – defines the memory region scope the FLUSH
>> applies to. Values are as follows:
>> • b'00 - Memory Region Range: FLUSH applies to all preceding memory
>>   updates to the RETH range on this QP. All RETH fields shall be
>>   valid in this selectivity mode. The RETH:DMALen field shall be
>>   between zero and (2^31 - 1) bytes (inclusive).
>> • b'01 - Memory Region: FLUSH applies to all preceding memory
>>   updates to RETH.R_key on this QP. RETH:DMALen and RETH:VA
>>   shall be ignored in this mode.
>> • b'10 - Reserved.
>> • b'11 - Reserved.
>>
>> Placement Type (PLT) – defines the memory placement guarantee of
>> this FLUSH. Multiple bits may be set in this field. Values are as follows:
>> • Bit 0, if set to '1', indicates that the FLUSH should guarantee
>>   Global Visibility.
>> • Bit 1, if set to '1', indicates that the FLUSH should guarantee
>>   Persistence.
>> • Bits 3:2 are reserved.
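To make the selectivity levels concrete, the following sketch resolves the address range a FLUSH covers. It assumes a simple MR descriptor (struct mr_extent, hypothetical) and rejects reserved SEL values and reserved PLT bits; whether a responder must reject reserved PLT bits, and the exact bounds check, are assumptions of this sketch rather than behavior stated above. For b'01 it mirrors the "adjust start for WHOLE MR level" idea by ignoring RETH:VA and RETH:DMALen.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLUSH_SEL_MR_RANGE  0x0          /* b'00 */
#define FLUSH_SEL_WHOLE_MR  0x1          /* b'01 */
#define FLUSH_PLT_VALID     0x3          /* bits 1:0; bits 3:2 reserved */
#define FLUSH_MAX_DMALEN    0x7FFFFFFFu  /* 2^31 - 1 */

/* Hypothetical MR descriptor used only for this sketch. */
struct mr_extent {
	uint64_t iova;    /* start of the MR in its virtual space */
	uint64_t length;  /* MR length in bytes */
};

/* Resolve the [start, start + len) range a FLUSH applies to. Returns
 * false for reserved SEL values, reserved PLT bits (an assumption) or
 * a range outside the MR. */
bool flush_resolve_range(const struct mr_extent *mr, uint8_t sel,
			 uint8_t plt, uint64_t va, uint32_t dma_len,
			 uint64_t *start, uint64_t *len)
{
	if (plt & ~FLUSH_PLT_VALID)
		return false;

	switch (sel) {
	case FLUSH_SEL_WHOLE_MR:
		/* RETH:VA and RETH:DMALen are ignored at this level. */
		*start = mr->iova;
		*len = mr->length;
		return true;
	case FLUSH_SEL_MR_RANGE:
		if (dma_len > FLUSH_MAX_DMALEN)
			return false;
		/* Overflow-safe check that the range lies inside the MR. */
		if (va < mr->iova || va - mr->iova > mr->length ||
		    dma_len > mr->length - (va - mr->iova))
			return false;
		*start = va;
		*len = dma_len;
		return true;
	default:
		return false;  /* b'10 and b'11 are reserved */
	}
}
```

For an MR at iova 0x1000 of length 0x2000, a b'00 request for VA 0x1800 and DMALen 0x100 resolves to exactly that range, while a b'01 request resolves to the whole MR regardless of the RETH values.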
>>
>> [1]: https://www.infinibandta.org/ibta-specification/ # login required
>> [2]: https://www.infinibandta.org/wp-content/uploads/2021/08/IBTA-Overview-of-IBTA-Volume-1-Release-1.5-and-MPE-2021-08-17-Secure.pptx
>>
>> CC: yangx.jy@...fujitsu.com
>> CC: y-goto@...itsu.com
>> CC: Jason Gunthorpe <jgg@...pe.ca>
>> CC: Zhu Yanjun <zyjzyj2000@...il.com>
>> CC: Leon Romanovsky <leon@...nel.org>
>> CC: Bob Pearson <rpearsonhpe@...il.com>
>> CC: Mark Bloch <mbloch@...dia.com>
>> CC: Wenpeng Liang <liangwenpeng@...wei.com>
>> CC: Aharon Landau <aharonl@...dia.com>
>> CC: Tom Talpey <tom@...pey.com>
>> CC: "Gromadzki, Tomasz" <tomasz.gromadzki@...el.com>
>> CC: Dan Williams <dan.j.williams@...el.com>
>> CC: linux-rdma@...r.kernel.org
>> CC: linux-kernel@...r.kernel.org
>>
>> V1:
>> https://lore.kernel.org/lkml/050c3183-2fc6-03a1-eecd-258744750972@fujitsu.com/T/
>> or https://github.com/zhijianli88/linux/tree/rdma-flush-rfcv1
>>
>> Changes log
>> V2:
>> https://github.com/zhijianli88/linux/tree/rdma-flush
>> RDMA: mr: Introduce is_pmem
>> check 1st byte to avoid crossing page boundary
>> new scheme to check is_pmem # Dan
>>
>> RDMA: Allow registering MR with flush access flags
>> fold [03/10] "RDMA/rxe: Allow registering FLUSH flags for supported device only" into this patch # Jason
>> split RDMA_FLUSH into 2 capabilities
>>
>> RDMA/rxe: Allow registering persistent flag for pmem MR only
>> update commit message, get rid of confusing ib_check_flush_access_flags() # Tom
>>
>> RDMA/rxe: Implement RC RDMA FLUSH service in requester side
>> extend flush to include length field. # Tom and Tomasz
>>
>> RDMA/rxe: Implement flush execution in responder side
>> adjust start for WHOLE MR level # Tom
>> don't support DMA mr for flush # Tom
>> check flush return value
>>
>> RDMA/rxe: Enable RDMA FLUSH capability for rxe device
>> adjust patch order: moved here from [04/10]
>>
>> Li Zhijian (9):
>> RDMA: mr: Introduce is_pmem
>> RDMA: Allow registering MR with flush access flags
>> RDMA/rxe: Allow registering persistent flag for pmem MR only
>> RDMA/rxe: Implement RC RDMA FLUSH service in requester side
>> RDMA/rxe: Set BTH's SE to zero for FLUSH packet
>> RDMA/rxe: Implement flush execution in responder side
>> RDMA/rxe: Implement flush completion
>> RDMA/rxe: Enable RDMA FLUSH capability for rxe device
>> RDMA/rxe: Add RD FLUSH service support
>>
>> drivers/infiniband/core/uverbs_cmd.c | 17 +++
>> drivers/infiniband/sw/rxe/rxe_comp.c | 4 +-
>> drivers/infiniband/sw/rxe/rxe_hdr.h | 52 +++++++++
>> drivers/infiniband/sw/rxe/rxe_loc.h | 2 +
>> drivers/infiniband/sw/rxe/rxe_mr.c | 37 ++++++-
>> drivers/infiniband/sw/rxe/rxe_opcode.c | 35 +++++++
>> drivers/infiniband/sw/rxe/rxe_opcode.h | 3 +
>> drivers/infiniband/sw/rxe/rxe_param.h | 4 +-
>> drivers/infiniband/sw/rxe/rxe_req.c | 19 +++-
>> drivers/infiniband/sw/rxe/rxe_resp.c | 133 +++++++++++++++++++++++-
>> include/rdma/ib_pack.h | 3 +
>> include/rdma/ib_verbs.h | 30 +++++-
>> include/uapi/rdma/ib_user_ioctl_verbs.h | 2 +
>> include/uapi/rdma/ib_user_verbs.h | 19 ++++
>> include/uapi/rdma/rdma_user_rxe.h | 7 ++
>> 15 files changed, 355 insertions(+), 12 deletions(-)
>>
>> --
>> 2.31.1
>>
>>
>>