Message-ID: <ac2d7fcc-024f-4913-949f-11cbe5d09f63@linux.dev>
Date: Fri, 18 Oct 2024 09:06:42 +0200
From: Zhu Yanjun <yanjun.zhu@...ux.dev>
To: Daisuke Matsuda <matsuda-daisuke@...itsu.com>,
 linux-rdma@...r.kernel.org, leon@...nel.org, jgg@...pe.ca,
 zyjzyj2000@...il.com
Cc: linux-kernel@...r.kernel.org, rpearsonhpe@...il.com, lizhijian@...itsu.com
Subject: Re: [PATCH for-next v8 0/6] On-Demand Paging on SoftRoCE

On 2024/10/9 3:58, Daisuke Matsuda wrote:
> This patch series implements the On-Demand Paging feature in the SoftRoCE (rxe)
> driver. So far the feature has been available only in the mlx5 driver[1].
> 
> This series had been blocked by the hang in the blktests srp/002 test[2],
> which was believed to have been introduced by commit 9b4b7c1f9f54
> ("RDMA/rxe: Add workqueue support for rxe tasks"). My patches depend on
> that commit because the ODP feature requires sleeping in kernel space,
> which is impossible with the former tasklet implementation.
> 
> According to the original reporter[3], the hang issue is already gone in
> v6.10. Additionally, tasklet is marked deprecated[4]. I think the rxe
> driver is ready to accept this series since there is no longer any reason
> to consider reverting back to the old tasklet.
> 
> I omitted some content, such as the motivation behind this series, from this
> cover letter. Please see the cover letter of v3 for more details[5].
> 
> [Overview]
> When applications register a memory region(MR), RDMA drivers normally pin
> pages in the MR so that physical addresses are never changed during RDMA
> communication. This requires the MR to fit in physical memory and
> inevitably leads to memory pressure. On the other hand, On-Demand Paging
> (ODP) allows applications to register MRs without pinning pages. Pages are
> paged in when the driver requires them and paged out when the OS reclaims
> them. As a result, it is possible to register a large MR that does not fit
> in physical memory without taking up so much physical memory.
> 
> [How does ODP work?]
> "struct ib_umem_odp" is used to manage pages. It is created for each
> ODP-enabled MR on its registration. This struct holds a pair of arrays
> (dma_list/pfn_list) that serve as a driver page table. DMA addresses and
> PFNs are stored in the driver page table. They are updated on page-in and
> page-out, both of which use the common interfaces in the ib_uverbs layer.
> 
> Page-in can occur when the requester, responder or completer accesses an MR
> in order to process RDMA operations. If it finds that the pages being
> accessed are not present in physical memory or that the requisite permissions
> are not set on the pages, it triggers a page fault to make the pages present
> with proper permissions and at the same time updates the driver page table.
> After confirming the presence of the pages, it executes the memory access,
> such as a read, write or atomic operation.
> 
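
The page-in path described above can be modelled in plain userspace C. This is
only a sketch under stated assumptions: every name in it (model_odp_table,
model_page_in, model_access, the MODEL_* flags) is hypothetical, the "DMA
addresses" are made-up numbers, and none of it is taken from the rxe patches or
from the ib_umem_odp API. It only shows the check-then-fault order: look up the
driver page table, fault in what is missing or lacks write permission, then
proceed with the access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NPAGES   8     /* hypothetical MR size in pages */
#define MODEL_PRESENT  0x1u  /* page is present */
#define MODEL_WRITABLE 0x2u  /* page may be written */

/* Hypothetical stand-in for the pfn_list/dma_list pair of a driver page table. */
struct model_odp_table {
	uint64_t pfn_list[MODEL_NPAGES]; /* permission flags, 0 = not present */
	uint64_t dma_list[MODEL_NPAGES]; /* fake DMA address, 0 = unmapped */
	unsigned int faults;             /* number of simulated page faults */
};

/* Simulated page fault: make one page present with the needed permission. */
static void model_page_in(struct model_odp_table *t, size_t idx, bool write)
{
	assert(idx < MODEL_NPAGES);
	t->pfn_list[idx] = MODEL_PRESENT | (write ? MODEL_WRITABLE : 0);
	t->dma_list[idx] = 0x1000u * (idx + 1); /* arbitrary non-zero value */
	t->faults++;
}

/* True if the page is present and has the permission this access needs. */
static bool model_page_ok(const struct model_odp_table *t, size_t idx, bool write)
{
	uint64_t need = MODEL_PRESENT | (write ? MODEL_WRITABLE : 0);

	return (t->pfn_list[idx] & need) == need;
}

/*
 * Access pages first..last (inclusive): fault in whatever is missing, then
 * the caller would perform the read, write or atomic operation.
 * Returns how many pages had to be faulted in.
 */
static unsigned int model_access(struct model_odp_table *t, size_t first,
				 size_t last, bool write)
{
	unsigned int faulted = 0;

	for (size_t i = first; i <= last && i < MODEL_NPAGES; i++) {
		if (!model_page_ok(t, i, write)) {
			model_page_in(t, i, write);
			faulted++;
		}
	}
	return faulted;
}
```

In this model a second read of the same pages causes no further faults, while a
later write to a page that was only faulted in for reading faults once more to
gain write permission.
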
> Page-out is triggered by page reclaim or filesystem events (e.g. metadata
> update of a file that is being used as an MR). When creating an ODP-enabled
> MR, the driver registers an MMU notifier callback. When the kernel issues a
> page invalidation notification, the callback is invoked to unmap DMA
> addresses and update the driver page table. After that, the kernel releases
> the pages.
> 
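
The page-out side can be sketched the same way. Again this is a userspace model
with hypothetical names (model_odp_mr, model_invalidate) and a made-up 4 KiB
page size, not the callback from the patches. The assumption is that the
notification carries a half-open virtual address range [inv_start, inv_end),
which the callback clamps to the MR before clearing the matching entries of the
driver page table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12                      /* assumed 4 KiB pages */
#define MODEL_PAGE_SIZE  (1ull << MODEL_PAGE_SHIFT)
#define MODEL_NPAGES     8                       /* hypothetical MR size */

/* Hypothetical MR: page-aligned start address plus a fake DMA table. */
struct model_odp_mr {
	uint64_t start;                  /* first virtual address of the MR */
	uint64_t dma_list[MODEL_NPAGES]; /* 0 = unmapped */
};

/*
 * Model of an invalidation callback for the range [inv_start, inv_end).
 * Clamps the range to the MR, "unmaps" every mapped page that overlaps it,
 * and returns the number of entries cleared.
 */
static size_t model_invalidate(struct model_odp_mr *mr, uint64_t inv_start,
			       uint64_t inv_end)
{
	uint64_t mr_end = mr->start + MODEL_NPAGES * MODEL_PAGE_SIZE;
	uint64_t s = inv_start > mr->start ? inv_start : mr->start;
	uint64_t e = inv_end < mr_end ? inv_end : mr_end;
	size_t cleared = 0;

	if (s >= e)
		return 0; /* the range does not touch this MR */

	size_t first = (size_t)((s - mr->start) >> MODEL_PAGE_SHIFT);
	size_t last = (size_t)((e - 1 - mr->start) >> MODEL_PAGE_SHIFT);

	for (size_t i = first; i <= last; i++) {
		if (mr->dma_list[i]) {
			mr->dma_list[i] = 0; /* stands in for the DMA unmap */
			cleared++;
		}
	}
	return cleared;
}
```

After such an invalidation the cleared entries read as not present, so the next
access to them has to go through the page-in path again.
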
> [Supported operations]
> All traditional operations are supported on RC connection. The new Atomic
> write[6] and RDMA Flush[7] operations are not included in this patchset. I
> will post them later after this patchset is merged. On UD connection, Send,
> Recv, and SRQ-Recv are supported.
> 
> [How to test ODP?]
> There are only a few resources available for testing. The pyverbs testcases in
> rdma-core and perftest[8] are the recommended ones. Besides these, the
> ibv_rc_pingpong command can also be used for testing. Note that you may
> have to build perftest from upstream because old versions do not handle ODP
> capabilities correctly.

Thanks a lot. I have tested these patches with perftest. Because ODP (On-Demand
Paging) is a feature, could you also add some test cases to rdma-core, so that
we can use rdma-core to test this feature on rxe?

That is, add some test cases to run_tests.py, so that run_tests.py can be used
to verify the ODP feature on rxe.

Thanks,
Zhu Yanjun

> 
> The latest ODP tree is available from github:
> https://github.com/ddmatsu/linux/tree/odp_v8
> 
> [Future work]
> My next work is to enable the new Atomic write[6] and RDMA Flush[7]
> operations with ODP. After that, I am going to implement the prefetch
> feature. It allows applications to trigger page fault using
> ibv_advise_mr(3) to optimize performance. Some existing software, such as
> librpma[9], uses this feature. Additionally, I think we can also add the
> implicit ODP feature in the future.
> 
> [1] Understanding On Demand Paging (ODP)
> https://enterprise-support.nvidia.com/s/article/understanding-on-demand-paging--odp-x
> 
> [2] [bug report] blktests srp/002 hang
> https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/
> 
> [3] blktests failures with v6.10-rc1 kernel
> https://lore.kernel.org/linux-block/wnucs5oboi4flje5yvtea7puvn6zzztcnlrfz3lpzlwgblrxgw@7wvqdzioejgl/
> 
> [4] [00/15] ethernet: Convert from tasklet to BH workqueue
> https://patchwork.kernel.org/project/linux-rdma/cover/20240621050525.3720069-1-allen.lkml@gmail.com/
> 
> [5] [PATCH for-next v3 0/7] On-Demand Paging on SoftRoCE
> https://lore.kernel.org/lkml/cover.1671772917.git.matsuda-daisuke@fujitsu.com/
> 
> [6] [PATCH v7 0/8] RDMA/rxe: Add atomic write operation
> https://lore.kernel.org/linux-rdma/1669905432-14-1-git-send-email-yangx.jy@fujitsu.com/
> 
> [7] [for-next PATCH 00/10] RDMA/rxe: Add RDMA FLUSH operation
> https://lore.kernel.org/lkml/20221206130201.30986-1-lizhijian@fujitsu.com/
> 
> [8] linux-rdma/perftest: Infiniband Verbs Performance Tests
> https://github.com/linux-rdma/perftest
> 
> [9] librpma: Remote Persistent Memory Access Library
> https://github.com/pmem/rpma
> 
> v7->v8:
>   1) Dropped the first patch because the same change was made by Bob Pearson.
>   cf. https://github.com/torvalds/linux/commit/23bc06af547f2ca3b7d345e09fd8d04575406274
>   2) Rebased to 6.12.1-rc2
> 
> v6->v7:
>   1) Rebased to 6.6.0
>   2) Disabled using hugepages with ODP
>   3) Addressed comments on v6 from Jason and Zhu
>     cf. https://lore.kernel.org/lkml/cover.1694153251.git.matsuda-daisuke@fujitsu.com/
> 
> v5->v6:
>   Fixed the implementation according to Jason's suggestions
>     cf. https://lore.kernel.org/all/ZIdFXfDu4IMKE+BQ@nvidia.com/
>     cf. https://lore.kernel.org/all/ZIdGU709e1h5h4JJ@nvidia.com/
> 
> v4->v5:
>   1) Rebased to 6.4.0-rc2+
>   2) Changed to schedule all works on responder and completer to workqueue
> 
> v3->v4:
>   1) Re-designed functions that access MRs to use the MR xarray.
>   2) Rebased onto the latest jgg-for-next tree.
> 
> v2->v3:
>   1) Removed a patch that changes the common ib_uverbs layer.
>   2) Re-implemented patches for conversion to workqueue.
>   3) Fixed compile errors (happened when CONFIG_INFINIBAND_ON_DEMAND_PAGING=n).
>   4) Fixed some functions that returned incorrect errors.
>   5) Temporarily disabled ODP for RDMA Flush and Atomic Write.
> 
> v1->v2:
>   1) Fixed a crash issue reported by Haris Iqbal.
>   2) Tried to make lock patterns clearer as pointed out by Romanovsky.
>   3) Minor clean ups and fixes.
> 
> Daisuke Matsuda (6):
>    RDMA/rxe: Make MR functions accessible from other rxe source code
>    RDMA/rxe: Move resp_states definition to rxe_verbs.h
>    RDMA/rxe: Add page invalidation support
>    RDMA/rxe: Allow registering MRs for On-Demand Paging
>    RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
>    RDMA/rxe: Add support for the traditional Atomic operations with ODP
> 
>   drivers/infiniband/sw/rxe/Makefile    |   2 +
>   drivers/infiniband/sw/rxe/rxe.c       |  18 ++
>   drivers/infiniband/sw/rxe/rxe.h       |  37 ----
>   drivers/infiniband/sw/rxe/rxe_loc.h   |  39 ++++
>   drivers/infiniband/sw/rxe/rxe_mr.c    |  34 +++-
>   drivers/infiniband/sw/rxe/rxe_odp.c   | 282 ++++++++++++++++++++++++++
>   drivers/infiniband/sw/rxe/rxe_resp.c  |  18 +-
>   drivers/infiniband/sw/rxe/rxe_verbs.c |   5 +-
>   drivers/infiniband/sw/rxe/rxe_verbs.h |  37 ++++
>   9 files changed, 419 insertions(+), 53 deletions(-)
>   create mode 100644 drivers/infiniband/sw/rxe/rxe_odp.c
> 

