Date:   Fri, 9 Sep 2022 02:45:46 +0000
From:   "matsuda-daisuke@...itsu.com" <matsuda-daisuke@...itsu.com>
To:     'Leon Romanovsky' <leonro@...dia.com>
CC:     "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
        "jgg@...dia.com" <jgg@...dia.com>,
        "zyjzyj2000@...il.com" <zyjzyj2000@...il.com>,
        "nvdimm@...ts.linux.dev" <nvdimm@...ts.linux.dev>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "rpearsonhpe@...il.com" <rpearsonhpe@...il.com>,
        "yangx.jy@...itsu.com" <yangx.jy@...itsu.com>,
        "lizhijian@...itsu.com" <lizhijian@...itsu.com>,
        "y-goto@...itsu.com" <y-goto@...itsu.com>
Subject: Re: [RFC PATCH 6/7] RDMA/rxe: Add support for Send/Recv/Write/Read
 operations with ODP

On Thu, Sep 8, 2022 5:30 PM Leon Romanovsky wrote:
> On Wed, Sep 07, 2022 at 11:43:04AM +0900, Daisuke Matsuda wrote:
> > rxe_mr_copy() is used widely to copy data to/from a user MR. The requester
> > uses it to load the payloads of request packets; the responder uses it to
> > process Send, Write, and Read operations; the completer uses it to copy data
> > from response packets of Read and Atomic operations to a user MR.
> >
> > Allow these operations to be used with ODP by adding a counterpart function
> > rxe_odp_mr_copy(). It consists of the following steps:
> >  1. Check the driver page table (umem_odp->dma_list) to see if the pages
> >     being accessed are present with appropriate permissions.
> >  2. If necessary, trigger a page fault to map the pages.
> >  3. Convert their user space addresses to kernel logical addresses using
> >     PFNs in the driver page table (umem_odp->pfn_list).
> >  4. Execute the data copy to/from the pages.
> >
> > umem_mutex is used to ensure that dma_list (an array of addresses of an MR)
> > is not changed while it is checked and that mapped pages are not
> > invalidated before data copy completes.
> >
> > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@...itsu.com>
> > ---
> >  drivers/infiniband/sw/rxe/rxe.c      |  10 ++
> >  drivers/infiniband/sw/rxe/rxe_loc.h  |   2 +
> >  drivers/infiniband/sw/rxe/rxe_mr.c   |   2 +-
> >  drivers/infiniband/sw/rxe/rxe_odp.c  | 173 +++++++++++++++++++++++++++
> >  drivers/infiniband/sw/rxe/rxe_resp.c |   6 +-
> >  5 files changed, 190 insertions(+), 3 deletions(-)
> 
> <...>
> 
> > +/* umem mutex is always locked when returning from this function. */
> > +static int rxe_odp_map_range(struct rxe_mr *mr, u64 iova, int length, u32 flags)
> > +{
> > +	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
> > +	const int max_tries = 3;
> > +	int cnt = 0;
> > +
> > +	int err;
> > +	u64 perm;
> > +	bool need_fault;
> > +
> > +	if (unlikely(length < 1))
> > +		return -EINVAL;
> > +
> > +	perm = ODP_READ_ALLOWED_BIT;
> > +	if (!(flags & RXE_PAGEFAULT_RDONLY))
> > +		perm |= ODP_WRITE_ALLOWED_BIT;
> > +
> > +	mutex_lock(&umem_odp->umem_mutex);
> > +
> > +	/*
> > +	 * A successful return from rxe_odp_do_pagefault() does not guarantee
> > +	 * that all pages in the range became present. Recheck the DMA address
> > +	 * array, allowing max 3 tries for pagefault.
> > +	 */
> > +	while ((need_fault = rxe_is_pagefault_neccesary(umem_odp,
> > +							iova, length, perm))) {
> > +
> > +		if (cnt >= max_tries)
> > +			break;
> > +
> > +		mutex_unlock(&umem_odp->umem_mutex);
> > +
> > +		/* rxe_odp_do_pagefault() locks the umem mutex. */
> 
> Maybe it is correct and safe to release lock in the middle, but it is
> not clear. The whole pattern of taking lock in one function and later
> releasing it in another doesn't look right to me.

When rxe_is_pagefault_neccesary() finds that the pages are not mapped, the
driver releases the lock so that the kernel can run page invalidation in the
meantime, and then retakes it while handling the page fault in
ib_umem_odp_map_dma_and_lock(). It then calls rxe_is_pagefault_neccesary()
again with the lock held.

I admit the lock usage is quite confusing.
The lock is taken before checking whether the target pages are present.
It is released either when the target pages are missing and a page fault is
required, or when access to the target pages in an MR is finished.
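A minimal user-space sketch of the lock flow described above, assuming
pthreads in place of the kernel's umem_mutex. pagefault_needed(),
fault_and_lock() and map_range_locked() are hypothetical stand-ins for
rxe_is_pagefault_neccesary(), ib_umem_odp_map_dma_and_lock() and
rxe_odp_map_range(); this is not the driver's code. The fault stub maps only
one page per call, to model a fault that returns success without making the
whole range present.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model of the ODP lock flow; not kernel code. */
#define NPAGES    8
#define MAX_TRIES 3

static pthread_mutex_t umem_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool present[NPAGES];	/* stands in for umem_odp->dma_list */

/* Like rxe_is_pagefault_neccesary(): caller must hold umem_mutex. */
static bool pagefault_needed(int first, int npages)
{
	for (int i = first; i < first + npages; i++)
		if (!present[i])
			return true;
	return false;
}

/*
 * Like ib_umem_odp_map_dma_and_lock(), returns with umem_mutex held.
 * Maps at most one missing page per call, to model a fault that
 * succeeds without making the whole range present.
 */
static void fault_and_lock(int first, int npages)
{
	pthread_mutex_lock(&umem_mutex);
	for (int i = first; i < first + npages; i++) {
		if (!present[i]) {
			present[i] = true;
			break;
		}
	}
}

/*
 * Like rxe_odp_map_range(): on 0 or -EFAULT the mutex is held on return
 * and the caller must release it; -EINVAL returns before locking.
 */
static int map_range_locked(int first, int npages)
{
	int tries = 0;

	if (npages < 1 || first < 0 || first + npages > NPAGES)
		return -EINVAL;

	pthread_mutex_lock(&umem_mutex);
	while (pagefault_needed(first, npages)) {
		if (tries++ >= MAX_TRIES)
			return -EFAULT;	/* gave up, still locked */
		/* Drop the lock so invalidation can run meanwhile ... */
		pthread_mutex_unlock(&umem_mutex);
		/* ... and retake it inside the fault handler. */
		fault_and_lock(first, npages);
	}
	return 0;	/* range present, still locked */
}
```

The model makes the hand-off visible: the lock acquired inside
fault_and_lock() is the one map_range_locked() returns holding, so every
caller carries an unlock obligation that the function signature does not
express.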

I will move some of the lock acquire/release operations into rxe_odp_mr_copy()
and rxe_odp_atomic_ops() so that the locking is easier to follow. I will also
rethink how I explain it in the comments and the patch description.
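One possible shape for that rework, sketched as a hypothetical user-space
model rather than the actual patch: pthreads stand in for umem_mutex, and
range_present(), fault_range() and mr_copy_to() are illustrative names, not
rxe functions. The copy routine takes and releases the mutex itself, and walks
through steps 1-4 of the patch description: check presence, fault if needed,
translate through a page map standing in for umem_odp->pfn_list, then copy.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define PAGE_SZ   16
#define NPAGES    4
#define MAX_TRIES 3

static pthread_mutex_t umem_mutex = PTHREAD_MUTEX_INITIALIZER;
static char backing[NPAGES][PAGE_SZ];	/* the "physical" pages */
static char *page_map[NPAGES];		/* stands in for pfn_list; NULL = absent */

/* Step 1 (caller holds umem_mutex): is every page of [off, off+len) mapped? */
static bool range_present(size_t off, size_t len)
{
	for (size_t p = off / PAGE_SZ; p <= (off + len - 1) / PAGE_SZ; p++)
		if (!page_map[p])
			return false;
	return true;
}

/* Step 2 (hypothetical fault handler): takes and releases the lock itself. */
static void fault_range(size_t off, size_t len)
{
	pthread_mutex_lock(&umem_mutex);
	for (size_t p = off / PAGE_SZ; p <= (off + len - 1) / PAGE_SZ; p++)
		page_map[p] = backing[p];
	pthread_mutex_unlock(&umem_mutex);
}

/* Copy into the MR; the lock is taken and released only in this function. */
static int mr_copy_to(size_t off, const void *src, size_t len)
{
	const char *s = src;
	int tries = 0;

	if (len == 0 || off + len > NPAGES * PAGE_SZ)
		return -EINVAL;

	pthread_mutex_lock(&umem_mutex);
	while (!range_present(off, len)) {
		pthread_mutex_unlock(&umem_mutex);
		if (tries++ >= MAX_TRIES)
			return -EFAULT;
		fault_range(off, len);
		pthread_mutex_lock(&umem_mutex);	/* recheck under the lock */
	}
	/* Steps 3-4: translate each page and copy, still under the lock. */
	while (len > 0) {
		size_t in_page = off % PAGE_SZ;
		size_t chunk = PAGE_SZ - in_page;

		if (chunk > len)
			chunk = len;
		memcpy(page_map[off / PAGE_SZ] + in_page, s, chunk);
		off += chunk;
		s += chunk;
		len -= chunk;
	}
	pthread_mutex_unlock(&umem_mutex);
	return 0;
}
```

Here every lock/unlock pair sits in one function. Because the lock is dropped
between fault_range() and the relock, an invalidation could unmap pages again
in that window, which is why the presence check is repeated in a loop.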

Thank you,
Daisuke Matsuda

> 
> Thanks
