lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <OS7PR01MB11804464EB36E9E9FE02CA59BE5EDA@OS7PR01MB11804.jpnprd01.prod.outlook.com>
Date:   Fri, 8 Sep 2023 06:35:56 +0000
From:   "Daisuke Matsuda (Fujitsu)" <matsuda-daisuke@...itsu.com>
To:     'Jason Gunthorpe' <jgg@...dia.com>
CC:     "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
        "leonro@...dia.com" <leonro@...dia.com>,
        "'leon@...nel.org'" <leon@...nel.org>,
        "zyjzyj2000@...il.com" <zyjzyj2000@...il.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "rpearsonhpe@...il.com" <rpearsonhpe@...il.com>,
        "Xiao Yang (Fujitsu)" <yangx.jy@...itsu.com>,
        "Zhijian Li (Fujitsu)" <lizhijian@...itsu.com>,
        "Yasunori Gotou (Fujitsu)" <y-goto@...itsu.com>
Subject: RE: [PATCH for-next v5 6/7] RDMA/rxe: Add support for
 Send/Recv/Write/Read with ODP

On Tue, June 13, 2023 1:23 AM Jason Gunthorpe wrote:
> On Thu, May 18, 2023 at 05:21:51PM +0900, Daisuke Matsuda wrote:
> 
> > +/* umem mutex must be locked before entering this function. */
> > +static int rxe_odp_map_range(struct rxe_mr *mr, u64 iova, int length, u32 flags)
> > +{
> > +	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
> > +	const int max_tries = 3;
> > +	int cnt = 0;
> > +
> > +	int err;
> > +	u64 perm;
> > +	bool need_fault;
> > +
> > +	if (unlikely(length < 1)) {
> > +		mutex_unlock(&umem_odp->umem_mutex);
> > +		return -EINVAL;
> > +	}
> > +
> > +	perm = ODP_READ_ALLOWED_BIT;
> > +	if (!(flags & RXE_PAGEFAULT_RDONLY))
> > +		perm |= ODP_WRITE_ALLOWED_BIT;
> > +
> > +	/*
> > +	 * A successful return from rxe_odp_do_pagefault() does not guarantee
> > +	 * that all pages in the range became present. Recheck the DMA address
> > +	 * array, allowing max 3 tries for pagefault.
> > +	 */
> > +	while ((need_fault = rxe_is_pagefault_neccesary(umem_odp,
> > +							iova, length, perm))) {
> > +		if (cnt >= max_tries)
> > +			break;
> 
> I don't think this makes sense..

I have posted the new implementation, but it is somewhat different from
your suggestion. The reason is as below.

> 
> You need to make this work more like mlx5 does, you take the spinlock
> on the xarray, you search it for your index and whatever is there
> tells what to do. Hold the spinlock while doing the copy. This is
> enough locking for the fast path.

This lock should be umem mutex. Actual page-out is executed in the
kernel after rxe_ib_invalidate_range() is done. It is possible the spinlock
does not block this flow if it is locked while the invalidation is running.
The umem mutex can serialize these accesses.

> 
> If there is no index present, or it is not writable and you need
> write, then you unlock the spinlock and prefetch the missing entry and
> try again, this time also holding the mutex so there isn't a race.
> 
> You shouldn't probe into parts of the umem to discover information
> already stored in the xarray then do the same lookup into the xarray.
> 
> IIRC this also needs to keep track in the xarray on a per page basis
> if the page is writable.

An xarray entry can hold a pointer or a value from 0 to LONG_MAX.
That is not enough to store page address and its permission.

If we try to do everything with xarray, we need to allocate a new struct
for each page that holds a pointer to a page and a value to store r/w permission.
That is inefficient in terms of memory usage and implementation.

I think the xarray can be used to check presence of pages just like we have
been doing in the non-ODP case. On the other hand, the permission
should be fetched from umem_odp->pfn_list, which is updated everytime
page fault is executed.

Thanks,
Daisuke

> 
> Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ