lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 8 Feb 2024 22:04:22 +0000
From: Konstantin Taranov <kotaranov@...rosoft.com>
To: Jason Gunthorpe <jgg@...pe.ca>
CC: Long Li <longli@...rosoft.com>, Konstantin Taranov
	<kotaranov@...ux.microsoft.com>, "leon@...nel.org" <leon@...nel.org>,
	"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH rdma-next v1 1/1] RDMA/mana_ib: Fix bug in creation of dma
 regions

> From: Jason Gunthorpe <jgg@...pe.ca>
> On Thu, Feb 08, 2024 at 06:53:05PM +0000, Konstantin Taranov wrote:
> > > From: Long Li <longli@...rosoft.com>
> > > Sent: Thursday, 8 February 2024 19:43
> > > To: Konstantin Taranov <kotaranov@...ux.microsoft.com>; Konstantin
> > > Taranov <kotaranov@...rosoft.com>; sharmaajay@...rosoft.com;
> > > jgg@...pe.ca; leon@...nel.org
> > > Cc: linux-rdma@...r.kernel.org; linux-kernel@...r.kernel.org
> > > Subject: RE: [PATCH rdma-next v1 1/1] RDMA/mana_ib: Fix bug in
> > > creation of dma regions
> > >
> > > >
> > > >   /* Hardware requires dma region to align to chosen page size */
> > > > - page_sz = ib_umem_find_best_pgsz(umem, PAGE_SZ_BM, 0);
> > > > + page_sz = ib_umem_find_best_pgsz(umem, PAGE_SZ_BM, virt);
> > > >   if (!page_sz) {
> > > >           ibdev_dbg(&dev->ib_dev, "failed to find page size.\n");
> > > >           return -ENOMEM;
> > > >   }
> > >
> > > How about doing:
> > > page_sz = ib_umem_find_best_pgsz(umem, PAGE_SZ_BM,
> force_zero_offset
> > > ? 0 : virt);
> > >
> > > Will this work? This can get rid of the following while loop.
> > >
> >
> > I do not think so. I mentioned once, that it was failing for me with
> > existing code with the 4K-aligned addresses and 8K pages. In this
> > case, we miscalculate the number of pages. So, we think that it is one 8K
> page, but it is in fact two.
> 
> That is a confusing statement.. What is "we" here?

Sorry, I meant here the helper thinks that it is one 8K page for 8K buffer.
But the offset will not not zero when the buffer is only 4K aligned. So the actual
Number of pages is 2, but the helper thinks that it is one because of wrong virt. 

> 
> ib_umem_dma_offset() is not always guaranteed to be zero, with a 0 iova.
> With higher order pages the offset can be within the page, it generates
> 
>   offset = IOVA % pgsz
> 
> There are a couple places that do want the offset to be fixed to zero and have
> the loop, at this point it would be good to consolidate them into some
> common ib_umem_find_best_pgsz_zero_offset() or something.
> 
> > > > +
> > > > + if (force_zero_offset) {
> > > > +         while (ib_umem_dma_offset(umem, page_sz) && page_sz >
> > > > PAGE_SIZE)
> > > > +                 page_sz /= 2;
> > > > +         if (ib_umem_dma_offset(umem, page_sz) != 0) {
> > > > +                 ibdev_dbg(&dev->ib_dev, "failed to find page
> > > > + size to
> > > > force zero offset.\n");
> > > > +                 return -ENOMEM;
> > > > +         }
> > > > + }
> > > > +
> 
> Yes this doesn't look quite right..
> 
> It should flow from the HW capability, the helper you call should be tightly
> linked to what the HW can do.
> 
> ib_umem_find_best_pgsz() is used for MRs that have the usual
>   offset = IOVA % pgsz
> 
> We've always created other helpers for other restrictions.
> 
> So you should move your "force_zero_offset" into another helper and
> describe exactly how the HW works to support the calculation
> 
> It is odd to have the offset loop and be using
> ib_umem_find_best_pgsz() with some iova, usually you'd use
> ib_umem_find_best_pgoff() in those cases, see the other callers.

Hi Jason,
Thanks for the comments.

To be honest, I do not understand how I could employ ib_umem_find_best_pgoff
for my purpose. As well as I do not see any mistake in the patch, and I think you neither.

I can explain the logic behind the code, and could you say what I need to change to get
this patch accepted?

Our HW cannot create a work queue or a completion queue from a dma region that has
non zero offset. As a result, applications must allocate at least 4K-aligned memory for
such queues. As a queue can be long, the helper functions may suggest large page sizes
(let's call it X), but compromise the zero offset. As we see that the offset is not zero, we
half X till first page size that can offer us zero offset. This page size will be at least 4K.
If we cannot find such page size, it means that the user did not provide 4K-aligned buffer.

I can make a special helper, but I do not think that it will be useful to anyone. Plus,
there is no better approach then halving the page size, so the helper will end up with that
loop under the hood. As I see mlnx also uses a loop with halving page_sz, but for a different
purpose, I do not see why our code cannot do the same without a special helper.

> 
> Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ