[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAKMK7uG5UOS5360_HjJyroLE8b+6wrhT291PaqjFbii+BT7+Hg@mail.gmail.com>
Date: Wed, 7 Oct 2020 10:15:31 +0200
From: Daniel Vetter <daniel@...ll.ch>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Leon Romanovsky <leon@...nel.org>,
Doug Ledford <dledford@...hat.com>,
Leon Romanovsky <leonro@...dia.com>,
Christoph Hellwig <hch@....de>,
David Airlie <airlied@...ux.ie>,
dri-devel <dri-devel@...ts.freedesktop.org>,
intel-gfx <intel-gfx@...ts.freedesktop.org>,
Jani Nikula <jani.nikula@...ux.intel.com>,
Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-rdma <linux-rdma@...r.kernel.org>,
Maor Gottlieb <maorg@...dia.com>,
Rodrigo Vivi <rodrigo.vivi@...el.com>,
Roland Scheidegger <sroland@...are.com>,
Tvrtko Ursulin <tvrtko.ursulin@...el.com>,
VMware Graphics <linux-graphics-maintainer@...are.com>
Subject: Re: [PATCH rdma-next v5 0/4] Dynamicaly allocate SG table from the pages
On Wed, Oct 7, 2020 at 9:22 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
> On Tue, Oct 06, 2020 at 12:41:22PM +0200, Daniel Vetter wrote:
> > On Mon, Oct 05, 2020 at 08:56:50PM -0300, Jason Gunthorpe wrote:
> > > On Sun, Oct 04, 2020 at 06:43:36PM +0300, Leon Romanovsky wrote:
> > > > This series extends __sg_alloc_table_from_pages to allow chaining of
> > > > new pages to already initialized SG table.
> > > >
> > > > This allows for the drivers to utilize the optimization of merging contiguous
> > > > pages without a need to pre allocate all the pages and hold them in
> > > > a very large temporary buffer prior to the call to SG table initialization.
> > > >
> > > > The second patch changes the Infiniband driver to use the new API. It
> > > > removes duplicate functionality from the code and benefits the
> > > > optimization of allocating dynamic SG table from pages.
> > > >
> > > > In huge pages system of 2MB page size, without this change, the SG table
> > > > would contain x512 SG entries.
> > > > E.g. for 100GB memory registration:
> > > >
> > > > Number of entries Size
> > > > Before 26214400 600.0MB
> > > > After 51200 1.2MB
> > > >
> > > > Thanks
> > > >
> > > > Maor Gottlieb (2):
> > > > lib/scatterlist: Add support in dynamic allocation of SG table from
> > > > pages
> > > > RDMA/umem: Move to allocate SG table from pages
> > > >
> > > > Tvrtko Ursulin (2):
> > > > tools/testing/scatterlist: Rejuvenate bit-rotten test
> > > > tools/testing/scatterlist: Show errors in human readable form
> > >
> > > This looks OK, I'm going to send it into linux-next on the hmm tree
> > > for awhile to see if anything gets broken. If there is more
> > > remarks/tags/etc please continue
> >
> > An idea that just crossed my mind: A pin_user_pages_sgt might be useful
> > for both rdma and drm, since this would avoid the possible huge interim
> > struct pages array for thp pages. Or anything else that could be coalesced
> > down into a single sg entry.
> >
> > Not sure it's worth it, but would at least give a slightly neater
> > interface I think.
>
> We've talked about it. Christoph wants to see this area move to a biovec
> interface instead of sgl, but it might still be worthwhile to have an
> interm step at least as an API consolidation.
Hm but then we'd need a new struct for the mapped side of things
(which would still be what you get from dma-buf). That would be quite
a bit of work to roll out everywhere, and sgt isn't such a huge misfit
for passing buffer object mappings and system memory backing storage
around, and hence what we (very slowly) converging drivers/gpu towards
over the past 10 years or so.
And moving the dma_map step out of dma-buf doesn't work, because some
of the use-cases we have is for very special iommus which are managed
by the gpu driver directly. Stuff that e.g. rotates/retiles/compresses
on the fly, and is accessible by other (gfx related like video code,
camera, ..) devices. Not something I expect to ever be relevant for
rdma since this exist mostly on some small soc, but it's a thing.
Without that dma-buf could hand out biovec for struct_page backed
stuff, or some pfn_vec for the p2p stuff.
Anyway was just an idea, I guess we'll have to live with some
impedance mismatch since rolling out the one an only iovec structure
which suits everyone is I think impossible :-)
> Avoiding the page list would be complicated as we'd somehow have to
> code share the page table iterator scheme.
We're (slowly) getting towards thp for vram mappings and everything so
I guess for drivers/gpu we might make that happen. But yeah it'd be
not so pretty I think.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Powered by blists - more mailing lists