Message-ID: <64acfc1e46a80_8e17829462@dwillia2-xfh.jf.intel.com.notmuch>
Date: Mon, 10 Jul 2023 23:52:14 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Jason Gunthorpe <jgg@...pe.ca>, Mina Almasry <almasrymina@...gle.com>,
Christoph Hellwig <hch@....de>, John Hubbard <jhubbard@...dia.com>, "Dan
Williams" <dan.j.williams@...el.com>
CC: David Ahern <dsahern@...nel.org>, Jakub Kicinski <kuba@...nel.org>,
"Jesper Dangaard Brouer" <jbrouer@...hat.com>, <brouer@...hat.com>, Alexander
Duyck <alexander.duyck@...il.com>, Yunsheng Lin <linyunsheng@...wei.com>,
<davem@...emloft.net>, <pabeni@...hat.com>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, Lorenzo Bianconi <lorenzo@...nel.org>, "Yisen
Zhuang" <yisen.zhuang@...wei.com>, Salil Mehta <salil.mehta@...wei.com>,
"Eric Dumazet" <edumazet@...gle.com>, Sunil Goutham <sgoutham@...vell.com>,
"Geetha sowjanya" <gakula@...vell.com>, Subbaraya Sundeep
<sbhatta@...vell.com>, hariprasad <hkelam@...vell.com>, Saeed Mahameed
<saeedm@...dia.com>, "Leon Romanovsky" <leon@...nel.org>, Felix Fietkau
<nbd@....name>, Ryder Lee <ryder.lee@...iatek.com>, Shayne Chen
<shayne.chen@...iatek.com>, Sean Wang <sean.wang@...iatek.com>, Kalle Valo
<kvalo@...nel.org>, Matthias Brugger <matthias.bgg@...il.com>,
AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>, Jesper
Dangaard Brouer <hawk@...nel.org>, Ilias Apalodimas
<ilias.apalodimas@...aro.org>, <linux-rdma@...r.kernel.org>,
<linux-wireless@...r.kernel.org>, <linux-arm-kernel@...ts.infradead.org>,
<linux-mediatek@...ts.infradead.org>, Jonathan Lemon
<jonathan.lemon@...il.com>
Subject: Re: Memory providers multiplexing (Was: [PATCH net-next v4 4/5]
page_pool: remove PP_FLAG_PAGE_FRAG flag)
Jason Gunthorpe wrote:
> On Mon, Jul 10, 2023 at 04:02:59PM -0700, Mina Almasry wrote:
> > On Mon, Jul 10, 2023 at 10:44 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
> > >
> > > On Wed, Jul 05, 2023 at 06:17:39PM -0700, Mina Almasry wrote:
> > >
> > > > Another issue is that in networks with low MTU, we could be DMAing
> > > > 1400/1500 bytes into each allocation, which is problematic if the
> > > > allocation is 8K+. I would need to investigate a bit to see if/how to
> > > > solve that, and we may end up having to split the page and again run
> > > > into the 'not enough room in struct page' problem.
> > >
> > > You don't have an intree driver to use this with, so who knows, but
> > > the out of tree GPU drivers tend to use a 64k memory management page
> > > size, and I don't expect you'd make progress with a design where a 64K
> > > naturally sized allocator is producing 4k/8k non-compound pages just
> > > for netdev. We are still struggling with pagemap support for variable
> > > page size folios, so there is a bunch of technical blockers before
> > > drivers could do this.
> > >
> > > This is why it is so important to come with a complete in-tree
> > > solution, as we cannot review this design if your work is done with
> > > hacked up out of tree drivers.
> > >
> >
> > I think you're assuming the proposal requires dma-buf exporter driver
> > changes, and I have a 'hacked up out of tree driver' not visible to
> > you.
>
> Oh, I thought it was obvious what you did in patch 1 was a total
> non-starter when I said you can't abuse the ZONE_DEVICE pages like
> this.
>
> You must create ZONE_DEVICE P2P pages, not MEMORY_DEVICE_PRIVATE to
> represent P2P memory, and you can't do that automatically from the
> dmabuf core code.
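
For anyone new to this thread: what Jason is describing is the
exporting driver handing its BAR to the p2pdma core, which then
creates proper MEMORY_DEVICE_PCI_P2PDMA pages for that range. A
minimal sketch of that setup (the function name and BAR number are
made up for illustration):

#include <linux/pci.h>
#include <linux/pci-p2pdma.h>

/* Hypothetical exporter: hand a BAR to the p2pdma core, which
 * creates MEMORY_DEVICE_PCI_P2PDMA pages covering that range. */
static int example_publish_p2p_bar(struct pci_dev *pdev)
{
        int rc;

        /* BAR 4 is an arbitrary choice for this sketch */
        rc = pci_p2pdma_add_resource(pdev, 4,
                                     pci_resource_len(pdev, 4), 0);
        if (rc)
                return rc;

        /* let pci_p2pmem_find() clients discover this memory */
        pci_p2pmem_publish(pdev, true);
        return 0;
}

Note that this lives in the exporting PCI driver itself, which is why
the dma-buf core cannot do it generically on the exporter's behalf.
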
>
> Without doing this the DMA API doesn't actually work properly because
> it doesn't have enough information to know about what the underlying
> exporter is.
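
Concretely, a mapping layer keys off the pgmap type to decide whether
a peer-to-peer bus address is even computable. A simplified
illustration of that distinction (not the actual dma-iommu code):

#include <linux/memremap.h>

/* Simplified: only MEMORY_DEVICE_PCI_P2PDMA pages carry, via their
 * pgmap, the exporting PCI device needed to compute a bus address. */
static bool example_page_is_p2p_mappable(struct page *page)
{
        if (is_pci_p2pdma_page(page))
                return true;    /* pgmap identifies the P2P provider */
        if (is_device_private_page(page))
                return false;   /* CPU/DMA access is owner-only */
        return false;           /* ordinary RAM: regular DMA path */
}
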
>
> The entire point of DEVICE_PRIVATE is that the page content, and
> access to the page's physical location, is *explicitly* unavailable to
> anyone but the pgmap owner.
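
To make "explicitly unavailable" concrete: a DEVICE_PRIVATE pgmap must
supply a migrate_to_ram() callback, because a CPU touching one of
these pages faults and only the owning driver knows how to migrate the
data back. Roughly (all names here are illustrative):

#include <linux/memremap.h>
#include <linux/mm.h>

/* A CPU fault on a DEVICE_PRIVATE page must be resolved by the
 * pgmap owner migrating the data back to system RAM. */
static vm_fault_t example_migrate_to_ram(struct vm_fault *vmf)
{
        /* driver-specific device-to-host migration goes here */
        return VM_FAULT_SIGBUS;
}

static const struct dev_pagemap_ops example_pgmap_ops = {
        .migrate_to_ram = example_migrate_to_ram,
};

static struct dev_pagemap example_pgmap = {
        .type = MEMORY_DEVICE_PRIVATE,
        .ops  = &example_pgmap_ops,
        /* .range / .nr_range describing the memory are elided */
};
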
>
> > > Fully and properly adding P2P ZONE_DEVICE to a real world driver is a
> > > pretty big ask still.
> >
> > There is no such ask.
>
> Well, there is from me if you want to use struct pages as handles for
> P2P memory. That is what we have defined in the pgmap area.
>
> Also I should warn you that your 'option 2' is being NAK'd by
> Christoph for now; we are not adding any new code around DMABUF's
> hacky use of NULL sg_page scatterlist for P2P memory either. I've been
> working on solutions here but it is slow going.
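
For readers who have not seen it, the hack in question is dma-buf
exporters returning scatterlists that carry only DMA addresses with no
struct page behind them, so importers must never call sg_page(). A
rough sketch of the pattern (not taken from any specific exporter):

#include <linux/scatterlist.h>

/* Fill one scatterlist entry with a bare bus address.  sg_page()
 * stays NULL, which is exactly what the DMA API cannot see through. */
static void example_fill_p2p_sg(struct scatterlist *sg,
                                dma_addr_t bus_addr, unsigned int len)
{
        sg_set_page(sg, NULL, len, 0);
        sg_dma_address(sg) = bus_addr;
        sg_dma_len(sg) = len;
}
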
>
> > On dma-buf changes required. I do need approval from the dma-buf
> > maintainers,
>
> It is a neat hack, of sorts, to abuse DEVICE_PRIVATE to create struct
> pages for the exclusive use of page_pool - but you need more approval
> than just dmabuf maintainers to abuse the pgmap framework like
> this.
>
> At least from my position I want to see MEMORY_DEVICE_PCI_P2PDMA used
> to represent P2P memory. You haven't CC'd anyone from the mm community
> so I've added some people here to see if there are other opinions.
>
> To be clear, you need an explicit ack from mm people on the abusive
> use of pgmap in patch 1.
This thread is long, so I am only reacting to this first message that I
am copied on, but yes, I agree with Jason: anything peer-to-peer DMA
needs to reuse p2pdma, and it definitely needs in-tree producers and
consumers for all of the infrastructure.
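
For concreteness, the consumer side of the p2pdma infrastructure that
any new work should build on looks roughly like the below; the NVMe
target is the canonical in-tree user of this pattern (error handling
trimmed, names illustrative):

#include <linux/pci-p2pdma.h>

/* Find a published provider of P2P memory reachable from @client
 * and carve a buffer out of it. */
static void *example_get_p2p_buf(struct device *client, size_t size,
                                 struct pci_dev **provider)
{
        struct pci_dev *p2p_dev = pci_p2pmem_find(client);
        void *buf;

        if (!p2p_dev)
                return NULL;

        buf = pci_alloc_p2pmem(p2p_dev, size);
        if (!buf) {
                pci_dev_put(p2p_dev);
                return NULL;
        }

        *provider = p2p_dev;
        return buf;
}
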
One piece of technical debt standing in the way of any proposed
expansion of pgmap usage is the final resolution of this topic:
https://lore.kernel.org/all/166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com/
I am also generally reluctant to entertain taking on new pgmap
maintenance, especially now that accelerator memory attached over a
coherent link is an open hardware standard (CXL) that addresses in
hardware what pgmap infrastructure enabled in software.
> I know it is not what you want to hear, but there are actual reasons
> why the P2P DMA problem has been festering for so long, and hacky
> quick fixes like this are not going to be enough.
>
> Jason