lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <64acfc1e46a80_8e17829462@dwillia2-xfh.jf.intel.com.notmuch>
Date:   Mon, 10 Jul 2023 23:52:14 -0700
From:   Dan Williams <dan.j.williams@...el.com>
To:     Jason Gunthorpe <jgg@...pe.ca>,
        Mina Almasry <almasrymina@...gle.com>,
        Christoph Hellwig <hch@....de>,
        John Hubbard <jhubbard@...dia.com>,
        "Dan Williams" <dan.j.williams@...el.com>
CC:     David Ahern <dsahern@...nel.org>, Jakub Kicinski <kuba@...nel.org>,
        "Jesper Dangaard Brouer" <jbrouer@...hat.com>, <brouer@...hat.com>,
        Alexander Duyck <alexander.duyck@...il.com>,
        Yunsheng Lin <linyunsheng@...wei.com>, <davem@...emloft.net>,
        <pabeni@...hat.com>, <netdev@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>,
        Lorenzo Bianconi <lorenzo@...nel.org>,
        "Yisen Zhuang" <yisen.zhuang@...wei.com>,
        Salil Mehta <salil.mehta@...wei.com>,
        "Eric Dumazet" <edumazet@...gle.com>,
        Sunil Goutham <sgoutham@...vell.com>,
        "Geetha sowjanya" <gakula@...vell.com>,
        Subbaraya Sundeep <sbhatta@...vell.com>,
        hariprasad <hkelam@...vell.com>,
        Saeed Mahameed <saeedm@...dia.com>,
        "Leon Romanovsky" <leon@...nel.org>, Felix Fietkau <nbd@....name>,
        Ryder Lee <ryder.lee@...iatek.com>,
        Shayne Chen <shayne.chen@...iatek.com>,
        Sean Wang <sean.wang@...iatek.com>,
        Kalle Valo <kvalo@...nel.org>,
        Matthias Brugger <matthias.bgg@...il.com>,
        AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        <linux-rdma@...r.kernel.org>, <linux-wireless@...r.kernel.org>,
        <linux-arm-kernel@...ts.infradead.org>,
        <linux-mediatek@...ts.infradead.org>,
        Jonathan Lemon <jonathan.lemon@...il.com>
Subject: Re: Memory providers multiplexing (Was: [PATCH net-next v4 4/5]
 page_pool: remove PP_FLAG_PAGE_FRAG flag)

Jason Gunthorpe wrote:
> On Mon, Jul 10, 2023 at 04:02:59PM -0700, Mina Almasry wrote:
> > On Mon, Jul 10, 2023 at 10:44 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
> > >
> > > On Wed, Jul 05, 2023 at 06:17:39PM -0700, Mina Almasry wrote:
> > >
> > > > Another issue is that in networks with low MTU, we could be DMAing
> > > > 1400/1500 bytes into each allocation, which is problematic if the
> > > > allocation is 8K+. I would need to investigate a bit to see if/how to
> > > > solve that, and we may end up having to split the page and again run
> > > > into the 'not enough room in struct page' problem.
> > >
> > > You don't have an intree driver to use this with, so who knows, but
> > > the out of tree GPU drivers tend to use a 64k memory management page
> > > size, and I don't expect you'd make progress with a design where a 64K
> > > naturaly sized allocator is producing 4k/8k non-compound pages just
> > > for netdev. We are still struggling with pagemap support for variable
> > > page size folios, so there is a bunch of technical blockers before
> > > drivers could do this.
> > >
> > > This is why it is so important to come with a complete in-tree
> > > solution, as we cannot review this design if your work is done with
> > > hacked up out of tree drivers.
> > >
> > 
> > I think you're assuming the proposal requires dma-buf exporter driver
> > changes, and I have a 'hacked up out of tree driver' not visible to
> > you.
> 
> Oh, I thought it was obvious what you did in patch 1 was a total
> non-starter when I said you can't abuse the ZONE_DEVICE pages like
> this.
> 
> You must create ZONE_DEVICE P2P pages, not MEMORY_DEVICE_PRIVATE to
> represent P2P memory, and you can't do that automatically from the
> dmabuf core code.
> 
> Without doing this the DMA API doesn't actually work properly because
> it doesn't have enough information to know about what the underlying
> exporter is.
> 
> The entire point of DEVICE_PRIVATE is that the page content, and
> access to the page's physical location, is *explicitly* unavailable to
> anyone but the pgmap owner.
> 
> > > Fully and properly adding P2P ZONE_DEVICE to a real world driver is a
> > > pretty big ask still.
> > 
> > There is no such ask.
> 
> Well, there is from me if you want to use stuct pages as handles for
> P2P memory. That is what we have defined in the pgmap area.
> 
> Also I should warn you that your 'option 2' is being NAK'd by
> Christoph for now, we are not adding any new code around DMABUF's
> hacky use of NULL sg_page scatterlist for P2P memory either. I've been
> working on solutions here but it is slow going.
> 
> > On dma-buf changes required. I do need approval from the dma-buf
> > maintainers,
> 
> It is a neat hack, of sorts, to abuse DEVICE_PRIVATE to create struct
> pages for the exclusive use of pagepool - but you need more approval
> than just dmabuf maintainers to abuse the pgmap framework like
> this.
> 
> At least from my position I want to see MEMORY_DEVICE_PCI_P2PDMA used
> to represent P2P memory. You haven't CC'd anyone from the mm community
> so I've added some people here to see if there are other opinions.
> 
> To be clear, you need an explicit ack from mm people on the abusive
> use of pgmap in patch 1.

This thread is long so I am only reacting to this first message I am
copied on, but yes agree with Jason anything peer-to-peer DMA needs to
reuse p2pdma and it definitely needs in-tree producers and consumers for
all infrastructure.

One piece of technical debt standing in the way of any proposed
expansion of pgmap usage is the final resolution of this topic:

https://lore.kernel.org/all/166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com/

I am also generally reticent to entertain taking on new pgmap
maintenance. I.e. now that accelerator memory attached over a coherent
link is an open hardware standard (CXL) that addresses what pgmap
infrastructure enabled in software.

> I know it is not what you want to hear, but there are actual reasons
> why the P2P DMA problem has been festering for so long, and hacky
> quick fixes like this are not going to be enough..
> 
> Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ