linux-kernel - Re: Memory providers multiplexing (Was: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE


Message-ID: <765b02a5-2f09-e744-f441-c082fa3987ff@kernel.org>
Date:   Sun, 16 Jul 2023 21:08:16 -0600
From:   David Ahern <dsahern@...nel.org>
To:     Mina Almasry <almasrymina@...gle.com>
Cc:     Christian König <christian.koenig@....com>,
        Hari Ramakrishnan <rharix@...gle.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Samiullah Khawaja <skhawaja@...gle.com>,
        Willem de Bruijn <willemb@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Christoph Hellwig <hch@....de>,
        John Hubbard <jhubbard@...dia.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Jesper Dangaard Brouer <jbrouer@...hat.com>,
        brouer@...hat.com, Alexander Duyck <alexander.duyck@...il.com>,
        Yunsheng Lin <linyunsheng@...wei.com>, davem@...emloft.net,
        pabeni@...hat.com, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Lorenzo Bianconi <lorenzo@...nel.org>,
        Yisen Zhuang <yisen.zhuang@...wei.com>,
        Salil Mehta <salil.mehta@...wei.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Sunil Goutham <sgoutham@...vell.com>,
        Geetha sowjanya <gakula@...vell.com>,
        Subbaraya Sundeep <sbhatta@...vell.com>,
        hariprasad <hkelam@...vell.com>,
        Saeed Mahameed <saeedm@...dia.com>,
        Leon Romanovsky <leon@...nel.org>,
        Felix Fietkau <nbd@....name>,
        Ryder Lee <ryder.lee@...iatek.com>,
        Shayne Chen <shayne.chen@...iatek.com>,
        Sean Wang <sean.wang@...iatek.com>,
        Kalle Valo <kvalo@...nel.org>,
        Matthias Brugger <matthias.bgg@...il.com>,
        AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        linux-rdma@...r.kernel.org, linux-wireless@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org,
        linux-mediatek@...ts.infradead.org,
        Jonathan Lemon <jonathan.lemon@...il.com>, logang@...tatee.com,
        Bjorn Helgaas <bhelgaas@...gle.com>
Subject: Re: Memory providers multiplexing (Was: [PATCH net-next v4 4/5]
 page_pool: remove PP_FLAG_PAGE_FRAG flag)

On 7/16/23 8:05 PM, Mina Almasry wrote:
>>
>> For the driver and hardware queue: don't you need a dedicated queue for
>> the flow(s) in question?
> 
> In the RFC and the implementation I'm thinking of, the queue is
> 'dedicated' in that each queue will be a devmem TCP queue or a regular
> queue. devmem queues generate devmem skbs and non-devmem queues
> generate non-devmem skbs. We support switching queues between devmem
> mode and non-devmem mode via a uapi.

ethtool APIs or something else?

> 
>> If not, how can you properly handle the
>> teardown case (e.g., app crashes and you need to ensure all references
>> to GPU memory are removed from NIC descriptors)?
> 
> Jason and Christian will correct me if I'm wrong, but AFAICT the
> dma-buf API requires the dma-buf provider to keep the attachment
> mapping alive as long as the importer requires it. The dma-buf API
> gives the importer dma_buf_map_attachment() and
> dma_buf_unmap_attachment() APIs, but there is no callback for the
> exporter to inform the importer that it has to take the mapping away.

Isn't the importer the application that terminated (cleanly or otherwise)?
That was my thinking, but I guess there are other designs that can span
more than a single application.

> The closest thing I saw was the move_notify() callback, but that is
> optional.
> 
> In my mind the way it works is that there will be some uapi that binds
> a dma-buf to an RX queue, that will create the attachment and the
> mapping. If the user crashes or closes the dma-buf handle then that
> will unbind the dma-buf from the RX queue, but the mapping will remain
> alive (via some refcounting) until all the NIC descriptors are freed
> and the mapping is not under use anymore. Usually this will happen
> next driver reset which destroys and recreates rx queues thereby
> freeing all the NIC descriptors (but could be a new API so that we
> don't rely on a driver reset).
> 
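To make the lifetime rule described above concrete, here is a minimal
user-space sketch of it. Everything in it is hypothetical: struct
devmem_binding, desc_post(), desc_free() and binding_unbind() are
illustrative names, not existing kernel or RFC APIs, and the 'mapped'
flag only stands in for the result of dma_buf_map_attachment(). It just
models "unbind drops the user's reference, the mapping goes away when
the last NIC descriptor is freed".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a dma-buf bound to an RX queue. */
struct devmem_binding {
	int refs;	/* 1 for the user's bind + 1 per posted descriptor */
	bool bound;	/* still attached to the RX queue? */
	bool mapped;	/* stands in for a live dma-buf attachment mapping */
};

static void binding_init(struct devmem_binding *b)
{
	b->refs = 1;		/* reference held by the bind itself */
	b->bound = true;
	b->mapped = true;
}

static void binding_put(struct devmem_binding *b)
{
	assert(b->refs > 0);
	if (--b->refs == 0)
		b->mapped = false;	/* where the unmap would happen */
}

/* A NIC descriptor pointing into the dma-buf pins the mapping. */
static void desc_post(struct devmem_binding *b)
{
	assert(b->bound);	/* no new descriptors after unbind */
	b->refs++;
}

/* Descriptor freed, e.g. when the RX queue is destroyed and recreated. */
static void desc_free(struct devmem_binding *b)
{
	binding_put(b);
}

/* User closed the dma-buf handle or crashed: drop only the bind reference. */
static void binding_unbind(struct devmem_binding *b)
{
	if (!b->bound)
		return;
	b->bound = false;
	binding_put(b);
}
```

With two descriptors posted, binding_unbind() leaves 'mapped' set; it is
cleared only after the second desc_free(), which is the point at which a
real implementation could safely unmap.
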
>> If you agree on this
>> point, then you can require the dedicated queue management in the driver
>> to use and expect only the alternative frag addressing scheme. ie., it
>> knows the address is not struct page (validates by checking skb flag or
>> frag flag or address magic), but a reference to say a page_pool entry
>> (if you are using page_pool for management of the dmabuf slices) which
>> contains the metadata needed for the use case.
> 
> Honestly if my understanding above doesn't match what you want, I
> could implement 'dedicated queues' instead, just let me know what you
> want at some future iteration. Now, I'm more worried about this memory
> format issue and I'm working on an RX prototype without struct pages.
> So far purely technically speaking it seems possible.
> 
> 

My comment was only a suggestion on how to simplify driver changes, i.e.,
a queue is either pages (based on the standard page_pool or alloc_pages) or
some "special" page_pool (i.e., a new abstraction), but not mixed. In that
case the driver knows how to handle the overloaded 'address' in skb_frag in
a clean manner.
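
As a rough illustration of the "address magic" variant, the sketch below
tags the low bit of the frag address to mark it as a reference to a pool
entry rather than a page. It is a user-space model under stated
assumptions, not kernel code: struct fake_page, struct pool_entry,
struct frag, enum queue_mode and all helpers are made-up names, and the
low-bit tag is just one of the possible checks (an skb flag or a frag
flag would work as well).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct fake_page { int id; };		/* stand-in for struct page */
struct pool_entry {			/* hypothetical dmabuf slice metadata */
	uint64_t dma_addr;
	uint32_t len;
};

#define FRAG_TAG_POOL_ENTRY ((uintptr_t)1)	/* low bit marks "not a page" */

struct frag { uintptr_t addr; };	/* models the overloaded frag address */

enum queue_mode { QUEUE_PAGES, QUEUE_SPECIAL_POOL };

static struct frag frag_from_page(struct fake_page *p)
{
	/* Alignment keeps the low bit free for the tag. */
	assert(((uintptr_t)p & FRAG_TAG_POOL_ENTRY) == 0);
	return (struct frag){ .addr = (uintptr_t)p };
}

static struct frag frag_from_entry(struct pool_entry *e)
{
	assert(((uintptr_t)e & FRAG_TAG_POOL_ENTRY) == 0);
	return (struct frag){ .addr = (uintptr_t)e | FRAG_TAG_POOL_ENTRY };
}

static bool frag_is_pool_entry(const struct frag *f)
{
	return (f->addr & FRAG_TAG_POOL_ENTRY) != 0;
}

/* Returns NULL when the frag does not refer to a page. */
static struct fake_page *frag_page(const struct frag *f)
{
	return frag_is_pool_entry(f) ? NULL : (struct fake_page *)f->addr;
}

/* Returns NULL when the frag refers to a regular page. */
static struct pool_entry *frag_pool_entry(const struct frag *f)
{
	if (!frag_is_pool_entry(f))
		return NULL;
	return (struct pool_entry *)(f->addr & ~FRAG_TAG_POOL_ENTRY);
}

/* A queue is either pages or the special pool, never mixed. */
static bool queue_accepts(enum queue_mode mode, const struct frag *f)
{
	return (mode == QUEUE_SPECIAL_POOL) == frag_is_pool_entry(f);
}
```

Because a queue only ever carries one kind of frag, the driver's per-queue
code can call either frag_page() or frag_pool_entry() unconditionally and
treat a queue_accepts() failure as a bug.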