netdev - Memory providers multiplexing (Was: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <eadebd58-d79a-30b6-87aa-1c77acb2ec17@redhat.com>
Date: Fri, 16 Jun 2023 22:42:35 +0200
From: Jesper Dangaard Brouer <jbrouer@...hat.com>
To: Jakub Kicinski <kuba@...nel.org>,
 Jesper Dangaard Brouer <jbrouer@...hat.com>
Cc: brouer@...hat.com, Alexander Duyck <alexander.duyck@...il.com>,
 Yunsheng Lin <linyunsheng@...wei.com>, davem@...emloft.net,
 pabeni@...hat.com, netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
 Lorenzo Bianconi <lorenzo@...nel.org>, Yisen Zhuang
 <yisen.zhuang@...wei.com>, Salil Mehta <salil.mehta@...wei.com>,
 Eric Dumazet <edumazet@...gle.com>, Sunil Goutham <sgoutham@...vell.com>,
 Geetha sowjanya <gakula@...vell.com>, Subbaraya Sundeep
 <sbhatta@...vell.com>, hariprasad <hkelam@...vell.com>,
 Saeed Mahameed <saeedm@...dia.com>, Leon Romanovsky <leon@...nel.org>,
 Felix Fietkau <nbd@....name>, Ryder Lee <ryder.lee@...iatek.com>,
 Shayne Chen <shayne.chen@...iatek.com>, Sean Wang <sean.wang@...iatek.com>,
 Kalle Valo <kvalo@...nel.org>, Matthias Brugger <matthias.bgg@...il.com>,
 AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>,
 Jesper Dangaard Brouer <hawk@...nel.org>,
 Ilias Apalodimas <ilias.apalodimas@...aro.org>, linux-rdma@...r.kernel.org,
 linux-wireless@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
 linux-mediatek@...ts.infradead.org, Jonathan Lemon <jonathan.lemon@...il.com>
Subject: Memory providers multiplexing (Was: [PATCH net-next v4 4/5]
 page_pool: remove PP_FLAG_PAGE_FRAG flag)

On 16/06/2023 21.21, Jakub Kicinski wrote:
> On Fri, 16 Jun 2023 20:59:12 +0200 Jesper Dangaard Brouer wrote:
>> +       if (mem_type == MEM_TYPE_PP_NETMEM)
>> +               pp_netmem_put_page(pp, page, allow_direct);
>> +       else
>> +               page_pool_put_full_page(pp, page, allow_direct);
> 
> Interesting, what is the netmem type? I was thinking about extending
> page pool for other mem providers and what came to mind was either
> optionally replacing the free / alloc with a function pointer:
> 
> https://github.com/torvalds/linux/commit/578ebda5607781c0abb26c1feae7ec8b83840768
> 
> or wrapping the PP calls with static inlines which can direct to
> a different implementation completely (like zctap / io_uring zc).
> 

I *LOVE* this idea!!!
It have been my master plan since day-1 to have other mem providers.
Notice how ZC xsk/AF_XDP have it's own memory allocator implementation.

The page_pool was never meant to be the final and best solution, I want
to see other, better and faster solutions competing with page_pool and
maybe some day replacing page_pool (I even see it as a success if PP get
depreciated and remove from the kernel due to a better solution).

See[1] how net/core/xdp.c simply have a switch statement
(is fast, because ASM wise it becomes a jump table):

  [1] 
https://github.com/torvalds/linux/blob/v6.4-rc6/net/core/xdp.c#L382-L402

> Former is better for huge pages, latter is better for IO mem
> (peer-to-peer DMA). I wonder if you have different use case which
> requires a different model :(
> 

I want for the network stack SKBs (and XDP) to support different memory
types for the "head" frame and "data-frags". Eric have described this
idea before, that hardware will do header-split, and we/he can get TCP
data part is another page/frag, making it faster for TCP-streams, but
this can be used for much more.

My proposed use-cases involves more that TCP.  We can easily imagine
NVMe protocol header-split, and the data-frag could be a mem_type that
actually belongs to the harddisk (maybe CPU cannot even read this).  The
same scenario goes for GPU memory, which is for the AI use-case.  IIRC
then Jonathan have previously send patches for the GPU use-case.

I really hope we can work in this direction together,
--Jesper