netdev - Re: Memory providers multiplexing (Was: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHS8izNMB-H3w0CE9kj6hT5q_F6_XJy_X_HtZwmisOEDhp31yg@mail.gmail.com>
Date: Sun, 16 Jul 2023 18:53:44 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Christian König <christian.koenig@....com>, 
	Hari Ramakrishnan <rharix@...gle.com>, David Ahern <dsahern@...nel.org>, 
	Samiullah Khawaja <skhawaja@...gle.com>, Willem de Bruijn <willemb@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, Christoph Hellwig <hch@....de>, John Hubbard <jhubbard@...dia.com>, 
	Dan Williams <dan.j.williams@...el.com>, Jesper Dangaard Brouer <jbrouer@...hat.com>, brouer@...hat.com, 
	Alexander Duyck <alexander.duyck@...il.com>, Yunsheng Lin <linyunsheng@...wei.com>, davem@...emloft.net, 
	pabeni@...hat.com, netdev@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Lorenzo Bianconi <lorenzo@...nel.org>, Yisen Zhuang <yisen.zhuang@...wei.com>, 
	Salil Mehta <salil.mehta@...wei.com>, Eric Dumazet <edumazet@...gle.com>, 
	Sunil Goutham <sgoutham@...vell.com>, Geetha sowjanya <gakula@...vell.com>, 
	Subbaraya Sundeep <sbhatta@...vell.com>, hariprasad <hkelam@...vell.com>, 
	Saeed Mahameed <saeedm@...dia.com>, Leon Romanovsky <leon@...nel.org>, Felix Fietkau <nbd@....name>, 
	Ryder Lee <ryder.lee@...iatek.com>, Shayne Chen <shayne.chen@...iatek.com>, 
	Sean Wang <sean.wang@...iatek.com>, Kalle Valo <kvalo@...nel.org>, 
	Matthias Brugger <matthias.bgg@...il.com>, 
	AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>, 
	Jesper Dangaard Brouer <hawk@...nel.org>, Ilias Apalodimas <ilias.apalodimas@...aro.org>, 
	linux-rdma@...r.kernel.org, linux-wireless@...r.kernel.org, 
	linux-arm-kernel@...ts.infradead.org, linux-mediatek@...ts.infradead.org, 
	Jonathan Lemon <jonathan.lemon@...il.com>, logang@...tatee.com, 
	Bjorn Helgaas <bhelgaas@...gle.com>
Subject: Re: Memory providers multiplexing (Was: [PATCH net-next v4 4/5]
 page_pool: remove PP_FLAG_PAGE_FRAG flag)

On Fri, Jul 14, 2023 at 8:55 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
>
> On Fri, Jul 14, 2023 at 07:55:15AM -0700, Mina Almasry wrote:
>
> > Once the skb frags with struct new_abstraction are in the TCP stack,
> > they will need some special handling in code accessing the frags. But
> > my RFC already addressed that somewhat because the frags were
> > inaccessible in that case. In this case the frags will be both
> > inaccessible and will not be struct pages at all (things like
> > get_page() will not work), so more special handling will be required,
> > maybe.
>
> It seems sort of reasonable, though there will be interesting concerns
> about coherence and synchronization with generial purpose DMABUFs that
> will need tackling.
>
> Still it is such a lot of churn and weridness in the netdev side, I
> think you'd do well to present an actual full application as
> justification.
>
> Yes, you showed you can stick unordered TCP data frags into GPU memory
> sort of quickly, but have you gone further with this to actually show
> it is useful for a real world GPU centric application?
>
> BTW your cover letter said 96% utilization, the usual server
> configuation is one NIC per GPU, so you were able to hit 1500Gb/sec of
> TCP BW with this?
>

I do notice that the number of NICs is missing from our public
documentation so far, so I will refrain from specifying how many NICs
are on those A3 VMs until the information is public. But I think I can
confirm that your general thinking is correct, the perf that we're
getting is 96.6% line rate of each GPU/NIC pair, and scales linearly
for each NIC/GPU pair we've tested with so far. Line rate of each
NIC/GPU pair is 200 Gb/sec.

So if we have 8 NIC/GPU pairs we'd be hitting 96.6% * 200 * 8 = 1545 GB/sec.
If we have, say, 2 NIC/GPU pairs, we'd be hitting 96.6% * 200 * 2 = 384 GB/sec
...
etc.

-- 
Thanks,
Mina