linux-kernel - Re: Memory providers multiplexing (Was: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHS8izNMB-H3w0CE9kj6hT5q_F6_XJy_X_HtZwmisOEDhp31yg@mail.gmail.com>
Date:   Sun, 16 Jul 2023 18:53:44 -0700
From:   Mina Almasry <almasrymina@...gle.com>
To:     Jason Gunthorpe <jgg@...pe.ca>
Cc:     Christian König <christian.koenig@....com>,
        Hari Ramakrishnan <rharix@...gle.com>,
        David Ahern <dsahern@...nel.org>,
        Samiullah Khawaja <skhawaja@...gle.com>,
        Willem de Bruijn <willemb@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Christoph Hellwig <hch@....de>,
        John Hubbard <jhubbard@...dia.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Jesper Dangaard Brouer <jbrouer@...hat.com>,
        brouer@...hat.com, Alexander Duyck <alexander.duyck@...il.com>,
        Yunsheng Lin <linyunsheng@...wei.com>, davem@...emloft.net,
        pabeni@...hat.com, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Lorenzo Bianconi <lorenzo@...nel.org>,
        Yisen Zhuang <yisen.zhuang@...wei.com>,
        Salil Mehta <salil.mehta@...wei.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Sunil Goutham <sgoutham@...vell.com>,
        Geetha sowjanya <gakula@...vell.com>,
        Subbaraya Sundeep <sbhatta@...vell.com>,
        hariprasad <hkelam@...vell.com>,
        Saeed Mahameed <saeedm@...dia.com>,
        Leon Romanovsky <leon@...nel.org>,
        Felix Fietkau <nbd@....name>,
        Ryder Lee <ryder.lee@...iatek.com>,
        Shayne Chen <shayne.chen@...iatek.com>,
        Sean Wang <sean.wang@...iatek.com>,
        Kalle Valo <kvalo@...nel.org>,
        Matthias Brugger <matthias.bgg@...il.com>,
        AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        linux-rdma@...r.kernel.org, linux-wireless@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org,
        linux-mediatek@...ts.infradead.org,
        Jonathan Lemon <jonathan.lemon@...il.com>, logang@...tatee.com,
        Bjorn Helgaas <bhelgaas@...gle.com>
Subject: Re: Memory providers multiplexing (Was: [PATCH net-next v4 4/5]
 page_pool: remove PP_FLAG_PAGE_FRAG flag)

On Fri, Jul 14, 2023 at 8:55 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
>
> On Fri, Jul 14, 2023 at 07:55:15AM -0700, Mina Almasry wrote:
>
> > Once the skb frags with struct new_abstraction are in the TCP stack,
> > they will need some special handling in code accessing the frags. But
> > my RFC already addressed that somewhat because the frags were
> > inaccessible in that case. In this case the frags will be both
> > inaccessible and will not be struct pages at all (things like
> > get_page() will not work), so more special handling will be required,
> > maybe.
>
> It seems sort of reasonable, though there will be interesting concerns
> about coherence and synchronization with generial purpose DMABUFs that
> will need tackling.
>
> Still it is such a lot of churn and weridness in the netdev side, I
> think you'd do well to present an actual full application as
> justification.
>
> Yes, you showed you can stick unordered TCP data frags into GPU memory
> sort of quickly, but have you gone further with this to actually show
> it is useful for a real world GPU centric application?
>
> BTW your cover letter said 96% utilization, the usual server
> configuation is one NIC per GPU, so you were able to hit 1500Gb/sec of
> TCP BW with this?
>

I do notice that the number of NICs is missing from our public
documentation so far, so I will refrain from specifying how many NICs
are on those A3 VMs until the information is public. But I think I can
confirm that your general thinking is correct, the perf that we're
getting is 96.6% line rate of each GPU/NIC pair, and scales linearly
for each NIC/GPU pair we've tested with so far. Line rate of each
NIC/GPU pair is 200 Gb/sec.

So if we have 8 NIC/GPU pairs we'd be hitting 96.6% * 200 * 8 = 1545 GB/sec.
If we have, say, 2 NIC/GPU pairs, we'd be hitting 96.6% * 200 * 2 = 384 GB/sec
...
etc.

-- 
Thanks,
Mina