[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHS8izOUi6qGp=LSQb_o5oph-EnhNOuhLkPSfbQRU3eniZvbdA@mail.gmail.com>
Date: Fri, 22 Mar 2024 10:54:54 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: Christoph Hellwig <hch@...radead.org>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-alpha@...r.kernel.org,
linux-mips@...r.kernel.org, linux-parisc@...r.kernel.org,
sparclinux@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
linux-arch@...r.kernel.org, bpf@...r.kernel.org,
linux-kselftest@...r.kernel.org, linux-media@...r.kernel.org,
dri-devel@...ts.freedesktop.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Jonathan Corbet <corbet@....net>, Richard Henderson <richard.henderson@...aro.org>,
Ivan Kokshaysky <ink@...assic.park.msu.ru>, Matt Turner <mattst88@...il.com>,
Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
"James E.J. Bottomley" <James.Bottomley@...senpartnership.com>, Helge Deller <deller@....de>,
Andreas Larsson <andreas@...sler.com>, Jesper Dangaard Brouer <hawk@...nel.org>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>, Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Arnd Bergmann <arnd@...db.de>, Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>,
Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...gle.com>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>, David Ahern <dsahern@...nel.org>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>, Shuah Khan <shuah@...nel.org>,
Sumit Semwal <sumit.semwal@...aro.org>, Christian König <christian.koenig@....com>,
Pavel Begunkov <asml.silence@...il.com>, David Wei <dw@...idwei.uk>, Jason Gunthorpe <jgg@...pe.ca>,
Yunsheng Lin <linyunsheng@...wei.com>, Shailend Chand <shailend@...gle.com>,
Harshitha Ramamurthy <hramamurthy@...gle.com>, Shakeel Butt <shakeelb@...gle.com>,
Jeroen de Borst <jeroendb@...gle.com>, Praveen Kaligineedi <pkaligineedi@...gle.com>
Subject: Re: [RFC PATCH net-next v6 02/15] net: page_pool: create hooks for
custom page providers
On Sun, Mar 17, 2024 at 7:03 PM Christoph Hellwig <hch@...radead.org> wrote:
>
> On Mon, Mar 04, 2024 at 06:01:37PM -0800, Mina Almasry wrote:
> > From: Jakub Kicinski <kuba@...nel.org>
> >
> > The page providers which try to reuse the same pages will
> > need to hold onto the ref, even if page gets released from
> > the pool - as in releasing the page from the pp just transfers
> > the "ownership" reference from pp to the provider, and provider
> > will wait for other references to be gone before feeding this
> > page back into the pool.
>
> The word hook always rings a giant warning bell for me, and looking into
> this series I am concerned indeed.
>
> The only provider provided here is the dma-buf one, and that basically
> is the only sensible one for the documented design.
Sorry I don't mean to argue but as David mentioned, there are some
plans in the works and ones not in the works to extend this to other
memory types. David mentioned io_uring & Jakub's huge page use cases
which may want to re-use this design. I have an additional one in
mind, which is extending devmem TCP for storage devices. Currently
storage devices do not support dmabuf and my understanding is that
it's very hard to do so, and NVMe uses pci_p2pdma instead. I wonder if
it's possible to extend devmem TCP in the future to support pci_p2pdma
to support nvme devices in the future.
Additionally I've been thinking about a use case of limiting the
amount of memory the net stack can use. Currently the page pool is
free to allocate as much memory as it wants from the buddy allocator.
This may be undesirable in very low memory setups such as overcommited
VMs. We can imagine a memory provider that allows allocation only if
the page_pool is below a certain limit. We can also imagine a memory
provider that preallocates memory and only uses that pinned pool. None
of these are in the works at the moment, but are examples of how this
can be (reasonably?) extended.
> So instead of
> adding hooks that random proprietary crap can hook into,
To be completely honest I'm unsure how to design hooks for proprietary
code to hook into. I think that would be done on the basis of
EXPORTED_SYMBOL? We do not export these hooks, nor plan to at the
moment.
> why not hard
> code the dma buf provide and just use a flag? That'll also avoid
> expensive indirect calls.
>
Thankfully the indirect calls do not seem to be an issue. We've been
able to hit 95% line rate with devmem TCP and I think the remaining 5%
are a bottleneck unrelated to the indirect calls. Page_pool benchmarks
show a very minor degradation in the fast path, so small it may be
just noise in the measurement (may!):
https://lore.kernel.org/netdev/20240305020153.2787423-1-almasrymina@google.com/T/#m1c308df9665724879947a345c4b1ec3b51ff6856
This is because the code path that does indirect allocations is the
slow path. The page_pool recycles netmem aggressively.
--
Thanks,
Mina
Powered by blists - more mailing lists