[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHS8izOY9xm=LBEN8sYwEa3aFB4GWDvJVacom3o4mHZPdHzTUg@mail.gmail.com>
Date: Thu, 14 Dec 2023 12:03:15 -0800
From: Mina Almasry <almasrymina@...gle.com>
To: Pavel Begunkov <asml.silence@...il.com>
Cc: Shailend Chand <shailend@...gle.com>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
linux-arch@...r.kernel.org, linux-kselftest@...r.kernel.org,
bpf@...r.kernel.org, linux-media@...r.kernel.org,
dri-devel@...ts.freedesktop.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Jonathan Corbet <corbet@....net>, Jeroen de Borst <jeroendb@...gle.com>,
Praveen Kaligineedi <pkaligineedi@...gle.com>, Jesper Dangaard Brouer <hawk@...nel.org>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>, Arnd Bergmann <arnd@...db.de>,
David Ahern <dsahern@...nel.org>, Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Shuah Khan <shuah@...nel.org>, Sumit Semwal <sumit.semwal@...aro.org>,
Christian König <christian.koenig@....com>,
Yunsheng Lin <linyunsheng@...wei.com>, Harshitha Ramamurthy <hramamurthy@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>, Willem de Bruijn <willemb@...gle.com>,
Kaiyuan Zhang <kaiyuanz@...gle.com>
Subject: Re: [net-next v1 08/16] memory-provider: dmabuf devmem memory provider
On Mon, Dec 11, 2023 at 12:37 PM Pavel Begunkov <asml.silence@...il.com> wrote:
...
> >> If you remove the branch, let it fall into ->release and rely
> >> on refcounting there, then the callback could also fix up
> >> release_cnt or ask pp to do it, like in the patch I linked above
> >>
> >
> > Sadly I don't think this is possible due to the reasons I mention in
> > the commit message of that patch. Prematurely releasing ppiov and not
> > having them be candidates for recycling shows me a 4-5x degradation in
> > performance.
>
> I don't think I follow. The concept is to only recycle a buffer (i.e.
> make it available for allocation) when its refs drop to zero, which is
> IMHO the only way it can work, and IIUC what this patchset is doing.
>
> That's also I suggest to do, but through a slightly different path.
> Let's say at some moment there are 2 refs (e.g. 1 for an skb and
> 1 for userspace/xarray).
>
> Say it first puts the skb:
>
> napi_pp_put_page()
> -> page_pool_return_page()
> -> mp_ops->release_page()
> -> need_to_free = put_buf()
> // not last ref, need_to_free==false,
> // don't recycle, don't increase release_cnt
>
> Then you put the last ref:
>
> page_pool_iov_put_many()
> -> page_pool_return_page()
> -> mp_ops->release_page()
> -> need_to_free = put_buf()
> // last ref, need_to_free==true,
> // recycle and release_cnt++
>
> And that last put can even be recycled right into the
> pp / ptr_ring, in which case it doesn't need to touch
> release_cnt. Does it make sense? I don't see where
> 4-5x degradation would come from
>
>
Sorry for the late reply, I have been working on this locally.
What you're saying makes sense, and I'm no longer sure why I was
seeing a perf degradation without '[net-next v1 10/16] page_pool:
don't release iov on elevanted refcount'. However, even though what
you're saying is technically correct, AFAIU it's actually semantically
wrong. When a page is released by the page_pool, we should call
page_pool_clear_pp_info() and completely disconnect the page from the
pool. If we call release_page() on a page and then the page pool sees
it again in page_pool_return_page(), I think that is considered a bug.
In fact I think what you're proposing is as a result of a bug because
we don't call a page_pool_clear_pp_info() equivalent on releasing
ppiov.
However, I'm reasonably confident I figured out the right thing to do
here. The page_pool uses page->pp_frag_count for its refcounting.
pp_frag_count is a misnomer, it's being renamed to pp_ref_count in
Liang's series[1]). In this series I used a get_page/put_page
equivalent for refcounting. Once I transitioned to using
pp_[frag|ref]_count for refcounting inside the page_pool, the issue
went away, and I no longer need the patch 'page_pool: don't release
iov on elevanted refcount'.
There is an additional upside, since pages and ppiovs are both being
refcounted using pp_[frag|ref]_count, we get some unified handling for
ppiov and we reduce the checks around ppiov. This should be fixed
properly in the next series.
I still need to do some work (~1 week) before I upload the next
version as there is a new requirement from MM that we transition to a
new type and not re-use page*, but I uploaded my changes github with
the refcounting issues resolved in case they're useful to you. Sorry
for the churn:
https://github.com/mina/linux/commits/tcpdevmem-v1.5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=809049&state=*
--
Thanks,
Mina
Powered by blists - more mailing lists