[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0Uc6Xoyh3Edgt+83b+HTM5j4JDr3fuxcyL9qDk+Wwt9APg@mail.gmail.com>
Date: Thu, 15 Jun 2023 11:26:44 -0700
From: Alexander Duyck <alexander.duyck@...il.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Yunsheng Lin <linyunsheng@...wei.com>, davem@...emloft.net, pabeni@...hat.com,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
Lorenzo Bianconi <lorenzo@...nel.org>, Yisen Zhuang <yisen.zhuang@...wei.com>,
Salil Mehta <salil.mehta@...wei.com>, Eric Dumazet <edumazet@...gle.com>,
Sunil Goutham <sgoutham@...vell.com>, Geetha sowjanya <gakula@...vell.com>,
Subbaraya Sundeep <sbhatta@...vell.com>, hariprasad <hkelam@...vell.com>,
Saeed Mahameed <saeedm@...dia.com>, Leon Romanovsky <leon@...nel.org>, Felix Fietkau <nbd@....name>,
Ryder Lee <ryder.lee@...iatek.com>, Shayne Chen <shayne.chen@...iatek.com>,
Sean Wang <sean.wang@...iatek.com>, Kalle Valo <kvalo@...nel.org>,
Matthias Brugger <matthias.bgg@...il.com>,
AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>,
Jesper Dangaard Brouer <hawk@...nel.org>, Ilias Apalodimas <ilias.apalodimas@...aro.org>,
linux-rdma@...r.kernel.org, linux-wireless@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-mediatek@...ts.infradead.org
Subject: Re: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag
On Thu, Jun 15, 2023 at 9:51 AM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Thu, 15 Jun 2023 15:17:39 +0800 Yunsheng Lin wrote:
> > > Does hns3_page_order() set a good example for the users?
> > >
> > > static inline unsigned int hns3_page_order(struct hns3_enet_ring *ring)
> > > {
> > > #if (PAGE_SIZE < 8192)
> > > if (ring->buf_size > (PAGE_SIZE / 2))
> > > return 1;
> > > #endif
> > > return 0;
> > > }
> > >
> > > Why allocate order 1 pages for buffers which would fit in a single page?
> > > I feel like this soft of heuristic should be built into the API itself.
> >
> > hns3 only support fixed buf size per desc by 512 byte, 1024 bytes, 2048 bytes
> > 4096 bytes, see hns3_buf_size2type(), I think the order 1 pages is for buf size
> > with 4096 bytes and system page size with 4K, as hns3 driver still support the
> > per-desc ping-pong way of page splitting when page_pool_enabled is false.
> >
> > With page pool enabled, you are right that order 0 pages is enough, and I am not
> > sure about the exact reason we use the some order as the ping-pong way of page
> > splitting now.
> > As 2048 bytes buf size seems to be the default one, and I has not heard any one
> > changing it. Also, it caculates the pool_size using something as below, so the
> > memory usage is almost the same for order 0 and order 1:
> >
> > .pool_size = ring->desc_num * hns3_buf_size(ring) /
> > (PAGE_SIZE << hns3_page_order(ring)),
> >
> > I am not sure it worth changing it, maybe just change it to set good example for
> > the users:) anyway I need to discuss this with other colleague internally and do
> > some testing before doing the change.
>
> Right, I think this may be a leftover from the page flipping mode of
> operation. But AFAIU we should leave the recycling fully to the page
> pool now. If we make any improvements try to make them at the page pool
> level.
>
> I like your patches as they isolate the drivers from having to make the
> fragmentation decisions based on the system page size (4k vs 64k but
> we're hearing more and more about ARM w/ 16k pages). For that use case
> this is great.
>
> What we don't want is drivers to start requesting larger page sizes
> because it looks good in iperf on a freshly booted, idle system :(
Actually that would be a really good direction for this patch set to
look at going into. Rather than having us always allocate a "page" it
would make sense for most drivers to allocate a 4K fragment or the
like in the case that the base page size is larger than 4K. That might
be a good use case to justify doing away with the standard page pool
page and look at making them all fragmented.
In the case of the standard page size being 4K a standard page would
just have to take on the CPU overhead of the atomic_set and
atomic_read for pp_ref_count (new name) which should be minimal as on
most sane systems those just end up being a memory write and read.
Powered by blists - more mailing lists