Message-ID: <908b8b17-f942-f909-61e6-276df52a5ad5@huawei.com>
Date:   Fri, 16 Jun 2023 20:20:55 +0800
From:   Yunsheng Lin <linyunsheng@...wei.com>
To:     Alexander Duyck <alexander.duyck@...il.com>,
        Jakub Kicinski <kuba@...nel.org>
CC:     <davem@...emloft.net>, <pabeni@...hat.com>,
        <netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        Lorenzo Bianconi <lorenzo@...nel.org>,
        Yisen Zhuang <yisen.zhuang@...wei.com>,
        Salil Mehta <salil.mehta@...wei.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Sunil Goutham <sgoutham@...vell.com>,
        Geetha sowjanya <gakula@...vell.com>,
        Subbaraya Sundeep <sbhatta@...vell.com>,
        hariprasad <hkelam@...vell.com>,
        Saeed Mahameed <saeedm@...dia.com>,
        Leon Romanovsky <leon@...nel.org>,
        Felix Fietkau <nbd@....name>,
        Ryder Lee <ryder.lee@...iatek.com>,
        Shayne Chen <shayne.chen@...iatek.com>,
        Sean Wang <sean.wang@...iatek.com>,
        Kalle Valo <kvalo@...nel.org>,
        Matthias Brugger <matthias.bgg@...il.com>,
        AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>,
        Jesper Dangaard Brouer <hawk@...nel.org>,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>,
        <linux-rdma@...r.kernel.org>, <linux-wireless@...r.kernel.org>,
        <linux-arm-kernel@...ts.infradead.org>,
        <linux-mediatek@...ts.infradead.org>
Subject: Re: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag

On 2023/6/16 2:26, Alexander Duyck wrote:
> On Thu, Jun 15, 2023 at 9:51 AM Jakub Kicinski <kuba@...nel.org> wrote:
>>
>> On Thu, 15 Jun 2023 15:17:39 +0800 Yunsheng Lin wrote:
>>>> Does hns3_page_order() set a good example for the users?
>>>>
>>>> static inline unsigned int hns3_page_order(struct hns3_enet_ring *ring)
>>>> {
>>>> #if (PAGE_SIZE < 8192)
>>>>     if (ring->buf_size > (PAGE_SIZE / 2))
>>>>             return 1;
>>>> #endif
>>>>     return 0;
>>>> }
>>>>
>>>> Why allocate order 1 pages for buffers which would fit in a single page?
>>>> I feel like this sort of heuristic should be built into the API itself.
>>>
>>> hns3 only supports fixed buf sizes per desc of 512, 1024, 2048 or 4096 bytes,
>>> see hns3_buf_size2type(). I think the order 1 pages are for the 4096-byte buf
>>> size with a 4K system page size, as the hns3 driver still supports the
>>> per-desc ping-pong way of page splitting when page_pool_enabled is false.
>>>
>>> With page pool enabled, you are right that order 0 pages are enough, and I am
>>> not sure of the exact reason we still use the same order as the ping-pong way
>>> of page splitting.
>>> The 2048-byte buf size seems to be the default one, and I have not heard of
>>> anyone changing it. Also, the driver calculates the pool_size using something
>>> like the below, so the memory usage is almost the same for order 0 and order 1:
>>>
>>> .pool_size = ring->desc_num * hns3_buf_size(ring) /
>>>               (PAGE_SIZE << hns3_page_order(ring)),
>>>
>>> I am not sure it is worth changing, maybe just change it to set a good example
>>> for the users:) Anyway, I need to discuss this with other colleagues internally
>>> and do some testing before making the change.
>>
>> Right, I think this may be a leftover from the page flipping mode of
>> operation. But AFAIU we should leave the recycling fully to the page
>> pool now. If we make any improvements try to make them at the page pool
>> level.

I checked: the 4096-byte per-desc buf does not seem to be used for hns3,
mainly because of the larger memory usage you mention below.
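
To make the memory point concrete, here is the rough arithmetic with the default
2048-byte buf_size on a 4K-page system (desc_num = 1024 is just an illustrative
number, not necessarily what hns3 configures):

  order 0: pool_size = 1024 * 2048 / 4096        = 512 pages * 4K = 2MB
  order 1: pool_size = 1024 * 2048 / (4096 << 1) = 256 pages * 8K = 2MB

So the pool is backed by the same total amount of memory either way.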

>>
>> I like your patches as they isolate the drivers from having to make the
>> fragmentation decisions based on the system page size (4k vs 64k but
>> we're hearing more and more about ARM w/ 16k pages). For that use case
>> this is great.

Yes, that is my point. For the HW case, as I understand it, the page splitting
in page pool is mainly there to let multiple descs share the same page,
something like the sketch below.
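
As a rough sketch of what I mean (hypothetical driver code, not hns3, assuming
the existing page_pool_dev_alloc_frag() and page_pool_get_dma_addr() helpers):
with a 2048-byte buf_size on a 4K-page system, two consecutive descriptors get
fragments out of the same order-0 page, each at its own offset:

static int rxq_refill_one(struct page_pool *pool, struct rx_desc *desc,
			  unsigned int buf_size)
{
	unsigned int offset;
	struct page *page;

	/* returns the current page plus an offset, and only moves on to a
	 * fresh page once the remaining space is smaller than buf_size
	 */
	page = page_pool_dev_alloc_frag(pool, &offset, buf_size);
	if (!page)
		return -ENOMEM;

	desc->addr = page_pool_get_dma_addr(page) + offset;
	desc->len = buf_size;
	return 0;
}

(struct rx_desc and its fields are made up here just to show the idea.)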

>>
>> What we don't want is drivers to start requesting larger page sizes
>> because it looks good in iperf on a freshly booted, idle system :(
> 
> Actually that would be a really good direction for this patch set to
> look at going into. Rather than having us always allocate a "page" it
> would make sense for most drivers to allocate a 4K fragment or the
> like in the case that the base page size is larger than 4K. That might
> be a good use case to justify doing away with the standard page pool
> page and look at making them all fragmented.

I am not sure I understand the above. Isn't the frag API already able to
support allocating a 4K fragment when the base page size is larger than 4K,
both before and after this patch? What more do we need to do?
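
For example, I would expect something like the below to already work on a
64K-page arm64 system today (hypothetical helper, assuming the existing
page_pool_dev_alloc_frag() and SZ_4K from linux/sizes.h): the driver keeps
asking for fixed 4K chunks and the pool only moves to a fresh page after the
64K page has handed out 16 of them:

static void *rxq_get_4k_buf(struct page_pool *pool, dma_addr_t *dma)
{
	unsigned int offset;
	struct page *page;

	/* request a fixed 4K chunk regardless of the system PAGE_SIZE */
	page = page_pool_dev_alloc_frag(pool, &offset, SZ_4K);
	if (!page)
		return NULL;

	*dma = page_pool_get_dma_addr(page) + offset;
	return page_address(page) + offset;
}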

> 
> In the case of the standard page size being 4K a standard page would
> just have to take on the CPU overhead of the atomic_set and
> atomic_read for pp_ref_count (new name) which should be minimal as on
> most sane systems those just end up being a memory write and read.

If I understand you correctly, I think what you are trying to do may break
some of Jesper's benchmarking:) I have tried to spell out why below the link.

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c
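
If every page pool page becomes a fragmented one, then even the simple
alloc/free cycle that benchmark exercises picks up roughly the below on every
iteration (my rough reading, using the pp_ref_count name you suggested; today
the field is the atomic_long_t pp_frag_count):

	/* alloc of a full page: the count gets initialised anyway */
	atomic_long_set(&page->pp_frag_count, 1);

	/* return of the page: read it back; a value of 1 means we are the
	 * last user and the page can be recycled directly
	 */
	last_user = (atomic_long_read(&page->pp_frag_count) == 1);

As you said, on most sane systems these are just a plain store and a load, but
that is exactly the kind of thing the microbenchmark is sensitive to.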

> .
> 
