Message-ID: <04c85f03-d3b3-4299-90d8-1d8432925993@huawei.com>
Date: Wed, 5 Mar 2025 20:19:31 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Jesper Dangaard Brouer <hawk@...nel.org>, <davem@...emloft.net>,
	<kuba@...nel.org>, <pabeni@...hat.com>
CC: <zhangkun09@...wei.com>, <liuyonglong@...wei.com>,
	<fanghaiqing@...wei.com>, Robin Murphy <robin.murphy@....com>, Alexander
 Duyck <alexander.duyck@...il.com>, IOMMU <iommu@...ts.linux.dev>, Eric
 Dumazet <edumazet@...gle.com>, Simon Horman <horms@...nel.org>, Donald Hunter
	<donald.hunter@...il.com>, Ilias Apalodimas <ilias.apalodimas@...aro.org>,
	Andrew Lunn <andrew+netdev@...n.ch>, <netdev@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, kernel-team <kernel-team@...udflare.com>, Yan
 Zhai <yan@...udflare.com>
Subject: Re: [PATCH net-next v10 3/4] page_pool: support unlimited number of
 inflight pages

On 2025/3/4 17:25, Jesper Dangaard Brouer wrote:


>>   }
>> @@ -514,10 +518,42 @@ static struct page_pool_item *page_pool_fast_item_alloc(struct page_pool *pool)
>>       return llist_entry(first, struct page_pool_item, lentry);
>>   }
>>   +static struct page_pool_item *page_pool_slow_item_alloc(struct page_pool *pool)
>> +{
>> +    if (unlikely(!pool->slow_items.block ||
>> +             pool->slow_items.next_to_use >= ITEMS_PER_PAGE)) {
>> +        struct page_pool_item_block *block;
>> +        struct page *page;
>> +
>> +        page = alloc_pages_node(pool->p.nid, GFP_ATOMIC | __GFP_NOWARN |
>> +                    __GFP_ZERO, 0);
>> +        if (!page) {
>> +            alloc_stat_inc(pool, item_slow_failed);
>> +            return NULL;
>> +        }
> 
> I'm missing a counter that I can use to monitor the rate of page
> allocations for these "item" block's.
> In production want to have a metric that shows me a sudden influx of
> that cause code to hit this "item_slow_alloc" case (inflight_slow_alloc)

It seems the 'item_fast_empty' stat added in patch 2 is the metric you
mention above? As we use those slow_items sequentially, the number of pages
allocated for slow_items can be calculated as 'item_fast_empty' / ITEMS_PER_PAGE.

> 
> BTW should this be called "inflight_block" instead of "item_block"?
> 
> 
>> +
>> +        block = page_address(page);
>> +        block->pp = pool;
>> +        block->flags |= PAGE_POOL_SLOW_ITEM_BLOCK_BIT;
>> +        refcount_set(&block->ref, ITEMS_PER_PAGE);
>> +        pool->slow_items.block = block;
>> +        pool->slow_items.next_to_use = 0;
>> +
>> +        spin_lock_bh(&pool->item_lock);
>> +        list_add(&block->list, &pool->item_blocks);
>> +        spin_unlock_bh(&pool->item_lock);
>> +    }
>> +
>> +    return &pool->slow_items.block->items[pool->slow_items.next_to_use++];
>> +}

...

>>   +static void __page_pool_slow_item_free(struct page_pool *pool,
>> +                       struct page_pool_item_block *block)
>> +{
>> +    spin_lock_bh(&pool->item_lock);
>> +    list_del(&block->list);
>> +    spin_unlock_bh(&pool->item_lock);
>> +
>> +    put_page(virt_to_page(block));
> 
> Here again I'm missing a counter that I can use to monitor the rate of
> page free events.
> 
> In production I want a metric (e.g inflight_slow_free_block) that
> together with "item_slow_alloc" (perhaps named
> inflight_slow_alloc_block), show me if this code path is creating churn,
> that I can correlate/explain some other influx event on the system.
> 
> BTW subtracting these (alloc - free) counters gives us the memory used.

If I understand it correctly, 'item_fast_empty' is something like the
'alloc' counter mentioned above; let's discuss the 'free' side below.

> 
>> +}
>> +

...

>>   }
>>   +static int page_pool_nl_fill_item_mem_info(struct page_pool *pool,
>> +                       struct sk_buff *rsp)
>> +{
>> +    struct page_pool_item_block *block;
>> +    size_t resident = 0, used = 0;
>> +    int err;
>> +
>> +    spin_lock_bh(&pool->item_lock);
>> +
>> +    list_for_each_entry(block, &pool->item_blocks, list) {
>> +        resident += PAGE_SIZE;
>> +
>> +        if (block->flags & PAGE_POOL_SLOW_ITEM_BLOCK_BIT)
>> +            used += (PAGE_SIZE - sizeof(struct page_pool_item) *
>> +                 refcount_read(&block->ref));
>> +        else
>> +            used += PAGE_SIZE;
>> +    }
>> +
>> +    spin_unlock_bh(&pool->item_lock);
> 
> Holding a BH spin_lock can easily create production issues.

The above not only gives us the total number of pages used for
page_pool_item, it also gives us the fragmentation info for those pages.
So it seems the BH spin_lock is needed if we want the fragmentation info?

And the 'free' mentioned above can then be calculated as 'alloc' - 'memory used'.

> I worry how long time it will take to traverse these lists.

I wouldn't worry about that, as there are not supposed to be a lot of
pages in those lists; if there are, it is something we should fix by
increasing the size of fast_item or by defragmenting the slow_item
pages.

> 
> We (Cc Yan) are currently hunting down a number of real production issue
> due to different cases of control-path code querying the kernel that
> takes a _bh lock to read data, hurting the data-path processing.

I am not sure the above is really the control-path here; I would rather
treat it as the debug-path?

> 
> If we had the stats counters, then this would be less work, right?

It depends on whether we want the fragmentation info or not, as
mentioned above.

> 
> 
