linux-kernel - Re: [PATCH net-next] net: page_pool: optimize page pool page allocation in NUMA scenario

Message-ID: <7b004f0d-7a15-6e01-ffa3-ed1826ba9810@huawei.com>
Date:   Mon, 27 Jun 2022 21:04:01 +0800
From:   "wangjie (L)" <wangjie125@...wei.com>
To:     Jesper Dangaard Brouer <jbrouer@...hat.com>,
        Guangbin Huang <huangguangbin2@...wei.com>, <hawk@...nel.org>,
        <ilias.apalodimas@...aro.org>, <davem@...emloft.net>,
        <kuba@...nel.org>, <edumazet@...gle.com>, <pabeni@...hat.com>
CC:     <brouer@...hat.com>, <lorenzo@...nel.org>,
        <netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <bpf@...r.kernel.org>, <lipeng321@...wei.com>,
        <chenhao288@...ilicon.com>
Subject: Re: [PATCH net-next] net: page_pool: optimize page pool page
 allocation in NUMA scenario



On 2022/6/27 17:50, Jesper Dangaard Brouer wrote:
>
>
> On 24/06/2022 11.36, Guangbin Huang wrote:
>> From: Jie Wang <wangjie125@...wei.com>
>>
>> Currently NIC packet receiving performance based on page pool deteriorates
>> occasionally. To analyze the causes of this problem, page allocation stats
>> were collected. Here are the stats when NIC rx performance deteriorates:
>>
>> bandwidth(Gbits/s)            16.8        6.91
>> rx_pp_alloc_fast              13794308    21141869
>> rx_pp_alloc_slow              108625      166481
>> rx_pp_alloc_slow_ho           0           0
>> rx_pp_alloc_empty             8192        8192
>> rx_pp_alloc_refill            0           0
>> rx_pp_alloc_waive             100433      158289
>> rx_pp_recycle_cached          0           0
>> rx_pp_recycle_cache_full      0           0
>> rx_pp_recycle_ring            362400      420281
>> rx_pp_recycle_ring_full       6064893     9709724
>> rx_pp_recycle_released_ref    0           0
>>
>> The rx_pp_alloc_waive count indicates that the NUMA node of a large number
>> of pages does not match the NUMA node of the NIC device. Therefore these
>> pages can't be reused by the page pool. As a result, many new pages are
>> allocated by __page_pool_alloc_pages_slow, which is time consuming. This
>> causes the NIC rx performance fluctuations.
>>
>> The main reason for the large number of NUMA-mismatched pages in the page
>> pool is that the page pool uses alloc_pages_bulk_array to allocate the
>> original pages. This function is not suitable for page allocation in a NUMA
>> scenario. So this patch uses alloc_pages_bulk_array_node, which takes a NUMA
>> node id as an input parameter, to ensure NUMA consistency between the NIC
>> device and the allocated pages.
>>
>> Repeated NIC rx performance tests were performed 40 times. NIC rx bandwidth
>> is higher and more stable compared to the data above. Here are the stats
>> from three tests: the rx_pp_alloc_waive count is zero, and rx_pp_alloc_slow,
>> which indicates pages allocated from the slow path, is relatively low.
>>
>> bandwidth(Gbits/s)            93          93.9        93.8
>> rx_pp_alloc_fast              60066264    61266386    60938254
>> rx_pp_alloc_slow              16512       16517       16539
>> rx_pp_alloc_slow_ho           0           0           0
>> rx_pp_alloc_empty             16512       16517       16539
>> rx_pp_alloc_refill            473841      481910      481585
>> rx_pp_alloc_waive             0           0           0
>> rx_pp_recycle_cached          0           0           0
>> rx_pp_recycle_cache_full      0           0           0
>> rx_pp_recycle_ring            29754145    30358243    30194023
>> rx_pp_recycle_ring_full       0           0           0
>> rx_pp_recycle_released_ref    0           0           0
>>
>> Signed-off-by: Jie Wang <wangjie125@...wei.com>
>> ---
>>   net/core/page_pool.c | 11 ++++++++++-
>>   1 file changed, 10 insertions(+), 1 deletion(-)
>
> Thanks for improving this, but we need some small adjustments below.
> And then you need to send a V2 of the patch.
>
>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>> index f18e6e771993..15997fcd78f3 100644
>> --- a/net/core/page_pool.c
>> +++ b/net/core/page_pool.c
>> @@ -377,6 +377,7 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
>>       unsigned int pp_order = pool->p.order;
>>       struct page *page;
>>       int i, nr_pages;
>> +    int pref_nid; /* preferred NUMA node */
>>
>>       /* Don't support bulk alloc for high-order pages */
>>       if (unlikely(pp_order))
>> @@ -386,10 +387,18 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
>>       if (unlikely(pool->alloc.count > 0))
>>           return pool->alloc.cache[--pool->alloc.count];
>>
>> +#ifdef CONFIG_NUMA
>> +    pref_nid = (pool->p.nid == NUMA_NO_NODE) ? numa_mem_id() : pool->p.nid;
>> +#else
>> +    /* Ignore pool->p.nid setting if !CONFIG_NUMA, helps compiler */
>
> Remove "helps compiler" from the comment; it only makes sense in the code
> this was copy-pasted from.
>
>
>> +    pref_nid = numa_mem_id(); /* will be zero like page_to_nid() */
>
> The comment about "page_to_nid()" is only relevant in the code
> this was copy-pasted from.
>
> Change to:
>     pref_nid = NUMA_NO_NODE;
>
> As alloc_pages_bulk_array_node() will be inlined, the effect (generated
> asm code) will be the same, but it will be better for code maintenance.
>
OK, thanks for your review. I will fix it in the next version.
>> +#endif
>> +
>>       /* Mark empty alloc.cache slots "empty" for alloc_pages_bulk_array */
>>       memset(&pool->alloc.cache, 0, sizeof(void *) * bulk);
>>
>> -    nr_pages = alloc_pages_bulk_array(gfp, bulk, pool->alloc.cache);
>> +    nr_pages = alloc_pages_bulk_array_node(gfp, pref_nid, bulk,
>> +                           pool->alloc.cache);
>>       if (unlikely(!nr_pages))
>>           return NULL;
>>
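For illustration only, the node selection discussed in the review comments
above can be modeled in user space roughly as follows. This is a sketch, not
the kernel code and not necessarily what v2 will contain: the names
model_pref_nid, model_pool_params and MODEL_NUMA_NO_NODE are hypothetical,
numa_enabled stands in for CONFIG_NUMA, and local_nid stands in for the value
numa_mem_id() would return.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's NUMA_NO_NODE, which is defined as -1. */
#define MODEL_NUMA_NO_NODE (-1)

/* Hypothetical model of the pool parameters; only nid matters here. */
struct model_pool_params {
    int nid; /* node requested by the driver, or MODEL_NUMA_NO_NODE */
};

/*
 * Pick the node that would be passed to the bulk allocator.
 * numa_enabled models CONFIG_NUMA; local_nid models numa_mem_id().
 */
static inline int model_pref_nid(const struct model_pool_params *p,
                                 int numa_enabled, int local_nid)
{
    assert(p != NULL);

    /* Review suggestion: without NUMA support, express no preference. */
    if (!numa_enabled)
        return MODEL_NUMA_NO_NODE;

    /* With NUMA: honor the configured node, else fall back to the local one. */
    return (p->nid == MODEL_NUMA_NO_NODE) ? local_nid : p->nid;
}
```

With a pool configured for node 2 the model returns 2 regardless of the local
node; with no configured node it returns the local node; and with NUMA
disabled it always returns MODEL_NUMA_NO_NODE.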
>
>
> .
>
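As a rough illustration of why rx_pp_alloc_waive grows in the first set of
stats, the following user-space sketch counts how many recycled pages could be
reused for a given preferred node. It is a deliberately simplified model under
stated assumptions: the real refill path in page_pool.c has additional logic,
and model_refill, model_refill_stats and the page_nids array (one NUMA node id
per recycled page) are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters mirroring the refill and waive stats. */
struct model_refill_stats {
    unsigned long refilled; /* pages whose node matched and were reused */
    unsigned long waived;   /* pages on another node, given back instead */
};

/*
 * Walk the node ids of n recycled pages and classify each one against
 * the preferred node. A mismatch is counted as "waived", meaning the
 * page is not reused and a fresh allocation is needed later.
 */
static inline struct model_refill_stats
model_refill(const int *page_nids, size_t n, int pref_nid)
{
    struct model_refill_stats s = { 0, 0 };

    assert(page_nids != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (page_nids[i] == pref_nid)
            s.refilled++;
        else
            s.waived++;
    }
    return s;
}
```

In this model, pages handed out by an allocator that ignores the NIC's node
show up as waived on recycle, which matches the pattern in the stats quoted
above: a high waive count together with a high slow-path allocation count.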