Message-ID: <729b360c-4d79-1025-f5be-384b17f132d3@redhat.com>
Date: Tue, 25 Jul 2023 20:01:03 +0200
From: Jesper Dangaard Brouer <jbrouer@...hat.com>
To: Haiyang Zhang <haiyangz@...rosoft.com>,
Jesper Dangaard Brouer <jbrouer@...hat.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Cc: brouer@...hat.com, Dexuan Cui <decui@...rosoft.com>,
KY Srinivasan <kys@...rosoft.com>,
Paul Rosswurm <paulros@...rosoft.com>,
"olaf@...fle.de" <olaf@...fle.de>,
"vkuznets@...hat.com" <vkuznets@...hat.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"wei.liu@...nel.org" <wei.liu@...nel.org>,
"edumazet@...gle.com" <edumazet@...gle.com>,
"kuba@...nel.org" <kuba@...nel.org>,
"pabeni@...hat.com" <pabeni@...hat.com>,
"leon@...nel.org" <leon@...nel.org>,
Long Li <longli@...rosoft.com>,
"ssengar@...ux.microsoft.com" <ssengar@...ux.microsoft.com>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
"daniel@...earbox.net" <daniel@...earbox.net>,
"john.fastabend@...il.com" <john.fastabend@...il.com>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"ast@...nel.org" <ast@...nel.org>,
Ajay Sharma <sharmaajay@...rosoft.com>,
"hawk@...nel.org" <hawk@...nel.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"shradhagupta@...ux.microsoft.com" <shradhagupta@...ux.microsoft.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>
Subject: Re: [PATCH V3,net-next] net: mana: Add page pool for RX buffers
On 24/07/2023 20.35, Haiyang Zhang wrote:
>
[...]
>>> On 21/07/2023 21.05, Haiyang Zhang wrote:
>>>> Add page pool for RX buffers for faster buffer cycle and reduce CPU
>>>> usage.
>>>>
>>>> The standard page pool API is used.
>>>>
>>>> Signed-off-by: Haiyang Zhang <haiyangz@...rosoft.com>
>>>> ---
>>>> V3:
>>>> Update xdp mem model, pool param, alloc as suggested by Jakub Kicinski
>>>> V2:
>>>> Use the standard page pool API as suggested by Jesper Dangaard Brouer
>>>>
>>>> ---
>>>>   drivers/net/ethernet/microsoft/mana/mana_en.c | 91 +++++++++++++++----
>>>>   include/net/mana/mana.h                       |  3 +
>>>>   2 files changed, 78 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
>>>> index a499e460594b..4307f25f8c7a 100644
>>>> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
>>>> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
>>> [...]
>>>> @@ -1659,6 +1679,8 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
>>>>
>>>> if (rxq->xdp_flush)
>>>> xdp_do_flush();
>>>> +
>>>> + page_pool_nid_changed(rxq->page_pool, numa_mem_id());
>>>
>>> I don't think this page_pool_nid_changed() call is needed, if you do
>>> as I suggest below (nid = NUMA_NO_NODE).
>>>
>>>
>>>> }
>>>>
>>>> static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
>>> [...]
>>>
>>>> @@ -2008,6 +2041,25 @@ static int mana_push_wqe(struct mana_rxq *rxq)
>>>> return 0;
>>>> }
>>>>
>>>> +static int mana_create_page_pool(struct mana_rxq *rxq)
>>>> +{
>>>> + struct page_pool_params pprm = {};
>>>
>>> You are implicitly assigning NUMA node id zero.
>>>
>>>> + int ret;
>>>> +
>>>> + pprm.pool_size = RX_BUFFERS_PER_QUEUE;
>>>> + pprm.napi = &rxq->rx_cq.napi;
>>>
>>> You likely want to assign pprm.nid to NUMA_NO_NODE
>>>
>>> pprm.nid = NUMA_NO_NODE;
>>>
>>> For most drivers it is recommended to assign ``NUMA_NO_NODE`` (value -1)
>>> as the NUMA ID to ``pp_params.nid``. When ``CONFIG_NUMA`` is enabled,
>>> this setting will automatically select the (preferred) NUMA node (via
>>> ``numa_mem_id()``) based on where NAPI RX-processing is currently
>>> running. The effect is that page_pool will only use recycled memory
>>> when the NUMA node matches the running CPU. This assumes the CPU
>>> refilling the driver RX-ring also runs the RX-NAPI.
>>>
>>> If a driver wants more control over the NUMA node memory selection, it
>>> can assign something other than ``NUMA_NO_NODE`` to ``pp_params.nid``
>>> and adjust it at runtime via ``page_pool_nid_changed()``.
>>
>> Our driver uses NUMA node 0 by default, so I implicitly assign the NUMA
>> node id to zero during pool init.
>>
>> And, if the IRQ/CPU affinity is changed, page_pool_nid_changed()
>> will update the nid for the pool. Does this sound good?
>>
>
> Also, since our driver is getting the default node from here:
> gc->numa_node = dev_to_node(&pdev->dev);
> I will update this patch to set the default node as above, instead of implicitly
> assigning it to 0.
>
In that case, I agree that it makes sense to use dev_to_node(&pdev->dev),
like:

  pprm.nid = dev_to_node(&pdev->dev);

The driver must have a reason for assigning gc->numa_node for this
hardware, which is okay. That is why the page_pool API allows the driver
to control this.
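
For completeness, a minimal sketch of how pool creation could then look on
top of your V3 patch (this assumes the gdma_context, where gc->numa_node
already caches dev_to_node(&pdev->dev), is reachable at pool-creation time,
e.g. passed in as an extra argument; adjust to however the driver actually
plumbs it through):

  static int mana_create_page_pool(struct mana_rxq *rxq, struct gdma_context *gc)
  {
          struct page_pool_params pprm = {};
          int ret;

          pprm.pool_size = RX_BUFFERS_PER_QUEUE;
          pprm.nid       = gc->numa_node; /* dev_to_node(&pdev->dev), cached at probe */
          pprm.napi      = &rxq->rx_cq.napi;

          rxq->page_pool = page_pool_create(&pprm);
          if (IS_ERR(rxq->page_pool)) {
                  ret = PTR_ERR(rxq->page_pool);
                  rxq->page_pool = NULL;
                  return ret;
          }

          return 0;
  }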
But then I don't think you should call page_pool_nid_changed() like:

  page_pool_nid_changed(rxq->page_pool, numa_mem_id());

because then, at the first packet-processing event, you will revert the
dev_to_node() setting to the numa_mem_id() of the processing/running CPU.
(In effect this will be the same as setting NUMA_NO_NODE.)
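
To spell out why: page_pool_nid_changed() is (roughly, as in the current
include/net/page_pool.h; exact form may differ slightly by kernel version)
just a compare-and-update helper, so the first NAPI poll on a CPU whose
numa_mem_id() differs from the configured node overwrites the nid you set
at create time:

  /* Sketch of the helper, from include/net/page_pool.h */
  static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid)
  {
          if (unlikely(pool->p.nid != new_nid))
                  page_pool_update_nid(pool, new_nid);
  }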
I know mlx5 does call page_pool_nid_changed(), but they showed benchmark
numbers that this was the preferred action, even when the sysadmin had
"misconfigured" the default smp_affinity so that RX-processing happens on
a remote NUMA node. AFAIK mlx5 keeps the descriptor rings on the
originally configured NUMA node that corresponds to the NIC's PCIe slot.
--Jesper