Message-ID: <a6c43e98662b24574cb1e996fcccad196b9e2bcd.camel@mellanox.com>
Date: Wed, 11 Dec 2019 20:57:16 +0000
From: Saeed Mahameed <saeedm@...lanox.com>
To: "linyunsheng@...wei.com" <linyunsheng@...wei.com>,
"brouer@...hat.com" <brouer@...hat.com>
CC: "jonathan.lemon@...il.com" <jonathan.lemon@...il.com>,
"ilias.apalodimas@...aro.org" <ilias.apalodimas@...aro.org>,
Li Rongqing <lirongqing@...du.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH][v2] page_pool: handle page recycle for NUMA_NO_NODE
condition
On Wed, 2019-12-11 at 11:01 +0800, Yunsheng Lin wrote:
> On 2019/12/11 3:45, Saeed Mahameed wrote:
> > > > maybe assume that __page_pool_recycle_direct() is always
> > > > called from the right node and change the current bogus check:
> > > >
> > > > from:
> > > > page_to_nid(page) == pool->p.nid
> > > >
> > > > to:
> > > > page_to_nid(page) == numa_mem_id()
> > > >
> > > > This will allow recycling only if the handling node is the
> > > > same as where the page was allocated, regardless of pool->p.nid.
> > > >
> > > > so semantics are:
> > > >
> > > > 1) allocate from: pool->p.nid, as chosen by user.
> > > > 2) recycle when: page_to_nid(page) == numa_mem_id().
> > > > 3) pool user must guarantee that the handler will run on the
> > > > right node, which should always be the case. otherwise
> > > > recycling will be skipped (no cross numa recycling).
> > > >
> > > >
> > > > a) if the pool migrates, we will stop recycling until the pool
> > > > moves back to the original node, or the user calls
> > > > pool_update_nid() as we do in mlx5.
> > > > b) if pool is NUMA_NO_NODE, then allocation and handling will
> > > > be done on numa_mem_id(), which means the above check will work.
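> > > >
> > > > i.e., an (untested) sketch, assuming the check stays in
> > > > pool_page_reusable() in net/core/page_pool.c:
> > > >
> > > > static bool pool_page_reusable(struct page_pool *pool,
> > > >                                struct page *page)
> > > > {
> > > >         /* recycle only when the node handling the page now
> > > >          * matches the node the page was allocated on
> > > >          */
> > > >         return !page_is_pfmemalloc(page) &&
> > > >                page_to_nid(page) == numa_mem_id();
> > > > }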
> > >
> > > Only checking page_to_nid(page) == numa_mem_id() may not work
> > > for the below case in mvneta.c:
> > >
> > > static int mvneta_create_page_pool(struct mvneta_port *pp,
> > >                                    struct mvneta_rx_queue *rxq,
> > >                                    int size)
> > > {
> > >         struct bpf_prog *xdp_prog = READ_ONCE(pp->xdp_prog);
> > >         struct page_pool_params pp_params = {
> > >                 .order = 0,
> > >                 .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
> > >                 .pool_size = size,
> > >                 .nid = cpu_to_node(0),
> > >                 .dev = pp->dev->dev.parent,
> > >                 .dma_dir = xdp_prog ? DMA_BIDIRECTIONAL :
> > >                                       DMA_FROM_DEVICE,
> > >                 .offset = pp->rx_offset_correction,
> > >                 .max_len = MVNETA_MAX_RX_BUF_SIZE,
> > >         };
> > >
> > > pool->p.nid is not NUMA_NO_NODE here, so the node of the page
> > > allocated for rx may not be numa_mem_id() when running in the
> > > NAPI polling, because pool->p.nid is not the same as the node of
> > > the cpu running the NAPI polling.
> > >
> > > Does the page pool support recycling for the above case?
> > >
> >
> > I don't think you want to allow cross numa recycling.
>
> Cross numa recycling is not what I want.
>
> > > Or do we "fix" the above case by setting pool->p.nid to
> > > NUMA_NO_NODE/dev_to_node(), or by calling pool_update_nid() in
> > > the NAPI polling as mlx5 does?
> > >
> >
> > Yes, just update_nid when needed, and make sure the NAPI polling
> > runs on a consistent core; eventually alloc/recycling will happen
> > on the same core.
>
> To me, passing NUMA_NO_NODE/dev_to_node() seems to always work.
> Calling pool_update_nid() in NAPI polling is another way of passing
> NUMA_NO_NODE to page_pool_init().
>
> And it seems to be a copy & paste problem that the mvneta and netsec
> drivers use cpu_to_node(0) as pool->p.nid but do not call
> page_pool_nid_changed() in the NAPI polling as mlx5 does.
>
> So I suggest removing page_pool_nid_changed() and always using
> NUMA_NO_NODE/dev_to_node() as pool->p.nid, or making it clear
> (by a comment or warning?) that page_pool_nid_changed() should be
> called when pool->p.nid is NUMA_NO_NODE/dev_to_node().
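>
> For example, a minimal (untested) change for the mvneta params above
> would be:
>
>         .nid = NUMA_NO_NODE, /* or dev_to_node(pp->dev->dev.parent) */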
>
Not an option.
rx rings should always allocate data buffers according to their cpu
affinity, not according to dev_node or a NUMA_NO_NODE default.
> I prefer to remove page_pool_nid_changed() if we do not allow
> cross numa recycling.
>
This is not for cross numa recycling.
Since rx rings can migrate between cores (msix affinity/IRQ balancer),
we need page_pool_nid_changed() for seamless migration, so that
recycling and allocation migrate with the ring.
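
For illustration, the pattern looks roughly like this (hypothetical
driver; the mynic_* names are made up, only page_pool_nid_changed()
and numa_mem_id() are real):

/* hypothetical NAPI poll handler, one page_pool per rx ring */
static int mynic_napi_poll(struct napi_struct *napi, int budget)
{
        struct mynic_rx_ring *ring =
                container_of(napi, struct mynic_rx_ring, napi);
        int done;

        /* cheap check: page_pool_nid_changed() only updates
         * pool->p.nid when it actually changed, so allocation and
         * recycling follow wherever the ring is running now.
         */
        page_pool_nid_changed(ring->page_pool, numa_mem_id());

        done = mynic_clean_rx_ring(ring, budget);
        if (done < budget)
                napi_complete_done(napi, done);

        return done;
}

This way the driver keeps recycling after an IRQ affinity change,
without any cross numa recycling.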
>
> > > > Thanks,
> > > > Saeed.