Message-ID: <a6c43e98662b24574cb1e996fcccad196b9e2bcd.camel@mellanox.com>
Date: Wed, 11 Dec 2019 20:57:16 +0000
From: Saeed Mahameed <saeedm@...lanox.com>
To: "linyunsheng@...wei.com" <linyunsheng@...wei.com>,
"brouer@...hat.com" <brouer@...hat.com>
CC: "jonathan.lemon@...il.com" <jonathan.lemon@...il.com>,
"ilias.apalodimas@...aro.org" <ilias.apalodimas@...aro.org>,
Li Rongqing <lirongqing@...du.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH][v2] page_pool: handle page recycle for NUMA_NO_NODE
condition
On Wed, 2019-12-11 at 11:01 +0800, Yunsheng Lin wrote:
> On 2019/12/11 3:45, Saeed Mahameed wrote:
> > > > maybe assume that __page_pool_recycle_direct() is always
> > > > called from the right node and change the current bogus check:
> > > >
> > > > from:
> > > > page_to_nid(page) == pool->p.nid
> > > >
> > > > to:
> > > > page_to_nid(page) == numa_mem_id()
> > > >
> > > > This will allow recycling only if the handling node is the
> > > > same as where the page was allocated, regardless of pool->p.nid.
> > > >
> > > > so semantics are:
> > > >
> > > > 1) allocate from: pool->p.nid, as chosen by user.
> > > > 2) recycle when: page_to_nid(page) == numa_mem_id().
> > > > 3) pool user must guarantee that the handler will run on the
> > > > right node, which should always be the case. otherwise
> > > > recycling will be skipped (no cross numa recycling).
> > > >
> > > >
> > > > a) if the pool migrates, we will stop recycling until the pool
> > > > moves back to the original node, or the user calls
> > > > pool_update_nid() as we do in mlx5.
> > > > b) if pool is NUMA_NO_NODE, then allocation and handling will
> > > > be done on numa_mem_id(), which means the above check will work.
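> > > >
> > > > i.e., an (untested) sketch, assuming the check stays in
> > > > pool_page_reusable() in net/core/page_pool.c:
> > > >
> > > > static bool pool_page_reusable(struct page_pool *pool,
> > > >                                struct page *page)
> > > > {
> > > >         /* recycle only when the node handling the page now
> > > >          * matches the node the page was allocated on
> > > >          */
> > > >         return !page_is_pfmemalloc(page) &&
> > > >                page_to_nid(page) == numa_mem_id();
> > > > }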
> > >
> > > Only checking page_to_nid(page) == numa_mem_id() may not work
> > > for the below case in mvneta.c:
> > >
> > > static int mvneta_create_page_pool(struct mvneta_port *pp,
> > >                                    struct mvneta_rx_queue *rxq,
> > >                                    int size)
> > > {
> > >         struct bpf_prog *xdp_prog = READ_ONCE(pp->xdp_prog);
> > >         struct page_pool_params pp_params = {
> > >                 .order = 0,
> > >                 .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
> > >                 .pool_size = size,
> > >                 .nid = cpu_to_node(0),
> > >                 .dev = pp->dev->dev.parent,
> > >                 .dma_dir = xdp_prog ? DMA_BIDIRECTIONAL :
> > >                                       DMA_FROM_DEVICE,
> > >                 .offset = pp->rx_offset_correction,
> > >                 .max_len = MVNETA_MAX_RX_BUF_SIZE,
> > >         };
> > >
> > > pool->p.nid is not NUMA_NO_NODE here, so the node of the page
> > > allocated for rx may not be numa_mem_id() when running in the
> > > NAPI polling, because pool->p.nid is not the same as the node of
> > > the cpu running the NAPI polling.
> > >
> > > Does the page pool support recycling for the above case?
> > >
> >
> > I don't think you want to allow cross numa recycling.
>
> Cross numa recycling is not what I want.
>
> > > Or do we "fix" the above case by setting pool->p.nid to
> > > NUMA_NO_NODE/dev_to_node(), or by calling pool_update_nid() in
> > > the NAPI polling as mlx5 does?
> > >
> >
> > Yes, just update_nid when needed, and make sure the NAPI polling
> > runs on a consistent core; eventually alloc/recycling will happen
> > on the same core.
>
> To me, passing NUMA_NO_NODE/dev_to_node() seems to always work.
> Calling pool_update_nid() in NAPI polling is another way of passing
> NUMA_NO_NODE to page_pool_init().
>
> And it seems to be a copy & paste problem that the mvneta and netsec
> drivers use cpu_to_node(0) as pool->p.nid but do not call
> page_pool_nid_changed() in the NAPI polling as mlx5 does.
>
> So I suggest removing page_pool_nid_changed() and always using
> NUMA_NO_NODE/dev_to_node() as pool->p.nid, or making it clear
> (by a comment or warning?) that page_pool_nid_changed() should be
> called when pool->p.nid is NUMA_NO_NODE/dev_to_node().
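>
> For example, a minimal (untested) change for the mvneta params above
> would be:
>
>         .nid = NUMA_NO_NODE, /* or dev_to_node(pp->dev->dev.parent) */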
>
Not an option.
rx rings should always allocate data buffers according to their cpu
affinity, not according to dev_node or a NUMA_NO_NODE default.
> I prefer to remove page_pool_nid_changed() if we do not allow
> cross numa recycling.
>
This is not for cross numa recycling.
Since rx rings can migrate between cores (msix affinity/IRQ balancer),
we need page_pool_nid_changed() for seamless migration, so that
recycling and allocation migrate with the ring.
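
For illustration, the pattern looks roughly like this (hypothetical
driver; the mynic_* names are made up, only page_pool_nid_changed()
and numa_mem_id() are real):

/* hypothetical NAPI poll handler, one page_pool per rx ring */
static int mynic_napi_poll(struct napi_struct *napi, int budget)
{
        struct mynic_rx_ring *ring =
                container_of(napi, struct mynic_rx_ring, napi);
        int done;

        /* cheap check: page_pool_nid_changed() only updates
         * pool->p.nid when it actually changed, so allocation and
         * recycling follow wherever the ring is running now.
         */
        page_pool_nid_changed(ring->page_pool, numa_mem_id());

        done = mynic_clean_rx_ring(ring, budget);
        if (done < budget)
                napi_complete_done(napi, done);

        return done;
}

This way the driver keeps recycling after an IRQ affinity change,
without any cross numa recycling.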
>
> > > > Thanks,
> > > > Saeed.