Message-ID: <3f2d88fdcb00b6cc2925d5a2fab38e50d43d8a52.camel@mellanox.com>
Date: Tue, 17 Dec 2019 19:38:44 +0000
From: Saeed Mahameed <saeedm@...lanox.com>
To: "ilias.apalodimas@...aro.org" <ilias.apalodimas@...aro.org>,
Li Rongqing <lirongqing@...du.com>
CC: "mhocko@...nel.org" <mhocko@...nel.org>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"linyunsheng@...wei.com" <linyunsheng@...wei.com>,
"jonathan.lemon@...il.com" <jonathan.lemon@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"brouer@...hat.com" <brouer@...hat.com>,
"bhelgaas@...gle.com" <bhelgaas@...gle.com>,
"bjorn.topel@...el.com" <bjorn.topel@...el.com>
Subject: Re: Reply: [PATCH][v2] page_pool: handle page recycle for NUMA_NO_NODE condition
On Mon, 2019-12-16 at 12:13 +0200, Ilias Apalodimas wrote:
> On Mon, Dec 16, 2019 at 04:02:04AM +0000, Li,Rongqing wrote:
> >
> > > -----Original Message-----
> > > From: Yunsheng Lin [mailto:linyunsheng@...wei.com]
> > > Sent: December 16, 2019 9:51
> > > To: Jesper Dangaard Brouer <brouer@...hat.com>
> > > Cc: Li,Rongqing <lirongqing@...du.com>; Saeed Mahameed
> > > <saeedm@...lanox.com>; ilias.apalodimas@...aro.org;
> > > jonathan.lemon@...il.com; netdev@...r.kernel.org; mhocko@...nel.org;
> > > peterz@...radead.org; Greg Kroah-Hartman <gregkh@...uxfoundation.org>;
> > > bhelgaas@...gle.com; linux-kernel@...r.kernel.org; Björn Töpel
> > > <bjorn.topel@...el.com>
> > > Subject: Re: [PATCH][v2] page_pool: handle page recycle for
> > > NUMA_NO_NODE condition
> > >
> > > On 2019/12/13 16:48, Jesper Dangaard Brouer wrote:
> > > > You are basically saying that the NUMA check should be moved to
> > > > allocation time, as it is running on the RX-CPU (NAPI), and that
> > > > eventually, after some time, the pages will come from the correct
> > > > NUMA node.
> > > >
> > > > I think we can do that, and only affect the semi-fast-path.
> > > > We just need to handle the fact that recycled pages in the ptr_ring
> > > > can be from the wrong NUMA node. In __page_pool_get_cached(), when
> > > > consuming pages from the ptr_ring (__ptr_ring_consume_batched), we
> > > > can evict pages from the wrong NUMA node.
> > >
> > > Yes, that's workable.
> > >
> > > > For the pool->alloc.cache, we either accept that it will eventually
> > > > be emptied after some time (it is only in a 100% XDP_DROP workload
> > > > that it will continue to reuse the same pages), or we simply clear
> > > > the pool->alloc.cache when calling page_pool_update_nid().
> > >
> > > Simply clearing the pool->alloc.cache when calling
> > > page_pool_update_nid() seems better.
> > >
> >
> > How about the code below? The driver can set p.nid to any value, and it
> > will be adjusted during NAPI polling, so IRQ migration will not be a
> > problem. It does add a check to the hot path, though.
>
> We'll have to check the impact on some high-speed (e.g. 100 Gbit)
> interface before doing anything like that. Saeed's current patch runs
> once per NAPI poll. This runs once per packet. The load might be
> measurable.
> The READ_ONCE is needed in case all producers/consumers run on the same
> CPU, right?
>
I agree with Ilias. As I explained, this will bias the pool toward
CPU-close memory only, and we want to avoid that.

Li, can you please check if this fixes your issue:
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index a6aefe989043..00c99282a306 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -28,6 +28,9 @@ static int page_pool_init(struct page_pool *pool,
 	memcpy(&pool->p, params, sizeof(pool->p));
+	/* overwrite to allow recycling.. */
+	if (pool->p.nid == NUMA_NO_NODE)
+		pool->p.nid = numa_mem_id();
+
If the user wants dev_to_node(), they can pass dev_to_node() at pool
initialization rather than NUMA_NO_NODE.
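
To make the effect of the init-time override concrete, here is a minimal
userspace sketch, not kernel code. It assumes the recycle path accepts a
page only when the page's node equals pool->p.nid, so a pool left at
NUMA_NO_NODE never matches any page. All model_* names are hypothetical,
and the current_node argument stands in for numa_mem_id().

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NUMA_NO_NODE (-1)	/* same value as the kernel's NUMA_NO_NODE */

/* Hypothetical stand-ins for struct page_pool and struct page. */
struct model_pool {
	int nid;		/* preferred node, like pool->p.nid */
};

struct model_page {
	int nid;		/* node the page lives on */
	bool pfmemalloc;	/* page came from emergency reserves */
};

/* Init-time override: replace "no node" with the local node.
 * current_node stands in for numa_mem_id().
 */
void model_pool_init(struct model_pool *pool, int requested_nid,
		     int current_node)
{
	pool->nid = requested_nid;
	if (pool->nid == MODEL_NUMA_NO_NODE)
		pool->nid = current_node;
}

/* Assumed recycle test: reuse only pages that sit on the pool's node. */
bool model_page_reusable(const struct model_pool *pool,
			 const struct model_page *page)
{
	return !page->pfmemalloc && page->nid == pool->nid;
}

/* Usage: with the override, a local page becomes recyclable. */
void model_demo(void)
{
	struct model_pool pool;
	struct model_page local = { .nid = 1, .pfmemalloc = false };

	/* Without the override the pool would keep -1 and reject the page. */
	pool.nid = MODEL_NUMA_NO_NODE;
	assert(!model_page_reusable(&pool, &local));

	/* With the override, running on node 1 resolves the pool to node 1. */
	model_pool_init(&pool, MODEL_NUMA_NO_NODE, 1);
	assert(model_page_reusable(&pool, &local));
}
```

A driver that prefers the device's node would pass that node as
requested_nid in this model, which corresponds to using dev_to_node()
instead of NUMA_NO_NODE at pool initialization.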
> Thanks
> /Ilias
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index a6aefe989043..4374a6239d17 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -108,6 +108,10 @@ static struct page *__page_pool_get_cached(struct page_pool *pool)
> >  	if (likely(pool->alloc.count)) {
> >  		/* Fast-path */
> >  		page = pool->alloc.cache[--pool->alloc.count];
> > +
> > +		if (unlikely(READ_ONCE(pool->p.nid) != numa_mem_id()))
> > +			WRITE_ONCE(pool->p.nid, numa_mem_id());
> > +
> >  		return page;
> >  	}
> >  	refill = true;
> > @@ -155,6 +159,10 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
> >  	if (pool->p.order)
> >  		gfp |= __GFP_COMP;
> >
> > +
> > +	if (unlikely(READ_ONCE(pool->p.nid) != numa_mem_id()))
> > +		WRITE_ONCE(pool->p.nid, numa_mem_id());
> > +
> >  	/* FUTURE development:
> >  	 *
> >  	 * Current slow-path essentially falls back to single page
> > Thanks
> >
> > -Li
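
The difference between a check that runs once per packet and one that
runs once per NAPI poll can be sketched with a small userspace model.
This is not kernel code: the cost_* names are hypothetical, the budget
of 64 packets per poll is an assumption, and the model only counts node
comparisons. It says nothing about the real cycle cost, which would
have to be measured on hardware.

```c
#include <assert.h>

/* Hypothetical pool model that counts node comparisons. */
struct cost_pool {
	int nid;			/* like pool->p.nid */
	unsigned long nid_checks;	/* comparisons performed so far */
};

/* One node comparison, updating the pool's node on mismatch. */
void cost_check_nid(struct cost_pool *pool, int current_node)
{
	pool->nid_checks++;
	if (pool->nid != current_node)
		pool->nid = current_node;
}

/* Simulate polls * budget packets.
 * per_packet != 0: compare on every packet (check in the fast path).
 * per_packet == 0: compare once per poll (update-per-NAPI style).
 * Returns the number of comparisons performed.
 */
unsigned long cost_run(int per_packet, unsigned int polls,
		       unsigned int budget, int current_node)
{
	struct cost_pool pool = { .nid = -1, .nid_checks = 0 };
	unsigned int i, j;

	for (i = 0; i < polls; i++) {
		if (!per_packet)
			cost_check_nid(&pool, current_node);
		for (j = 0; j < budget; j++) {
			if (per_packet)
				cost_check_nid(&pool, current_node);
		}
	}
	return pool.nid_checks;
}

/* Usage: 10 polls with an assumed budget of 64 packets each. */
void cost_demo(void)
{
	assert(cost_run(1, 10, 64, 0) == 640);	/* per packet */
	assert(cost_run(0, 10, 64, 0) == 10);	/* per NAPI poll */
}
```

In this model the per-packet variant does budget times as many
comparisons as the per-poll variant, which is why the impact is worth
checking on a high-speed interface.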