Message-ID: <YgDtnk8g7y5oRKXB@TonyMac-Alibaba>
Date: Mon, 7 Feb 2022 17:59:58 +0800
From: Tony Lu <tonylu@...ux.alibaba.com>
To: Leon Romanovsky <leon@...nel.org>
Cc: kgraul@...ux.ibm.com, kuba@...nel.org, davem@...emloft.net,
netdev@...r.kernel.org, linux-s390@...r.kernel.org,
RDMA mailing list <linux-rdma@...r.kernel.org>
Subject: Re: [PATCH net-next] net/smc: Allocate pages of SMC-R on ibdev NUMA node
On Mon, Jan 31, 2022 at 09:20:52AM +0200, Leon Romanovsky wrote:
> On Mon, Jan 31, 2022 at 03:03:00AM +0800, Tony Lu wrote:
> > Currently, pages are allocated in the process context, so their NUMA node
> > may differ from the ibdev's, which is not the best policy for performance.
> >
> > Applications generally perform best when their processes access memory
> > on the same NUMA node. When numa_balancing is enabled (as it is by most
> > OS distributions), it moves tasks closer to the memory of the sndbuf or
> > RMB and the ibdev; meanwhile, the IRQs of the ibdev are usually bound to
> > the same node. This reduces the latency of remote memory accesses.
>
> This is very specific to each test. I would expect the application
> to control NUMA memory policies (set_mempolicy(), ...) by itself,
> without the kernel setting the NUMA node.
>
> The various *_alloc_node() APIs are meant for in-kernel allocations
> where the user can't control the memory policy.
>
> I don't know SMC-R well enough, but judging from your description, this
> allocation is controlled by the application.

The original design of SMC doesn't take the NUMA node into account when
allocating memory, and the application can't control the NUMA policy
in SMC.

It allocates memory on the NUMA node of the process context, which is
determined by the scheduler. If the application process runs on NUMA
node 0, SMC allocates on node 0, and so on; it all depends on the
scheduler. If the RDMA device is attached to node 1 but the process runs
on node 0, the memory is allocated on node 0.

This patch tries to allocate memory on the same NUMA node as the RDMA
device. Applications can't know which node the RDMA device is on. The
scheduler, however, knows the node of the memory, and can let
applications run on the same node as the memory and the RDMA device.
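
To make the difference between the two policies concrete, here is a
minimal userspace sketch of the node-selection decision only. It is not
the patch itself: the function names are hypothetical, and the
NUMA_NO_NODE value of -1 mirrors the kernel convention for "device
reports no node". In the kernel, a device's node is typically obtained
with dev_to_node() and passed to a node-aware allocator such as
alloc_pages_node().

```c
/* Mirrors the kernel's NUMA_NO_NODE convention: the device did not
 * report a NUMA node. */
#define NUMA_NO_NODE (-1)

/* Hypothetical helper modelling the original behaviour: buffers land on
 * whichever node the calling process is currently scheduled on, and the
 * RDMA device's node is ignored. */
int pick_node_process_local(int dev_node, int cur_node)
{
    (void)dev_node; /* device locality plays no role */
    return cur_node;
}

/* Hypothetical helper modelling the proposed behaviour: prefer the RDMA
 * device's node, falling back to the local node if the device reports
 * none. */
int pick_node_device_local(int dev_node, int cur_node)
{
    return dev_node != NUMA_NO_NODE ? dev_node : cur_node;
}
```

With the RDMA device on node 1 and the process on node 0, as in the
example above, pick_node_process_local(1, 0) returns 0 while
pick_node_device_local(1, 0) returns 1; if the device reports no node,
pick_node_device_local(NUMA_NO_NODE, 0) falls back to 0.
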
Thanks,
Tony Lu