Message-ID: <ZvPX2J7D9w0EJTUo@kernel.org>
Date: Wed, 25 Sep 2024 12:28:56 +0300
From: Mike Rapoport <rppt@...nel.org>
To: Bruno Faccini <bfaccini@...dia.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
Zi Yan <ziy@...dia.com>, Timur Tabi <ttabi@...dia.com>,
John Hubbard <jhubbard@...dia.com>
Subject: Re: [PATCH] mm/fake-numa: per-phys node fake size

Hi Bruno,

Please reply inline to the mails on Linux kernel mailing lists.

On Tue, Sep 24, 2024 at 03:27:52PM +0000, Bruno Faccini wrote:
> On 24/09/2024 12:43, "Mike Rapoport" <rppt@...nel.org> wrote:
> > On Sat, Sep 21, 2024 at 01:13:49AM -0700, Bruno Faccini wrote:
> > > Determine fake numa node size on a per-phys node basis to
> > > handle cases where there are big differences of reserved
> > > memory size inside physical nodes, this will allow to get
> > > the expected number of nodes evenly interleaved.
> > >
> > > Consider a system with 2 physical Numa nodes where almost
> > > all reserved memory sits into a single node, computing the
> > > fake-numa nodes (fake=N) size as the ratio of all
> > > available/non-reserved memory can cause the inability to
> > > create N/2 fake-numa nodes in the physical node.
> >
> >
> > I'm not sure I understand the problem you are trying to solve.
> > Can you provide more specific example?
>
> I will try to be more precise about the situation I have encountered with
> your original set of patches and how I thought it could be solved.
>
> On a system with 2 physical Numa nodes each with 480GB local memory,
> where the biggest part of reserved memory (~ 309MB) is from node 0 with a
> small part (~ 51MB) from node 1, leading to the fake node size of ~<120GB
> being determined.
>
> But when allocating fake nodes from physical nodes, with let say fake=8
> boot parameter being used, we ended with less (7) than expected, because
> there was not enough room to allocate 8/2 fake nodes in physical node 0,
> due to too big size evaluation.
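
For reference, the arithmetic in the example above can be sketched as below. This is only a rough model, not the kernel code: the helper name is hypothetical, each physical node is assumed to be one contiguous range whose usable memory is its local memory minus its reserved memory, and alignment to the minimum emulated node size and the placement of holes are ignored.

```python
MB = 1 << 20
GB = 1 << 30

def fake_nodes_global_size(phys_avail, nr_fake):
    """Hypothetical helper: count emulated nodes per physical node when a
    single global size (total usable memory / nr_fake) is used for all."""
    size = sum(phys_avail) // nr_fake
    # With interleaving, each physical node is expected to host an equal share.
    per_phys = nr_fake // len(phys_avail)
    return [min(per_phys, avail // size) for avail in phys_avail]

# Numbers from the example: 2 x 480GB, ~309MB reserved on node 0, ~51MB on node 1.
avail = [480 * GB - 309 * MB, 480 * GB - 51 * MB]
counts = fake_nodes_global_size(avail, 8)
print(counts, sum(counts))
```

Under these assumptions the global size is slightly below 120GB, node 0 can only hold three nodes of that size, and seven emulated nodes are created in total instead of eight.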

The ability to split a physical node into emulated nodes depends not only on
the node sizes and hole sizes, but also on where the holes are located inside
the nodes, and it's quite possible that for some memory layouts
split_nodes_interleave() will fail to create the requested number of
emulated nodes.

> I don't think that fake=N allocation method is intended to get fake nodes
> with equal size, but to get this exact number of nodes. This is why I
> think we should use a per-phys node size for the fake nodes it will host.

IMO your change adds too much complexity for a feature that by definition
should be used only for debugging.

Also, there is a variation numa=fake=<N>U of the numa=fake parameter that
divides each node into N emulated nodes.
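
As a rough illustration of the effect, assuming the same simplified model as in your example (one contiguous range per physical node, usable memory equal to local memory minus reserved memory, no alignment or hole handling; the helper name is hypothetical), numa=fake=4U would give something like:

```python
MB = 1 << 20
GB = 1 << 30

def fake_nodes_per_phys(phys_avail, n_per_node):
    """Hypothetical helper: split every physical node into n_per_node
    emulated nodes, sizing them from that node's own usable memory."""
    return [[avail // n_per_node] * n_per_node for avail in phys_avail]

# Same layout as in the example: 2 x 480GB, ~309MB and ~51MB reserved.
avail = [480 * GB - 309 * MB, 480 * GB - 51 * MB]
sizes = fake_nodes_per_phys(avail, 4)
total = sum(len(node) for node in sizes)
print(total, [node[0] // MB for node in sizes])
```

In this model every physical node yields four emulated nodes, eight in total, with the nodes carved out of node 0 slightly smaller than those carved out of node 1.
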
> Hope this clarifies the reason and intent for my patch, have a good day,
> Bruno
>
>
> > Signed-off-by: Bruno Faccini <bfaccini@...dia.com>
> > ---
> > mm/numa_emulation.c | 66 ++++++++++++++++++++++++++-------------------
> > 1 file changed, 39 insertions(+), 27 deletions(-)
--
Sincerely yours,
Mike.