Message-ID: <Zqh3-TWBkhyY5kPw@PC2K9PVX.TheFacebook.com>
Date: Tue, 30 Jul 2024 01:19:53 -0400
From: Gregory Price <gourry@...rry.net>
To: "Huang, Ying" <ying.huang@...el.com>
Cc: linux-mm@...ck.org, akpm@...ux-foundation.org, dave.jiang@...el.com,
Jonathan.Cameron@...wei.com, horenchuang@...edance.com,
linux-kernel@...r.kernel.org, linux-acpi@...r.kernel.org,
dan.j.williams@...el.com, lenb@...nel.org,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>
Subject: Re: [PATCH] acpi/hmat,mm/memtier: always register hmat adist calculation callback

On Tue, Jul 30, 2024 at 09:12:55AM +0800, Huang, Ying wrote:
> > Right now HMAT appears to be used prescriptively, this despite the fact
> > that there was a clear intent to separate CPU-nodes and non-CPU-nodes in
> > the memory-tier code. So this patch simply realizes this intent when the
> > hints are not very reasonable.
>
> If HMAT isn't available, it's hard to put memory devices to
> appropriate memory tiers without other information. In commit
> 992bf77591cb ("mm/demotion: add support for explicit memory tiers"),
> Aneesh pointed out that it doesn't work for his system to put
> non-CPU-nodes in lower tier.
>

Per Aneesh in 992bf77591cb - The code explicitly states the intent is
to put non-CPU-nodes in a lower tier by default.

    The current implementation puts all nodes with CPU into the highest
    tier, and builds the tier hierarchy by establishing the per-node
    demotion targets based on the distances between nodes.

This is accurate for the current code.

    The current tier initialization code always initializes each
    memory-only NUMA node into a lower tier.

This is *broken* for the currently upstream code.

This appears to be the result of the hmat adistance callback introduction
(though it may have been broken before that).
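
To make the intended default concrete, here is a rough user-space sketch of
the rule the commit message describes: nodes with CPUs go in the top tier,
memory-only nodes go in a lower tier. This is not kernel code. The names
toy_node, toy_tier and toy_default_tier are hypothetical and exist only for
illustration, and the model ignores abstract distance and HMAT data entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the default described in 992bf77591cb. */
enum toy_tier {
    TOY_TIER_TOP = 0,   /* nodes that have CPUs */
    TOY_TIER_LOWER = 1  /* memory-only nodes, by default */
};

struct toy_node {
    int id;
    bool has_cpu;
};

/* Default placement: CPU nodes on top, memory-only nodes one tier down. */
static enum toy_tier toy_default_tier(const struct toy_node *n)
{
    assert(n != NULL);
    return n->has_cpu ? TOY_TIER_TOP : TOY_TIER_LOWER;
}
```

Under this model, a memory-only node landing in the same tier as a CPU node
is the behavior being called broken above.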
~Gregory