Message-ID: <8e826a0ae730f6f6e43e82a26a9e22059a5a1682.camel@intel.com>
Date: Wed, 08 Jun 2022 09:24:25 +0800
From: Ying Huang <ying.huang@...el.com>
To: Aneesh Kumar K V <aneesh.kumar@...ux.ibm.com>
Cc: Greg Thelen <gthelen@...gle.com>, Yang Shi <shy828301@...il.com>,
Davidlohr Bueso <dave@...olabs.net>,
Tim C Chen <tim.c.chen@...el.com>,
Brice Goglin <brice.goglin@...il.com>,
Michal Hocko <mhocko@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Hesham Almatary <hesham.almatary@...wei.com>,
Dave Hansen <dave.hansen@...el.com>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>,
Alistair Popple <apopple@...dia.com>,
Dan Williams <dan.j.williams@...el.com>,
Feng Tang <feng.tang@...el.com>,
Jagdish Gediya <jvgediya@...ux.ibm.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>,
David Rientjes <rientjes@...gle.com>, linux-mm@...ck.org,
akpm@...ux-foundation.org
Subject: Re: [RFC PATCH v4 1/7] mm/demotion: Add support for explicit memory
tiers
On Mon, 2022-06-06 at 14:32 +0530, Aneesh Kumar K V wrote:
> On 6/6/22 2:22 PM, Ying Huang wrote:
> ....
> > > > > I can move the patch "mm/demotion/dax/kmem: Set node's memory tier to
> > > > > MEMORY_TIER_PMEM" before switching the demotion logic so that on systems
> > > > > with two memory tiers (DRAM and pmem) the demotion continues to work
> > > > > as expected after patch 3 ("mm/demotion: Build demotion targets based on
> > > > > explicit memory tiers"). With that, there will not be any regression in
> > > > > between the patch series.
> > > > >
> > > >
> > > > Thanks! Please do that. And I think you can add sysfs interface after
> > > > that patch too. That is, in [1/7]
> > > >
> > >
> > > I am not sure why you insist on moving sysfs interfaces later. They are
> > > introduced based on the helper added. It make patch review easier to
> > > look at both the helpers and the user of the helper together in a patch.
> >
> > Yes. We should introduce a function and its user in one patch for
> > review. But this doesn't mean that we should introduce the user space
> > interface as the first step. I think the user space interface should
> > output correct information when we expose it.
> >
>
> If you look at this patchset we are not exposing any wrong information.
>
> patch 1 -> adds ability to register the memory tiers and expose details
> of registered memory tier. At this point the patchset only support DRAM
> tier and hence only one tier is shown
But inside the kernel, we actually work with 2 tiers and demote/promote
pages between them. With the information from your interface, users
would think that there isn't any demotion/promotion in the kernel
because there's only 1 tier.
> patch 2 -> adds per node memtier attribute. So only DRAM nodes shows the
> details, because the patchset yet has not introduced a slower memory
> tier like PMEM.
>
> patch 4 -> introducing demotion. Will make that patch 5
>
> patch 5 -> add dax kmem numa nodes as slower memory tier. Now this
> becomes patch 4 at which point we will correctly show two memory tiers
> in the system.
>
>
> > > > +struct memory_tier {
> > > > + nodemask_t nodelist;
> > > > +};
> > > >
> > > > And struct device can be added after the kernel has switched the
> > > > implementation based on explicit memory tiers.
> > > >
> > > > +struct memory_tier {
> > > > + struct device dev;
> > > > + nodemask_t nodelist;
> > > > +};
> > > >
> > >
> > >
> > > Can you elaborate on this? or possibly review the v5 series indicating
> > > what change you are suggesting here?
> > >
> > >
> > > > But I don't think it's a good idea to have "struct device" embedded in
> > > > "struct memory_tier". We don't have "struct device" embedded in "struct
> > > > pgdata_list"...
> > > >
> > >
> > > I avoided creating an array for memory_tier (memory_tier[]) so that we
> > > can keep it dynamic. Keeping dev embedded in struct memory_tier simplify
> > > the life cycle management of that dynamic list. We free the struct
> > > memory_tier allocation via device release function (memtier->dev.release
> > > = memory_tier_device_release )
> > >
> > > Why do you think it is not a good idea?
> >
> > I think that we shouldn't bind our kernel internal implementation with
> > user space interface too much. Yes. We can expose kernel internal
> > implementation to user space in a direct way. I suggest you to follow
> > the style of "struct pglist_data" and "struct node". If we decouple
> > "struct memory_tier" and "struct memory_tier_dev" (or some other name),
> > we can refer to "struct memory_tier" without depending on all device
> > core. Memory tier should be accessible inside the kernel even without a
> > user interface. And memory tier isn't a device in concept.
> >
>
> memory_tiers are different from pglist_data and struct node in that we
> also allow the creation of them from userspace.
I don't think that there's much difference. struct pglist_data and
struct node can be created/destroyed dynamically too. Please take a
look at:

  __try_online_node()
  register_one_node()
  try_offline_node()
  unregister_one_node()

> That is the life time of
> a memory tier is driven from userspace and it is much easier to manage
> them via sysfs file lifetime mechanism rather than inventing an
> independent and more complex way of doing the same.
You need to manage the lifetime of struct memory_tier in the kernel too,
because there are kernel users. And even if you use the device core
lifetime mechanism, you don't need to embed struct device in struct
memory_tier; you can free a "separate" struct memory_tier in the
"release" callback of struct device.
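
To illustrate the last point, here is a rough userspace sketch of the decoupled layout. It is not kernel code: struct mock_device, mock_device_get() and mock_device_put() are hypothetical stand-ins for struct device and its reference counting, nodelist is a plain unsigned long instead of nodemask_t, and memory_tier_dev_create() and the tiers_freed counter exist only so that the sketch can be checked. struct memory_tier_dev is the placeholder name used earlier in this thread.

```c
/* Userspace model only: mock_device stands in for the kernel's struct device. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct mock_device {
	int refcount;
	void (*release)(struct mock_device *dev);
};

/* Kernel-internal object: no device embedded in it. */
struct memory_tier {
	unsigned long nodelist;		/* stand-in for nodemask_t */
};

/* Sysfs-facing wrapper: owns the device and points at the tier. */
struct memory_tier_dev {
	struct mock_device dev;
	struct memory_tier *memtier;
};

static int tiers_freed;			/* only here so the sketch can be checked */

/* Recover the wrapper from its embedded device, in the style of container_of(). */
#define to_memory_tier_dev(d) \
	((struct memory_tier_dev *)((char *)(d) - offsetof(struct memory_tier_dev, dev)))

/* Release callback: frees the separately allocated tier, then the wrapper. */
static void memory_tier_dev_release(struct mock_device *dev)
{
	struct memory_tier_dev *mdev = to_memory_tier_dev(dev);

	free(mdev->memtier);
	free(mdev);
	tiers_freed++;
}

static struct memory_tier_dev *memory_tier_dev_create(unsigned long nodelist)
{
	struct memory_tier *memtier = calloc(1, sizeof(*memtier));
	struct memory_tier_dev *mdev = calloc(1, sizeof(*mdev));

	if (!memtier || !mdev) {
		free(memtier);
		free(mdev);
		return NULL;
	}
	memtier->nodelist = nodelist;
	mdev->memtier = memtier;
	mdev->dev.refcount = 1;		/* initial reference held by the creator */
	mdev->dev.release = memory_tier_dev_release;
	return mdev;
}

static void mock_device_get(struct mock_device *dev)
{
	dev->refcount++;
}

/* Drop one reference; the last put invokes the release callback. */
static void mock_device_put(struct mock_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}
```

In this model the tier's lifetime still follows the device's reference count, but code that only needs struct memory_tier never has to see the device type.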
> > For life cycle management, I think that we can do that without sysfs
> > too.
> >
>
> unless there are specific details that you think will be broken by
> embedding struct device inside struct memory_tier, IMHO I still consider
> the embedded implementation much simpler and in accordance with other
> kernel design patterns.
In concept, struct memory_tier isn't a device, although we expose it as
a device in sysfs. That's just an implementation detail. So I think
it's better to make struct memory_tier independent of struct device if
possible.
By not embedding struct device in struct memory_tier, it's much easier
to dereference struct memory_tier directly in inline functions in ".h".
We don't need to introduce one accessor function for each field of
struct memory_tier for that.
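
As a small sketch of what I mean, assuming the struct definition is visible in the header: the helper name memory_tier_has_node(), the MAX_NUMNODES_SKETCH limit, and the use of uint64_t in place of nodemask_t are all made up for illustration and are not part of the patchset.

```c
/* Header-style sketch; compiles in userspace, no device core types needed. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NUMNODES_SKETCH 64		/* hypothetical limit for this sketch */

struct memory_tier {
	uint64_t nodelist;		/* stand-in for nodemask_t; bit n = node n */
};

/* Inline helper reads the field directly; no out-of-line accessor required. */
static inline bool memory_tier_has_node(const struct memory_tier *memtier, int nid)
{
	assert(nid >= 0 && nid < MAX_NUMNODES_SKETCH);
	return (memtier->nodelist >> nid) & 1u;
}
```

Because the struct carries no struct device, a header like this can be included by callers without pulling in the device core definitions.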
Best Regards,
Huang, Ying