linux-kernel - Re: [RFC PATCH v4 1/7] mm/demotion: Add support for explicit memory tiers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8e826a0ae730f6f6e43e82a26a9e22059a5a1682.camel@intel.com>
Date:   Wed, 08 Jun 2022 09:24:25 +0800
From:   Ying Huang <ying.huang@...el.com>
To:     Aneesh Kumar K V <aneesh.kumar@...ux.ibm.com>
Cc:     Greg Thelen <gthelen@...gle.com>, Yang Shi <shy828301@...il.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Tim C Chen <tim.c.chen@...el.com>,
        Brice Goglin <brice.goglin@...il.com>,
        Michal Hocko <mhocko@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Hesham Almatary <hesham.almatary@...wei.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Jonathan Cameron <Jonathan.Cameron@...wei.com>,
        Alistair Popple <apopple@...dia.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Feng Tang <feng.tang@...el.com>,
        Jagdish Gediya <jvgediya@...ux.ibm.com>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        David Rientjes <rientjes@...gle.com>, linux-mm@...ck.org,
        akpm@...ux-foundation.org
Subject: Re: [RFC PATCH v4 1/7] mm/demotion: Add support for explicit memory
 tiers

On Mon, 2022-06-06 at 14:32 +0530, Aneesh Kumar K V wrote:
> On 6/6/22 2:22 PM, Ying Huang wrote:
> ....
> > > > > I can move the patch "mm/demotion/dax/kmem: Set node's memory tier to
> > > > > MEMORY_TIER_PMEM" before switching the demotion logic so that on systems
> > > > > with two memory tiers (DRAM and pmem) the demotion continues to work
> > > > > as expected after patch 3 ("mm/demotion: Build demotion targets based on
> > > > > explicit memory tiers"). With that, there will not be any regression in
> > > > > between the patch series.
> > > > > 
> > > > 
> > > > Thanks!  Please do that.  And I think you can add sysfs interface after
> > > > that patch too.  That is, in [1/7]
> > > > 
> > > 
> > > I am not sure why you insist on moving sysfs interfaces later. They are
> > > introduced based on the helper added. It make patch review easier to
> > > look at both the helpers and the user of the helper together in a patch.
> > 
> > Yes.  We should introduce a function and its user in one patch for
> > review.  But this doesn't mean that we should introduce the user space
> > interface as the first step.  I think the user space interface should
> > output correct information when we expose it.
> > 
> 
> If you look at this patchset we are not exposing any wrong information.
> 
> patch 1 -> adds ability to register the memory tiers and expose details 
> of registered memory tier. At this point the patchset only support DRAM 
> tier and hence only one tier is shown

But inside kernel, we actually work with 2 tiers and demote/prmote pages
between them.  With the information from your interface, users would
think that there is no any demotion/promotion in kernel because there's
only 1 tier.

> patch 2 -> adds per node memtier attribute. So only DRAM nodes shows the 
> details, because the patchset yet has not introduced a slower memory 
> tier like PMEM.
> 
> patch 4 -> introducing demotion. Will make that patch 5
> 
> patch 5 -> add dax kmem numa nodes as slower memory tier. Now this 
> becomes patch 4 at which point we will correctly show two memory tiers 
> in the system.
> 
> 
> > > > +struct memory_tier {
> > > > +	nodemask_t nodelist;
> > > > +};
> > > > 
> > > > And struct device can be added after the kernel has switched the
> > > > implementation based on explicit memory tiers.
> > > > 
> > > > +struct memory_tier {
> > > > +	struct device dev;
> > > > +	nodemask_t nodelist;
> > > > +};
> > > > 
> > > 
> > > 
> > > Can you elaborate on this? or possibly review the v5 series indicating
> > > what change you are suggesting here?
> > > 
> > > 
> > > > But I don't think it's a good idea to have "struct device" embedded in
> > > > "struct memory_tier".  We don't have "struct device" embedded in "struct
> > > > pgdata_list"...
> > > > 
> > > 
> > > I avoided creating an array for memory_tier (memory_tier[]) so that we
> > > can keep it dynamic. Keeping dev embedded in struct memory_tier simplify
> > > the life cycle management of that dynamic list. We free the struct
> > > memory_tier allocation via device release function (memtier->dev.release
> > > = memory_tier_device_release )
> > > 
> > > Why do you think it is not a good idea?
> > 
> > I think that we shouldn't bind our kernel internal implementation with
> > user space interface too much.  Yes.  We can expose kernel internal
> > implementation to user space in a direct way.  I suggest you to follow
> > the style of "struct pglist_data" and "struct node".  If we decouple
> > "struct memory_tier" and "struct memory_tier_dev" (or some other name),
> > we can refer to "struct memory_tier" without depending on all device
> > core.  Memory tier should be accessible inside the kernel even without a
> > user interface.  And memory tier isn't a device in concept.
> > 
> 
> memory_tiers are different from pglist_data and struct node in that we 
> also allow the creation of them from userspace.

I don't think that there's much difference.  struct pglist_data and
struct node can be created/destroyed dynamically too.  Please take a
look at

  __try_online_node()
  register_one_node()
  try_offline_node()
  unregister_one_node()

> That is the life time of 
> a memory tier is driven from userspace and it is much easier to manage 
> them via sysfs file lifetime mechanism rather than inventing an 
> independent and more complex way of doing the same.

You needs to manage the lifetime of struct memory_tier in kernel too. 
Because there are kernel users.  And even if you use device core
lifetime mechanism, you don't need to embed struct device in struct
memory_tier too, you can free "separate" struct memory_tier in "release"
callback of struct device.

> > For life cycle management, I think that we can do that without sysfs
> > too.
> > 
> 
> unless there are specific details that you think will be broken by 
> embedding struct device inside struct memory_tier, IMHO I still consider 
> the embedded implementation much simpler and in accordance with other 
> kernel design patterns.

In concept, struct memory_tier isn't a device.  Although we expose it as
a device in sysfs.  That's just an implementation detail.  So I think
it's better to make struct memory_tier independent of struct device if
possible.

Via not embeding struct device in struct memory_tier, it's much easier
to dereference struct memory_tier directly in inline function in ".h". 
We don't need to introduce one accessor function for each field of
struct memory_tier for that.

Best Regards,
Huang, Ying