Message-ID: <YqDncfLeEeBaosrY@cmpxchg.org>
Date:   Wed, 8 Jun 2022 14:16:17 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Aneesh Kumar K V <aneesh.kumar@...ux.ibm.com>
Cc:     linux-mm@...ck.org, akpm@...ux-foundation.org,
        Wei Xu <weixugc@...gle.com>, Huang Ying <ying.huang@...el.com>,
        Greg Thelen <gthelen@...gle.com>,
        Yang Shi <shy828301@...il.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Tim C Chen <tim.c.chen@...el.com>,
        Brice Goglin <brice.goglin@...il.com>,
        Michal Hocko <mhocko@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Hesham Almatary <hesham.almatary@...wei.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Jonathan Cameron <Jonathan.Cameron@...wei.com>,
        Alistair Popple <apopple@...dia.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Feng Tang <feng.tang@...el.com>,
        Jagdish Gediya <jvgediya@...ux.ibm.com>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        David Rientjes <rientjes@...gle.com>
Subject: Re: [PATCH v5 1/9] mm/demotion: Add support for explicit memory tiers

On Wed, Jun 08, 2022 at 09:43:52PM +0530, Aneesh Kumar K V wrote:
> On 6/8/22 9:25 PM, Johannes Weiner wrote:
> > Hello,
> > 
> > On Wed, Jun 08, 2022 at 10:11:31AM -0400, Johannes Weiner wrote:
> > > On Fri, Jun 03, 2022 at 07:12:29PM +0530, Aneesh Kumar K.V wrote:
> > > > @@ -0,0 +1,20 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > +#ifndef _LINUX_MEMORY_TIERS_H
> > > > +#define _LINUX_MEMORY_TIERS_H
> > > > +
> > > > +#ifdef CONFIG_TIERED_MEMORY
> > > > +
> > > > +#define MEMORY_TIER_HBM_GPU	0
> > > > +#define MEMORY_TIER_DRAM	1
> > > > +#define MEMORY_TIER_PMEM	2
> > > > +
> > > > +#define MEMORY_RANK_HBM_GPU	300
> > > > +#define MEMORY_RANK_DRAM	200
> > > > +#define MEMORY_RANK_PMEM	100
> > > > +
> > > > +#define DEFAULT_MEMORY_TIER	MEMORY_TIER_DRAM
> > > > +#define MAX_MEMORY_TIERS  3
> > > 
> > > I understand the names are somewhat arbitrary, and the tier ID space
> > > can be expanded down the line by bumping MAX_MEMORY_TIERS.
> > > 
> > > But starting out with a packed ID space can get quite awkward for
> > > users when new tiers - especially intermediate tiers - show up in
> > > existing configurations. I mentioned in the other email that DRAM !=
> > > DRAM, so new tiers seem inevitable already.
> > > 
> > > It could make sense to start with a bigger address space and spread
> > > out the list of kernel default tiers a bit within it:
> > > 
> > > MEMORY_TIER_GPU		0
> > > MEMORY_TIER_DRAM	10
> > > MEMORY_TIER_PMEM	20
> > 
> > Forgive me if I'm asking a question that has been answered. I went
> > back to earlier threads and couldn't work it out - maybe there were
> > some off-list discussions? Anyway...
> > 
> > Why is there a distinction between tier ID and rank? I understand that
> > rank was added because tier IDs were too few. But if rank determines
> > ordering, what is the use of a separate tier ID? IOW, why not make the
> > tier ID space wider and have the kernel pick a few spread out defaults
> > based on known hardware, with plenty of headroom to be future proof.
> > 
> >    $ ls tiers
> >    100				# DEFAULT_TIER
> >    $ cat tiers/100/nodelist
> >    0-1				# conventional numa nodes
> > 
> >    <pmem is onlined>
> > 
> >    $ grep . tiers/*/nodelist
> >    tiers/100/nodelist:0-1	# conventional numa
> >    tiers/200/nodelist:2		# pmem
> > 
> >    $ grep . nodes/*/tier
> >    nodes/0/tier:100
> >    nodes/1/tier:100
> >    nodes/2/tier:200
> > 
> >    <unknown device is online as node 3, defaults to 100>
> > 
> >    $ grep . tiers/*/nodelist
> >    tiers/100/nodelist:0-1,3
> >    tiers/200/nodelist:2
> > 
> >    $ echo 300 >nodes/3/tier
> >    $ grep . tiers/*/nodelist
> >    tiers/100/nodelist:0-1
> >    tiers/200/nodelist:2
> >    tiers/300/nodelist:3
> > 
> >    $ echo 200 >nodes/3/tier
> >    $ grep . tiers/*/nodelist
> >    tiers/100/nodelist:0-1	
> >    tiers/200/nodelist:2-3
> > 
> > etc.
> 
> tier ID is also used as device id memtier.dev.id. It was discussed that we
> would need the ability to change the rank value of a memory tier. If we make
> rank value same as tier ID or tier device id, we will not be able to support
> that.

Is the idea that you could change the rank of a collection of nodes in
one go? Rather than moving the nodes one by one into a new tier?
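
The two alternatives in that question can be put side by side in a small
userspace sketch. Everything here is hypothetical and is not taken from the
patch: struct memtier, tier_set_rank(), node_move() and demotion_target() are
illustrative names, the node set is a plain bitmask, and "higher rank means
faster memory" is an assumption read off the quoted default ranks
(HBM_GPU 300, DRAM 200, PMEM 100).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory tier object; not the patch's struct. */
struct memtier {
	int id;			/* stable identifier, doubles as the device id */
	int rank;		/* mutable ordering key (assumed: higher = faster) */
	unsigned long nodes;	/* bitmask of member NUMA nodes */
};

/* Option A: one write reorders every node in the tier; id and members stay. */
static void tier_set_rank(struct memtier *tier, int rank)
{
	assert(tier != NULL);
	tier->rank = rank;
}

/* Option B: move a single node from one tier to another. */
static void node_move(struct memtier *from, struct memtier *to, int node)
{
	assert(node >= 0 && node < (int)(8 * sizeof(unsigned long)));
	from->nodes &= ~(1UL << node);
	to->nodes |= 1UL << node;
}

/* Return the tier with the highest rank strictly below 'from', or NULL. */
static const struct memtier *demotion_target(const struct memtier *tiers,
					      size_t n,
					      const struct memtier *from)
{
	const struct memtier *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (tiers[i].rank >= from->rank)
			continue;
		if (!best || tiers[i].rank > best->rank)
			best = &tiers[i];
	}
	return best;
}
```

With separate id and rank, tier_set_rank() changes where a whole tier sits in
the ordering while its id and node membership stay put. If the id were the
ordering key, the same effect would need one node_move() per member node.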

[ Sorry, I wasn't able to find this discussion. AFAICS the first
  patches in RFC4 already had the struct device { .id = tier }
  logic. Could you point me to it? In general it would be really
  helpful to maintain summarized rationales for such decisions in the
  cover letter to make sure things don't get lost over many, many
  threads, conferences, and video calls. ]
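
For reference, the sysfs walkthrough quoted above (tiers/*/nodelist and
nodes/*/tier, with the tier ID itself acting as the ordering key) can be
modeled in userspace C. This is only a sketch under stated assumptions:
node_tier[], tiers_init(), node_set_tier() and tier_nodelist() are
hypothetical names, MAX_NODES is an arbitrary limit, and none of it is kernel
code. Only the default tier value of 100 comes from the quoted example.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_NODES	8	/* arbitrary limit for this sketch */
#define DEFAULT_TIER	100	/* "100  # DEFAULT_TIER" in the quoted example */
#define NO_TIER		(-1)	/* node is not online */

/* nodes/N/tier: the tier ID each node currently belongs to. */
static int node_tier[MAX_NODES];

static void tiers_init(void)
{
	for (int node = 0; node < MAX_NODES; node++)
		node_tier[node] = NO_TIER;
}

/* Models both onlining a node and "echo TIER >nodes/N/tier". */
static void node_set_tier(int node, int tier)
{
	assert(node >= 0 && node < MAX_NODES);
	node_tier[node] = tier;
}

/*
 * Models "cat tiers/TIER/nodelist": format the member nodes as ranges,
 * e.g. "0-1,3". An empty string means no node is in that tier.
 * Returns the number of characters written.
 */
static size_t tier_nodelist(int tier, char *buf, size_t len)
{
	size_t off = 0;
	int node = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	while (node < MAX_NODES) {
		int first, last, n;

		if (node_tier[node] != tier) {
			node++;
			continue;
		}
		first = node;
		while (node + 1 < MAX_NODES && node_tier[node + 1] == tier)
			node++;
		last = node++;

		if (first == last)
			n = snprintf(buf + off, len - off, "%s%d",
				     off ? "," : "", first);
		else
			n = snprintf(buf + off, len - off, "%s%d-%d",
				     off ? "," : "", first, last);
		if (n < 0 || (size_t)n >= len - off)
			break;	/* output truncated; stop here */
		off += (size_t)n;
	}
	return off;
}
```

Replaying the quoted steps (nodes 0-1 at 100, pmem node 2 at 200, node 3
defaulting to 100 and then being written to 300 and to 200) with
node_set_tier() and reading back with tier_nodelist() is one way to check the
model against the example.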
