Message-ID: <87a69yzpk7.fsf@linux.ibm.com>
Date:   Mon, 27 Jun 2022 10:10:56 +0530
From:   "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>
To:     Bagas Sanjaya <bagasdotme@...il.com>
Cc:     linux-mm@...ck.org, akpm@...ux-foundation.org,
        Wei Xu <weixugc@...gle.com>, Huang Ying <ying.huang@...el.com>,
        Yang Shi <shy828301@...il.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Tim C Chen <tim.c.chen@...el.com>,
        Michal Hocko <mhocko@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Hesham Almatary <hesham.almatary@...wei.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Jonathan Cameron <Jonathan.Cameron@...wei.com>,
        Alistair Popple <apopple@...dia.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Jagdish Gediya <jvgediya@...ux.ibm.com>,
        linux-doc@...r.kernel.org
Subject: Re: [PATCH v7 11/12] mm/demotion: Add documentation for memory tiering

Bagas Sanjaya <bagasdotme@...il.com> writes:

> On Wed, Jun 22, 2022 at 01:55:12PM +0530, Aneesh Kumar K.V wrote:
>> From: Jagdish Gediya <jvgediya@...ux.ibm.com>
>> 
>
> Hi Aneesh and Jagdish,
>
> The documentation can be improved, see below.
>
> >> All N_MEMORY nodes are divided into 3 memory tiers with tier ID value
>> MEMORY_TIER_HBM_GPU, MEMORY_TIER_DRAM and MEMORY_TIER_PMEM. By default,
>> all nodes are assigned to default memory tier.
>> 
>> Demotion path for all N_MEMORY nodes is prepared based on the tier ID value
>> of memory tiers.
>> 
> >> This patch adds documentation for memory tiering introduction, its sysfs
>> interfaces and how demotion is performed based on memory tiers.
>> 
>
> I think the patch message should just be:
> "Add documentation for memory tiering. It also covers its sysfs
> interfaces and how demotion is performed based on memory tiers."
>
>> +===========
>> +Memory tiers
>> +============
>> +
>> +This document describes explicit memory tiering support along with
>> +demotion based on memory tiers.
>> +
>
> This causes htmldocs error, for which I have applied the fixup at [1].
>
>> +Memory nodes are divided into 3 types of memory tiers with tier ID
>> +value as shown based on their hardware characteristics.
>> +
>> +
>> +MEMORY_TIER_HBM_GPU
>> +MEMORY_TIER_DRAM
>> +MEMORY_TIER_PMEM
>> +
>
> Use bullet list.
>
>> +Sysfs interfaces
>> +================
>> +
>> +Nodes belonging to specific tier can be read from,
>> +/sys/devices/system/memtier/memtierN/nodelist (Read-Only)
>> +
>> +Where N is 0 - 2.
>
> The "where" sentence can be compounded into the previous sentence above.
>
>> +
>> +Example 1:
>> +For a system where Node 0 is CPU + DRAM nodes, Node 1 is HBM node,
>> +node 2 is a PMEM node an ideal tier layout will be
>> +
>> +$ cat /sys/devices/system/memtier/memtier0/nodelist
>> +1
>> +$ cat /sys/devices/system/memtier/memtier1/nodelist
>> +0
>> +$ cat /sys/devices/system/memtier/memtier2/nodelist
>> +2
>> +
>
> The code snippets should have been inside literal code blocks.
>
>> +Example 2:
>> +For a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
>> +nodes.
>> +
>> +$ cat /sys/devices/system/memtier/memtier0/nodelist
>> +cat: /sys/devices/system/memtier/memtier0/nodelist: No such file or
>> +directory
>> +$ cat /sys/devices/system/memtier/memtier1/nodelist
>> +0-1
>> +$ cat /sys/devices/system/memtier/memtier2/nodelist
>> +2-3
>> +
>
> Use literal code block.
>
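
For anyone scripting against the nodelist files in the examples above, here is a minimal sketch of expanding the range notation ("0-1", "2-3") into node IDs. The helper name parse_nodelist is hypothetical, and the comma-separated form is an assumption carried over from other kernel list files; the patch only shows single values and single ranges.

```python
def parse_nodelist(text):
    """Expand a nodelist string such as "0-1" or "0-1,4" into a list of node IDs.

    Hypothetical helper for illustration; comma handling is an assumption.
    """
    nodes = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means the tier has no nodes
        first, _, last = part.partition("-")
        # A bare value like "2" has no upper bound, so reuse the lower one.
        nodes.extend(range(int(first), int(last or first) + 1))
    return nodes


if __name__ == "__main__":
    for sample in ("1", "0-1", "2-3"):
        print(sample, "->", parse_nodelist(sample))
```

A missing memtierN directory, as in example 2, would still need to be handled by the caller (for instance by catching FileNotFoundError when opening the file).
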
>> +Default memory tier can be read from,
>> +/sys/devices/system/memtier/default_tier (Read-Only)
>> +
>> +e.g.
>> +$ cat /sys/devices/system/memtier/default_tier
>> +memtier200
>> +
>> +Max memory tier ID supported can be read from,
>> +/sys/devices/system/memtier/max_tier (Read-Only)
>> +
>> +e.g.
>> +$ cat /sys/devices/system/memtier/max_tier
>> +400
>> +
>> +Individual node's memory tier can be read of set using,
>> +/sys/devices/system/node/nodeN/memtier	(Read-Write)
>> +
>> +where N = node id
>> +
>> +When this interface is written, Node is moved from the old memory tier
>> +to new memory tier and demotion targets for all N_MEMORY nodes are
>> +built again.
>> +
>> +For example 1 mentioned above,
>> +$ cat /sys/devices/system/node/node0/memtier
>> +1
>> +$ cat /sys/devices/system/node/node1/memtier
>> +0
>> +$ cat /sys/devices/system/node/node2/memtier
>> +2
>> +
>
> The same suggestions above apply here, too.
>
>> +Enable/Disable demotion
>> +-----------------------
>> +
>> +By default demotion is disabled, it can be enabled/disabled using
>> +below sysfs interface,
>> +
>> +$ echo 0/1 or false/true > /sys/kernel/mm/numa/demotion_enabled
>> +
>
> Use literal code block.
>
>> +preferred and allowed demotion nodes
>> +------------------------------------
>> +
>> +Preferred nodes for a specific N_MEMORY node are the best nodes
>> +from the next possible lower memory tier. Allowed nodes for any
>> +node are all the nodes available in all possible lower memory
>> +tiers.
>> +
>> +Example:
>> +
>> +For a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
>> +nodes,
>> +
>> +node distances:
>> +node   0    1    2    3
>> +   0  10   20   30   40
>> +   1  20   10   40   30
>> +   2  30   40   10   40
>> +   3  40   30   40   10
>> +
>
> Use reST table.
>
>> +memory_tiers[0] = <empty>
>> +memory_tiers[1] = 0-1
>> +memory_tiers[2] = 2-3
>> +
>> +node_demotion[0].preferred = 2
>> +node_demotion[0].allowed   = 2, 3
>> +node_demotion[1].preferred = 3
>> +node_demotion[1].allowed   = 3, 2
>> +node_demotion[2].preferred = <empty>
>> +node_demotion[2].allowed   = <empty>
>> +node_demotion[3].preferred = <empty>
>> +node_demotion[3].allowed   = <empty>
>> +
>
> What are these above? Node properties? BTW, use literal code block.
>
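
To make the memory_tiers / node_demotion values above easier to follow, here is a rough sketch of how they could be derived from the node distances. This is not the kernel code: the function name, the data layout, and the assumption that a larger tier ID means a lower (slower) tier are all illustrative. "Preferred" is taken as the nearest node(s) in the next non-empty lower tier, and "allowed" as every node in any lower tier, ordered by distance.

```python
# Illustrative only: names and layout are hypothetical, not kernel structures.
DISTANCE = [
    [10, 20, 30, 40],
    [20, 10, 40, 30],
    [30, 40, 10, 40],
    [40, 30, 40, 10],
]

# tier ID -> nodes in that tier; assumes a larger ID is a lower tier.
MEMORY_TIERS = {0: [], 1: [0, 1], 2: [2, 3]}


def demotion_targets(node, tiers, distance):
    """Return (preferred, allowed) demotion targets for one node."""
    tier_of = {n: tier for tier, nodes in tiers.items() for n in nodes}
    # Non-empty tiers below the node's own tier, nearest tier first.
    lower = [t for t in sorted(tiers) if t > tier_of[node] and tiers[t]]
    if not lower:
        return [], []
    # Allowed: all nodes in all lower tiers, closest first.
    allowed = sorted(
        (n for t in lower for n in tiers[t]),
        key=lambda n: distance[node][n],
    )
    # Preferred: the closest node(s) in the next lower tier only.
    next_tier = tiers[lower[0]]
    best = min(distance[node][n] for n in next_tier)
    preferred = [n for n in next_tier if distance[node][n] == best]
    return preferred, allowed


if __name__ == "__main__":
    for node in range(4):
        preferred, allowed = demotion_targets(node, MEMORY_TIERS, DISTANCE)
        print(f"node_demotion[{node}].preferred = {preferred}")
        print(f"node_demotion[{node}].allowed   = {allowed}")
```

With the distances from the example, node 1 is closer to node 3 (30) than to node 2 (40), which is why its allowed list is ordered 3, 2.
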
> If you don't understand these suggestions above, here is the diff:

I got the following error when applying the diff below:
patch: **** malformed patch at line 180: @@ -148,35 +153,40 @@ from the next possible lower memory tier. Allowed nodes for any

I did, however, update the documentation based on your feedback, and it
is much better than what I had. Thanks for the review. I will send v8
with the changes folded in. I also added the line below to the commit
message; I hope that is OK.

[update doc format by Bagas Sanjaya <bagasdotme@...il.com>]

>
> ---- >8 ----
>
> diff --git a/Documentation/admin-guide/mm/memory-tiering.rst b/Documentation/admin-guide/mm/memory-tiering.rst
> index 0a75e0dab1fd8e..10ec5aab6ddd53 100644
> --- a/Documentation/admin-guide/mm/memory-tiering.rst
> +++ b/Documentation/admin-guide/mm/memory-tiering.rst
> @@ -14,13 +14,13 @@ Introduction
>  
>  Many systems have multiple types of memory devices e.g. GPU, DRAM and
>  PMEM. The memory subsystem of these systems can be called a memory
> -tiering system because the performance of the different types of
> +tiering system because the performance of each type of
>  memory is different. Memory tiers are defined based on the hardware
>  capabilities of memory nodes. Each memory tier is assigned a tier ID
>  value that determines the memory tier position in demotion order.
>  
>  The memory tier assignment of each node is independent of each
> -other. Moving a node from one tier to another tier doesn't affect
> +other. Moving a node from one tier to another doesn't affect
>  the tier assignment of any other node.
>  
>  Memory tiers are used to build the demotion targets for nodes. A node
> @@ -32,10 +32,9 @@ Memory tier rank
>  Memory nodes are divided into 3 types of memory tiers with tier ID
>  value as shown based on their hardware characteristics.
>  
> -
> -MEMORY_TIER_HBM_GPU
> -MEMORY_TIER_DRAM
> -MEMORY_TIER_PMEM
> +  * MEMORY_TIER_HBM_GPU
> +  * MEMORY_TIER_DRAM
> +  * MEMORY_TIER_PMEM
>  
>  Memory tiers initialization and (re)assignments
>  ===============================================
> @@ -49,68 +48,73 @@ hotplug, the memory tier with default tier ID is assigned to the memory node.
>  Sysfs interfaces
>  ================
>  
> -Nodes belonging to specific tier can be read from,
> -/sys/devices/system/memtier/memtierN/nodelist (Read-Only)
> +Nodes belonging to specific tier can be read from
> +/sys/devices/system/memtier/memtierN/nodelist, where N is 0 - 2 (read-only)
>  
> -Where N is 0 - 2.
> +Examples:
>  
> -Example 1:
> -For a system where Node 0 is CPU + DRAM nodes, Node 1 is HBM node,
> -node 2 is a PMEM node an ideal tier layout will be
> +1. On a system where Node 0 is CPU + DRAM nodes, Node 1 is HBM node,
> +   node 2 is a PMEM node an ideal tier layout will be:
>  
> -$ cat /sys/devices/system/memtier/memtier0/nodelist
> -1
> -$ cat /sys/devices/system/memtier/memtier1/nodelist
> -0
> -$ cat /sys/devices/system/memtier/memtier2/nodelist
> -2
> +   .. code-block::
>  
> -Example 2:
> -For a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
> -nodes.
> +      $ cat /sys/devices/system/memtier/memtier0/nodelist
> +      1
> +      $ cat /sys/devices/system/memtier/memtier1/nodelist
> +      0
> +      $ cat /sys/devices/system/memtier/memtier2/nodelist
> +      2
>  
> -$ cat /sys/devices/system/memtier/memtier0/nodelist
> -cat: /sys/devices/system/memtier/memtier0/nodelist: No such file or
> -directory
> -$ cat /sys/devices/system/memtier/memtier1/nodelist
> -0-1
> -$ cat /sys/devices/system/memtier/memtier2/nodelist
> -2-3
> +2. On a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
> +   nodes:
>  
> -Default memory tier can be read from,
> -/sys/devices/system/memtier/default_tier (Read-Only)
> +   .. code-block::
>  
> -e.g.
> -$ cat /sys/devices/system/memtier/default_tier
> -memtier200
> +      $ cat /sys/devices/system/memtier/memtier0/nodelist
> +      cat: /sys/devices/system/memtier/memtier0/nodelist: No such file or
> +      directory
> +      $ cat /sys/devices/system/memtier/memtier1/nodelist
> +      0-1
> +      $ cat /sys/devices/system/memtier/memtier2/nodelist
> +      2-3
>  
> -Max memory tier ID supported can be read from,
> -/sys/devices/system/memtier/max_tier (Read-Only)
> +Default memory tier can be read from
> +/sys/devices/system/memtier/default_tier (read-only), e.g.:
>  
> -e.g.
> -$ cat /sys/devices/system/memtier/max_tier
> -400
> +.. code-block::
>  
> -Individual node's memory tier can be read of set using,
> -/sys/devices/system/node/nodeN/memtier	(Read-Write)
> +   $ cat /sys/devices/system/memtier/default_tier
> +   memtier200
>  
> -where N = node id
> +Max memory tier ID supported can be read from
> +/sys/devices/system/memtier/max_tier (read-only), e.g.:
>  
> -When this interface is written, Node is moved from the old memory tier
> +.. code-block::
> +
> +   $ cat /sys/devices/system/memtier/max_tier
> +   400
> +
> +Individual node's memory tier can be read or set using
> +/sys/devices/system/node/nodeN/memtier (read-write), where N = node id.
> +
> +When this interface is written, node is moved from the old memory tier
>  to new memory tier and demotion targets for all N_MEMORY nodes are
>  built again.
>  
> -For example 1 mentioned above,
> -$ cat /sys/devices/system/node/node0/memtier
> -1
> -$ cat /sys/devices/system/node/node1/memtier
> -0
> -$ cat /sys/devices/system/node/node2/memtier
> -2
> +For example 1 mentioned above:
> +
> +.. code-block::
> +
> +   $ cat /sys/devices/system/node/node0/memtier
> +   1
> +   $ cat /sys/devices/system/node/node1/memtier
> +   0
> +   $ cat /sys/devices/system/node/node2/memtier
> +   2
>  
>  Additional memory tiers can be created by writing a tier ID value to this file.
> -This results in a new memory tier creation and moving the specific NUMA node to
> -that memory tier.
> +This results into creating a new tier and moving the specific NUMA node to
> +that tier.
>  
>  Demotion
>  ========
> @@ -128,19 +132,20 @@ be used.
>  
>  Instead of a page being discarded during reclaim, it can be moved to
>  persistent memory. Allowing page migration during reclaim enables
> -these systems to migrate pages from fast(higher) tiers to slow(lower)
> -tiers when the fast(higher) tier is under pressure.
> +these systems to migrate pages from fast (higher) tiers to slow (lower)
> +tiers when the fast (higher) tier is under pressure.
>  
>  
>  Enable/Disable demotion
>  -----------------------
>  
> -By default demotion is disabled, it can be enabled/disabled using
> -below sysfs interface,
> +By default demotion is disabled. It can be toggled by:
>  
> -$ echo 0/1 or false/true > /sys/kernel/mm/numa/demotion_enabled
> +.. code-block::
>  
> -preferred and allowed demotion nodes
> +   $ echo 0/1 or false/true > /sys/kernel/mm/numa/demotion_enabled
> +
> +Preferred and allowed demotion nodes
>  ------------------------------------
>  
>  Preferred nodes for a specific N_MEMORY node are the best nodes
> @@ -148,35 +153,40 @@ from the next possible lower memory tier. Allowed nodes for any
>  node are all the nodes available in all possible lower memory
>  tiers.
>  
> -Example:
> +For example, on a system where Node 0 & 1 are CPU + DRAM nodes,
> +node 2 & 3 are PMEM nodes:
>  
> -For a system where Node 0 & 1 are CPU + DRAM nodes, node 2 & 3 are PMEM
> -nodes,
> +  * node distances
>  
> -node distances:
> -node   0    1    2    3
> -   0  10   20   30   40
> -   1  20   10   40   30
> -   2  30   40   10   40
> -   3  40   30   40   10
> +    ====  ==   ==   ==   ==
> +    node   0    1    2    3
> +    ====  ==   ==   ==   ==
> +       0  10   20   30   40
> +       1  20   10   40   30
> +       2  30   40   10   40
> +       3  40   30   40   10
> +    ====  ==   ==   ==   ==
>  
> -memory_tiers[0] = <empty>
> -memory_tiers[1] = 0-1
> -memory_tiers[2] = 2-3
> +  * node properties
>  
> -node_demotion[0].preferred = 2
> -node_demotion[0].allowed   = 2, 3
> -node_demotion[1].preferred = 3
> -node_demotion[1].allowed   = 3, 2
> -node_demotion[2].preferred = <empty>
> -node_demotion[2].allowed   = <empty>
> -node_demotion[3].preferred = <empty>
> -node_demotion[3].allowed   = <empty>
> +    .. code-block::
> +
> +       memory_tiers[0] = <empty>
> +       memory_tiers[1] = 0-1
> +       memory_tiers[2] = 2-3
> +
> +       node_demotion[0].preferred = 2
> +       node_demotion[0].allowed   = 2, 3
> +       node_demotion[1].preferred = 3
> +       node_demotion[1].allowed   = 3, 2
> +       node_demotion[2].preferred = <empty>
> +       node_demotion[2].allowed   = <empty>
> +       node_demotion[3].preferred = <empty>
> +       node_demotion[3].allowed   = <empty>
>  
>  Memory allocation for demotion
>  ------------------------------
>  
> -If a page needs to be demoted from any node, the kernel 1st tries
> -to allocate a new page from the node's preferred node and fallbacks to
> -node's allowed targets in allocation fallback order.
> -
> +If a page needs to be demoted from any node, the kernel first tries
> +to allocate a new page from the node's preferred target node and fallbacks
> +to node's allowed targets in allocation fallback order.
>
>
> Thanks.
>
> [1]: https://lore.kernel.org/linux-doc/YrZ5cTFOSuWxlF2t@debian.me/
>
> -- 
> An old man doll... just what I always wanted! - Clara
