Open Source and information security mailing list archives
 
Message-ID: <158453976.61766398803526.JavaMail.epsvc@epcpadp1new>
Date: Mon, 22 Dec 2025 15:47:21 +0530
From: Alok Rathore <alok.rathore@...sung.com>
To: Bharata B Rao <bharata@....com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	Jonathan.Cameron@...wei.com, dave.hansen@...el.com, gourry@...rry.net,
	mgorman@...hsingularity.net, mingo@...hat.com, peterz@...radead.org,
	raghavendra.kt@....com, riel@...riel.com, rientjes@...gle.com,
	sj@...nel.org, weixugc@...gle.com, willy@...radead.org,
	ying.huang@...ux.alibaba.com, ziy@...dia.com, dave@...olabs.net,
	nifan.cxl@...il.com, xuezhengchu@...wei.com, yiannis@...corp.com,
	akpm@...ux-foundation.org, david@...hat.com, byungchul@...com,
	kinseyho@...gle.com, joshua.hahnjy@...il.com, yuanchu@...gle.com,
	balbirs@...dia.com, shivankg@....com, alokrathore20@...il.com,
	gost.dev@...sung.com, cpgs@...sung.com
Subject: Re: [RFC PATCH v4 3/9] mm: Hot page tracking and promotion

On 06/12/25 03:44PM, Bharata B Rao wrote:
>This introduces a sub-system for collecting memory access
>information from different sources. It maintains the hotness
>information based on the access history and time of access.
>
>Additionally, it provides per-lowertier-node kernel threads
>(named kmigrated) that periodically promote the pages that
>are eligible for promotion.
>
>Sub-systems that generate hot page access info can report that
>using this API:
>
>int pghot_record_access(unsigned long pfn, int nid, int src,
>                        unsigned long time)
>
>@pfn: The PFN of the memory accessed
>@nid: The accessing NUMA node ID
>@src: The temperature source (sub-system) that generated the
>      access info
>@time: The access time in jiffies
>
>Some temperature sources may not provide the nid from which
>the page was accessed. This is true for sources that use
>page table scanning for PTE Accessed bit. For such sources,
>the default toptier node to which such pages should be promoted
>is hard coded.
>
>The hotness information is stored for every page of lower
>tier memory in an unsigned long variable that is part of
>mem_section data structure.
>
>kmigrated is a per-lowertier-node kernel thread that migrates
>the folios marked for migration in batches. Each kmigrated
>thread walks the PFN range spanning its node and checks
>for potential migration candidates.
>
>A bunch of tunables for enabling different hotness sources,
>setting target_nid, frequency threshold are provided in debugfs.
>
>Signed-off-by: Bharata B Rao <bharata@....com>

<snip>

>+++ b/include/linux/pghot.h
>@@ -0,0 +1,71 @@
>+/* SPDX-License-Identifier: GPL-2.0 */
>+#ifndef _LINUX_PGHOT_H
>+#define _LINUX_PGHOT_H
>+
>+/* Page hotness temperature sources */
>+enum pghot_src {
>+	PGHOT_HW_HINTS,
>+	PGHOT_PGTABLE_SCAN,
>+	PGHOT_HINT_FAULT,
>+};
>+
>+#ifdef CONFIG_PGHOT
>+/*
>+ * Bit positions to enable individual sources in pghot/records_enabled
>+ * of debugfs.
>+ */
>+enum pghot_src_enabed {
>+	PGHOT_HWHINTS_BIT	= 0,
>+	PGHOT_PGTSCAN_BIT,
>+	PGHOT_HINTFAULT_BIT,
>+	PGHOT_MAX_BIT
>+};
>+
>+#define PGHOT_HWHINTS_ENABLED	BIT(PGHOT_HWHINTS_BIT)
>+#define PGHOT_PGTSCAN_ENABLED	BIT(PGHOT_PGTSCAN_BIT)
>+#define PGHOT_HINTFAULT_ENABLED	BIT(PGHOT_HINTFAULT_BIT)
>+#define PGHOT_SRC_ENABLED_MASK	GENMASK(PGHOT_MAX_BIT - 1, 0)
>+
>+#define PGHOT_DEFAULT_FREQ_WINDOW	(5 * MSEC_PER_SEC)
>+#define PGHOT_DEFAULT_FREQ_THRESHOLD	2
>+
>+#define KMIGRATED_DEFAULT_SLEEP_MS	100
>+#define KMIGRATED_DEFAULT_BATCH_NR	512
>+
>+#define PGHOT_DEFAULT_NODE	0
>+
>+/*
>+ * Bits 0-31 are used to store nid, frequency and time.
>+ * Bits 32-62 are unused now.
>+ * Bit 63 is used to indicate the page is ready for migration.
>+ */
>+#define PGHOT_MIGRATE_READY	63
>+
>+#define PGHOT_NID_WIDTH		10
>+#define PGHOT_FREQ_WIDTH	3
>+/* time is stored in 19 bits which can represent up to 8.73s with HZ=1000 */

With HZ=1000, a 19-bit time field covers 2^19 = 524288 jiffies, which is about 8.73 minutes, not 8.73 seconds. I think the comment says seconds by mistake.

Suggestion:
If the target is to promote pages within ~8 secs, then 13 bits would be enough. That way the hotness can be tracked in 32 bits per PFN instead of 64 bits.

#define PGHOT_MIGRATE_READY	31
#define PGHOT_NID_WIDTH		10
#define PGHOT_FREQ_WIDTH	3
/* time is stored in 13 bits which can represent up to 8.19s with HZ=1000 */
#define PGHOT_TIME_WIDTH	13
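
For reference, the arithmetic behind both figures can be checked with a small user-space sketch. It assumes HZ=1000 as in the comment, and the helper name pghot_time_span_ms() is made up for illustration; it is not part of the patch.

```c
#include <assert.h>

#define HZ 1000UL /* assumption: HZ=1000, as in the patch comment */

/* Hypothetical helper: span in milliseconds of a 'width'-bit jiffies field */
static inline unsigned long pghot_time_span_ms(unsigned int width)
{
	return (1UL << width) * 1000UL / HZ;
}

/* Proposed 32-bit layout: nid (10) + freq (3) + time (13) must stay below bit 31 */
static_assert(10 + 3 + 13 <= 31, "fields overlap the migrate-ready bit");
```

pghot_time_span_ms(19) gives 524288 ms (a little over 8.7 minutes), while pghot_time_span_ms(13) gives 8192 ms (about 8.19 s).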

>+#define PGHOT_TIME_WIDTH	19
>+
>+#define PGHOT_NID_SHIFT		0
>+#define PGHOT_FREQ_SHIFT	(PGHOT_NID_SHIFT + PGHOT_NID_WIDTH)
>+#define PGHOT_TIME_SHIFT	(PGHOT_FREQ_SHIFT + PGHOT_FREQ_WIDTH)
>+
>+#define PGHOT_NID_MASK		((1UL << PGHOT_NID_SHIFT) - 1)
>+#define PGHOT_FREQ_MASK		((1UL << PGHOT_FREQ_SHIFT) - 1)
>+#define PGHOT_TIME_MASK		((1UL << PGHOT_TIME_SHIFT) - 1)

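The mask generation for nid, freq and time does not look correct: the masks are built from the *_SHIFT values instead of the *_WIDTH values, so PGHOT_NID_MASK evaluates to 0. It should be:
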
#define PGHOT_NID_MASK         ((1UL << PGHOT_NID_WIDTH) - 1)
#define PGHOT_FREQ_MASK        ((1UL << PGHOT_FREQ_WIDTH) - 1)
#define PGHOT_TIME_MASK        ((1UL << PGHOT_TIME_WIDTH) - 1)
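
To illustrate, here is a minimal user-space sketch of how the width-based masks would combine with the shifts when packing and unpacking the fields. It assumes the masks are applied to a value that has already been shifted down to bit 0 (the users of these macros are snipped here, so that is a guess). pghot_pack(), pghot_get_*() and PGHOT_NID_MASK_POSTED are hypothetical names used only for this example.

```c
#include <assert.h>

#define PGHOT_NID_WIDTH		10
#define PGHOT_FREQ_WIDTH	3
#define PGHOT_TIME_WIDTH	19

#define PGHOT_NID_SHIFT		0
#define PGHOT_FREQ_SHIFT	(PGHOT_NID_SHIFT + PGHOT_NID_WIDTH)
#define PGHOT_TIME_SHIFT	(PGHOT_FREQ_SHIFT + PGHOT_FREQ_WIDTH)

/* Width-based masks: each covers exactly its own field once shifted down */
#define PGHOT_NID_MASK		((1UL << PGHOT_NID_WIDTH) - 1)
#define PGHOT_FREQ_MASK		((1UL << PGHOT_FREQ_WIDTH) - 1)
#define PGHOT_TIME_MASK		((1UL << PGHOT_TIME_WIDTH) - 1)

/* Shift-based NID mask as in the posted patch: (1UL << 0) - 1 == 0 */
#define PGHOT_NID_MASK_POSTED	((1UL << PGHOT_NID_SHIFT) - 1)

/* Hypothetical helper: pack the three fields into one word */
static inline unsigned long pghot_pack(unsigned long nid, unsigned long freq,
				       unsigned long time)
{
	return ((nid & PGHOT_NID_MASK) << PGHOT_NID_SHIFT) |
	       ((freq & PGHOT_FREQ_MASK) << PGHOT_FREQ_SHIFT) |
	       ((time & PGHOT_TIME_MASK) << PGHOT_TIME_SHIFT);
}

/* Hypothetical accessors: shift the field down, then mask by its width */
static inline unsigned long pghot_get_nid(unsigned long v)
{
	return (v >> PGHOT_NID_SHIFT) & PGHOT_NID_MASK;
}

static inline unsigned long pghot_get_freq(unsigned long v)
{
	return (v >> PGHOT_FREQ_SHIFT) & PGHOT_FREQ_MASK;
}

static inline unsigned long pghot_get_time(unsigned long v)
{
	return (v >> PGHOT_TIME_SHIFT) & PGHOT_TIME_MASK;
}
```

With these definitions, packing nid=5, freq=3, time=1234 and reading the fields back returns the same three values, whereas the shift-based PGHOT_NID_MASK_POSTED is 0 and would always drop the nid.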

Can you please have a look?


Regards,
Alok Rathore

