lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20240812053636.97700-1-21cnbao@gmail.com>
Date: Mon, 12 Aug 2024 17:36:36 +1200
From: Barry Song <21cnbao@...il.com>
To: david@...hat.com
Cc: akpm@...ux-foundation.org,
	baohua@...nel.org,
	baolin.wang@...ux.alibaba.com,
	corbet@....net,
	ioworker0@...il.com,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	ryan.roberts@....com
Subject: Re: [PATCH v2] mm: Override mTHP "enabled" defaults at kernel cmdline

On Sat, Aug 10, 2024 at 1:15 AM David Hildenbrand <david@...hat.com> wrote:
>
> >>>>> -You can change the sysfs boot time defaults of Transparent Hugepage
> >>>>> -Support by passing the parameter ``transparent_hugepage=always`` or
> >>>>> -``transparent_hugepage=madvise`` or ``transparent_hugepage=never``
> >>>>> -to the kernel command line.
> >>>>> +You can change the sysfs boot time default for the top-level "enabled"
> >>>>> +control by passing the parameter ``transparent_hugepage=always`` or
> >>>>> +``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
> >>>>> +kernel command line.
> >>>>> +
> >>>>> +Alternatively, each supported anonymous THP size can be controlled by
> >>>>> +passing ``thp_anon=<size>[KMG]:<state>``, where ``<size>`` is the THP size
> >>>>> +and ``<state>`` is one of ``always``, ``madvise``, ``never`` or
> >>>>> +``inherit``.
> >>>>> +
> >>>>> +For example, the following will set 64K THP to ``always``::
> >>>>> +
> >>>>> +     thp_anon=64K:always
> >>>>> +
> >>>>> +``thp_anon=`` may be specified multiple times to configure all THP sizes as
> >>>>> +required. If ``thp_anon=`` is specified at least once, any anon THP sizes
> >>>>> +not explicitly configured on the command line are implicitly set to
> >>>>> +``never``.
> >>>>
> >>>> I suggest documenting that "thp_anon=" will not effect the value of
> >>>> "transparent_hugepage=", or any configured default.
> >
> > Did you see the previous conversation with Barry about whether or not to honour
> > configured defaults when any thp_anon= is provided [1]? Sounds like you also
> > think we should honour the PMD "inherit" default if not explicitly provided on
> > the command line? (see link for justification for the approach I'm currently
> > taking).
>
> I primarily think that we should document it :)
>
> What if someone passes "transparent_hugepage=always" and "thp_anon=..."?
> I would assume that transparent_hugepage would only affect the global
> toggle then?
>
> >
> > [1]
> > https://lore.kernel.org/linux-mm/CAGsJ_4x8ruPspuk_FQVggJMWcXLbRuZFq44gg-Dt7Ewt3ExqTw@mail.gmail.com/
> >
> >>>>
> >>>> Wondering if a syntax like
> >>>>
> >>>> thp_anon=16K,32K,64K:always;1048K,2048K:madvise
> >
> > Are there examples of that syntax already or have you just made it up? I found
> > examples with the colon (:) but nothing this fancy. I guess that's not a reason
> > not to do it though (other than the risk of screwing up the parser in a subtle way).
>
> I made it up -- mostly ;) I think we are quite flexible on what we can
> do. As always, maybe we can keep it bit consistent with existing stuff.
>
> For hugetlb_cma we have things like
>         "<node>:nn[KMGTPE|[,<node>:nn[KMGTPE]]
>
> "memmap=" options are more ... advanced, including memory ranges. There
> are a bunch more documented in kernel-parameters.txt that have more
> elaborate formats.
>
> Ranges would probably be the most valuable addition. So maybe we should
> start with:
>
>         thp_anon=16K-64K:always,1048K-2048K:madvise
>
> So to enable all THPs it would simply be
>
>         thp_anon=16K-2M:always
>
> Interesting question what would happen if someone passes:
>
>         thp_anon=8K-2M:always
>
> Likely we simply would apply it to any size in the range, even if
> start/end is not a THP size.
>
> But we would want to complain to the user if someone only specifies a
> single one (or a range does not even select a single one) that does not
> exist:
>
>         thp_anon=8K:always

How about rejecting settings with any illegal size or policy? 

I found there are too many corner cases to say "yes" or "no". It seems
the code could be much cleaner if we simply reject illegal settings.
On the other hand, we should rely on smart users to provide correct
settings, shouldn’t we? While working one the code, I felt that
extracting partial correct settings from incorrect ones and supporting
them might be a bit of over-design.

I have tested the below code by

"thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never"

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1a12c011e2df..6a1f54cff7f9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -81,6 +81,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
 unsigned long huge_anon_orders_always __read_mostly;
 unsigned long huge_anon_orders_madvise __read_mostly;
 unsigned long huge_anon_orders_inherit __read_mostly;
+static bool anon_orders_configured;
 
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 					 unsigned long vm_flags,
@@ -737,7 +738,10 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
 	 * disable all other sizes. powerpc's PMD_ORDER isn't a compile-time
 	 * constant so we have to do this here.
 	 */
-	huge_anon_orders_inherit = BIT(PMD_ORDER);
+	if (!anon_orders_configured) {
+		huge_anon_orders_inherit = BIT(PMD_ORDER);
+		anon_orders_configured = true;
+	}
 
 	*hugepage_kobj = kobject_create_and_add("transparent_hugepage", mm_kobj);
 	if (unlikely(!*hugepage_kobj)) {
@@ -922,6 +926,103 @@ static int __init setup_transparent_hugepage(char *str)
 }
 __setup("transparent_hugepage=", setup_transparent_hugepage);
 
+static inline int get_order_from_str(const char *size_str)
+{
+	unsigned long size;
+	char *endptr;
+	int order;
+
+	size = memparse(size_str, &endptr);
+	order = fls(size >> PAGE_SHIFT) - 1;
+	if (order & ~THP_ORDERS_ALL_ANON) {
+		pr_err("invalid size in thp_anon= %ld\n", size);
+		return -EINVAL;
+	}
+
+	return order;
+}
+
+static inline void set_bits(unsigned long *var, int start, int end)
+{
+	int i;
+
+	for (i = start; i <= end; i++)
+		set_bit(i, var);
+}
+
+static inline void clear_bits(unsigned long *var, int start, int end)
+{
+	int i;
+
+	for (i = start; i <= end; i++)
+		clear_bit(i, var);
+}
+
+static char str_dup[PAGE_SIZE] __meminitdata;
+static int __init setup_thp_anon(char *str)
+{
+	char *token, *range, *policy, *subtoken;
+	char *start_size, *end_size;
+	int start, end;
+	char *p;
+
+	if (!str || strlen(str) + 1 > PAGE_SIZE)
+		goto err;
+	strcpy(str_dup, str);
+
+	p = str_dup;
+	while ((token = strsep(&p, ";")) != NULL) {
+		range = strsep(&token, ":");
+		policy = token;
+
+		if (!policy)
+			goto err;
+
+		while ((subtoken = strsep(&range, ",")) != NULL) {
+			if (strchr(subtoken, '-')) {
+				start_size = strsep(&subtoken, "-");
+				end_size = subtoken;
+
+				start = get_order_from_str(start_size);
+				end = get_order_from_str(end_size);
+			} else {
+				start = end = get_order_from_str(subtoken);
+			}
+
+			if (start < 0 || end < 0 || start > end)
+				goto err;
+
+			if (!strcmp(policy, "always")) {
+				set_bits(&huge_anon_orders_always, start, end);
+				clear_bits(&huge_anon_orders_inherit, start, end);
+				clear_bits(&huge_anon_orders_madvise, start, end);
+			} else if (!strcmp(policy, "madvise")) {
+				set_bits(&huge_anon_orders_madvise, start, end);
+				clear_bits(&huge_anon_orders_inherit, start, end);
+				clear_bits(&huge_anon_orders_always, start, end);
+			} else if (!strcmp(policy, "inherit")) {
+				set_bits(&huge_anon_orders_inherit, start, end);
+				clear_bits(&huge_anon_orders_madvise, start, end);
+				clear_bits(&huge_anon_orders_always, start, end);
+			} else if (!strcmp(policy, "never")) {
+				clear_bits(&huge_anon_orders_inherit, start, end);
+				clear_bits(&huge_anon_orders_madvise, start, end);
+				clear_bits(&huge_anon_orders_always, start, end);
+			} else {
+				goto err;
+			}
+		}
+	}
+
+	anon_orders_configured = true;
+	return 1;
+
+err:
+	pr_warn("thp_anon=%s: cannot parse(invalid size or policy), ignored\n", str);
+	return 0;
+}
+__setup("thp_anon=", setup_thp_anon);
+
 pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 {
 	if (likely(vma->vm_flags & VM_WRITE))
-- 
2.34.1

Everything seems to be correct:

/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled 
[always] inherit madvise never
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-32kB/enabled 
[always] inherit madvise never
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled 
[always] inherit madvise never
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-128kB/enabled 
always [inherit] madvise never
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-256kB/enabled 
always inherit [madvise] never
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-512kB/enabled 
always [inherit] madvise never
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/enabled 
always inherit madvise [never]
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled 
always inherit madvise [never]

>
> >
> >>>>
> >>>> (one could also support ranges, like "16K-64K")
> >>>>
> >>>> Would be even better. Then, maybe only allow a single instance.
> >>>>
> >>>> Maybe consider it if it's not too crazy to parse ;)
> > I'll take a look. I'm going to be out for 3 weeks from end of Monday though, so
>
> Oh, lucky you! Enjoy!
>
> > probably won't get around to that until I'm back. I know Barry is keen to get
> > this merged, so Barry, if you'd like to take it over that's fine by me (I'm sure
> > you have enough on your plate though).
>
>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ