[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20240812053636.97700-1-21cnbao@gmail.com>
Date: Mon, 12 Aug 2024 17:36:36 +1200
From: Barry Song <21cnbao@...il.com>
To: david@...hat.com
Cc: akpm@...ux-foundation.org,
baohua@...nel.org,
baolin.wang@...ux.alibaba.com,
corbet@....net,
ioworker0@...il.com,
linux-kernel@...r.kernel.org,
linux-mm@...ck.org,
ryan.roberts@....com
Subject: Re: [PATCH v2] mm: Override mTHP "enabled" defaults at kernel cmdline
On Sat, Aug 10, 2024 at 1:15 AM David Hildenbrand <david@...hat.com> wrote:
>
> >>>>> -You can change the sysfs boot time defaults of Transparent Hugepage
> >>>>> -Support by passing the parameter ``transparent_hugepage=always`` or
> >>>>> -``transparent_hugepage=madvise`` or ``transparent_hugepage=never``
> >>>>> -to the kernel command line.
> >>>>> +You can change the sysfs boot time default for the top-level "enabled"
> >>>>> +control by passing the parameter ``transparent_hugepage=always`` or
> >>>>> +``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
> >>>>> +kernel command line.
> >>>>> +
> >>>>> +Alternatively, each supported anonymous THP size can be controlled by
> >>>>> +passing ``thp_anon=<size>[KMG]:<state>``, where ``<size>`` is the THP size
> >>>>> +and ``<state>`` is one of ``always``, ``madvise``, ``never`` or
> >>>>> +``inherit``.
> >>>>> +
> >>>>> +For example, the following will set 64K THP to ``always``::
> >>>>> +
> >>>>> + thp_anon=64K:always
> >>>>> +
> >>>>> +``thp_anon=`` may be specified multiple times to configure all THP sizes as
> >>>>> +required. If ``thp_anon=`` is specified at least once, any anon THP sizes
> >>>>> +not explicitly configured on the command line are implicitly set to
> >>>>> +``never``.
> >>>>
> >>>> I suggest documenting that "thp_anon=" will not effect the value of
> >>>> "transparent_hugepage=", or any configured default.
> >
> > Did you see the previous conversation with Barry about whether or not to honour
> > configured defaults when any thp_anon= is provided [1]? Sounds like you also
> > think we should honour the PMD "inherit" default if not explicitly provided on
> > the command line? (see link for justification for the approach I'm currently
> > taking).
>
> I primarily think that we should document it :)
>
> What if someone passes "transparent_hugepage=always" and "thp_anon=..."?
> I would assume that transparent_hugepage would only affect the global
> toggle then?
>
> >
> > [1]
> > https://lore.kernel.org/linux-mm/CAGsJ_4x8ruPspuk_FQVggJMWcXLbRuZFq44gg-Dt7Ewt3ExqTw@mail.gmail.com/
> >
> >>>>
> >>>> Wondering if a syntax like
> >>>>
> >>>> thp_anon=16K,32K,64K:always;1048K,2048K:madvise
> >
> > Are there examples of that syntax already or have you just made it up? I found
> > examples with the colon (:) but nothing this fancy. I guess that's not a reason
> > not to do it though (other than the risk of screwing up the parser in a subtle way).
>
> I made it up -- mostly ;) I think we are quite flexible on what we can
> do. As always, maybe we can keep it bit consistent with existing stuff.
>
> For hugetlb_cma we have things like
> "<node>:nn[KMGTPE|[,<node>:nn[KMGTPE]]
>
> "memmap=" options are more ... advanced, including memory ranges. There
> are a bunch more documented in kernel-parameters.txt that have more
> elaborate formats.
>
> Ranges would probably be the most valuable addition. So maybe we should
> start with:
>
> thp_anon=16K-64K:always,1048K-2048K:madvise
>
> So to enable all THPs it would simply be
>
> thp_anon=16K-2M:always
>
> Interesting question what would happen if someone passes:
>
> thp_anon=8K-2M:always
>
> Likely we simply would apply it to any size in the range, even if
> start/end is not a THP size.
>
> But we would want to complain to the user if someone only specifies a
> single one (or a range does not even select a single one) that does not
> exist:
>
> thp_anon=8K:always
How about rejecting settings with any illegal size or policy?
I found there are too many corner cases to say "yes" or "no". It seems
the code could be much cleaner if we simply reject illegal settings.
On the other hand, we should rely on smart users to provide correct
settings, shouldn’t we? While working one the code, I felt that
extracting partial correct settings from incorrect ones and supporting
them might be a bit of over-design.
I have tested the below code by
"thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never"
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1a12c011e2df..6a1f54cff7f9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -81,6 +81,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL;
unsigned long huge_anon_orders_always __read_mostly;
unsigned long huge_anon_orders_madvise __read_mostly;
unsigned long huge_anon_orders_inherit __read_mostly;
+static bool anon_orders_configured;
unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long vm_flags,
@@ -737,7 +738,10 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
* disable all other sizes. powerpc's PMD_ORDER isn't a compile-time
* constant so we have to do this here.
*/
- huge_anon_orders_inherit = BIT(PMD_ORDER);
+ if (!anon_orders_configured) {
+ huge_anon_orders_inherit = BIT(PMD_ORDER);
+ anon_orders_configured = true;
+ }
*hugepage_kobj = kobject_create_and_add("transparent_hugepage", mm_kobj);
if (unlikely(!*hugepage_kobj)) {
@@ -922,6 +926,103 @@ static int __init setup_transparent_hugepage(char *str)
}
__setup("transparent_hugepage=", setup_transparent_hugepage);
+static inline int get_order_from_str(const char *size_str)
+{
+ unsigned long size;
+ char *endptr;
+ int order;
+
+ size = memparse(size_str, &endptr);
+ order = fls(size >> PAGE_SHIFT) - 1;
+ if (order & ~THP_ORDERS_ALL_ANON) {
+ pr_err("invalid size in thp_anon= %ld\n", size);
+ return -EINVAL;
+ }
+
+ return order;
+}
+
+static inline void set_bits(unsigned long *var, int start, int end)
+{
+ int i;
+
+ for (i = start; i <= end; i++)
+ set_bit(i, var);
+}
+
+static inline void clear_bits(unsigned long *var, int start, int end)
+{
+ int i;
+
+ for (i = start; i <= end; i++)
+ clear_bit(i, var);
+}
+
+static char str_dup[PAGE_SIZE] __meminitdata;
+static int __init setup_thp_anon(char *str)
+{
+ char *token, *range, *policy, *subtoken;
+ char *start_size, *end_size;
+ int start, end;
+ char *p;
+
+ if (!str || strlen(str) + 1 > PAGE_SIZE)
+ goto err;
+ strcpy(str_dup, str);
+
+ p = str_dup;
+ while ((token = strsep(&p, ";")) != NULL) {
+ range = strsep(&token, ":");
+ policy = token;
+
+ if (!policy)
+ goto err;
+
+ while ((subtoken = strsep(&range, ",")) != NULL) {
+ if (strchr(subtoken, '-')) {
+ start_size = strsep(&subtoken, "-");
+ end_size = subtoken;
+
+ start = get_order_from_str(start_size);
+ end = get_order_from_str(end_size);
+ } else {
+ start = end = get_order_from_str(subtoken);
+ }
+
+ if (start < 0 || end < 0 || start > end)
+ goto err;
+
+ if (!strcmp(policy, "always")) {
+ set_bits(&huge_anon_orders_always, start, end);
+ clear_bits(&huge_anon_orders_inherit, start, end);
+ clear_bits(&huge_anon_orders_madvise, start, end);
+ } else if (!strcmp(policy, "madvise")) {
+ set_bits(&huge_anon_orders_madvise, start, end);
+ clear_bits(&huge_anon_orders_inherit, start, end);
+ clear_bits(&huge_anon_orders_always, start, end);
+ } else if (!strcmp(policy, "inherit")) {
+ set_bits(&huge_anon_orders_inherit, start, end);
+ clear_bits(&huge_anon_orders_madvise, start, end);
+ clear_bits(&huge_anon_orders_always, start, end);
+ } else if (!strcmp(policy, "never")) {
+ clear_bits(&huge_anon_orders_inherit, start, end);
+ clear_bits(&huge_anon_orders_madvise, start, end);
+ clear_bits(&huge_anon_orders_always, start, end);
+ } else {
+ goto err;
+ }
+ }
+ }
+
+ anon_orders_configured = true;
+ return 1;
+
+err:
+ pr_warn("thp_anon=%s: cannot parse(invalid size or policy), ignored\n", str);
+ return 0;
+}
+__setup("thp_anon=", setup_thp_anon);
+
pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
{
if (likely(vma->vm_flags & VM_WRITE))
--
2.34.1
Everything seems to be correct:
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
[always] inherit madvise never
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-32kB/enabled
[always] inherit madvise never
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
[always] inherit madvise never
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-128kB/enabled
always [inherit] madvise never
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-256kB/enabled
always inherit [madvise] never
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-512kB/enabled
always [inherit] madvise never
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/enabled
always inherit madvise [never]
/ # cat /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
always inherit madvise [never]
>
> >
> >>>>
> >>>> (one could also support ranges, like "16K-64K")
> >>>>
> >>>> Would be even better. Then, maybe only allow a single instance.
> >>>>
> >>>> Maybe consider it if it's not too crazy to parse ;)
> > I'll take a look. I'm going to be out for 3 weeks from end of Monday though, so
>
> Oh, lucky you! Enjoy!
>
> > probably won't get around to that until I'm back. I know Barry is keen to get
> > this merged, so Barry, if you'd like to take it over that's fine by me (I'm sure
> > you have enough on your plate though).
>
>
> --
> Cheers,
>
> David / dhildenb
>
Thanks
Barry
Powered by blists - more mailing lists