[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <047cdda5-d6bd-4ce0-8905-f1bc00153d6d@suse.cz>
Date: Fri, 28 Feb 2025 12:01:56 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: zhongjinji@...or.com, akpm@...ux-foundation.org,
Mel Gorman <mgorman@...hsingularity.net>,
Johannes Weiner <hannes@...xchg.org>, Brendan Jackman <jackmanb@...gle.com>
Cc: yuzhao@...gle.com, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
rientjes@...gle.com, yipengxiang@...or.com, liulu.liu@...or.com,
feng.han@...or.com
Subject: Re: [PATCH] mm/page_alloc: make the maximum number of highatomic
pageblocks resizable
+Cc Mel, Johannes and Brendan, please keep on future submissions
On 2/26/25 03:41, zhongjinji@...or.com wrote:
> From: zhongjinji <zhongjinji@...or.com>
>
> In the past, nr_reserved_highatomic was considered to be part of
> unusable_free when there was no ALLOC_RESERVES flag. To prevent
> unusable_free from being too large, it is reasonable to set a
> fixed maximum highatomic value.
>
> Even if the maximum number of highatomic pageblocks is set to be larger,
> unusable_free may not increase, since Yu Zhao provided the modification
> about nr_free_highatomic in
> https://lore.kernel.org/all/20241028182653.3420139-1-yuzhao@google.com/T/#u
I think I raised in discussing that patch the fact that
unreserve_highatomic_pageblock() is flawed in that it will only find a
highatomic block to unreserve if it has some free pages as it searches via
freelist. So if the existing reserve becomes fully used up, it's dead
weight, unless the highatomic allocations are shortlived. It could make
sense to remove the HIGHATOMIC marking so other, more free blocks might be
reserved instead.
I fear if this approach isn't just working around that limitation.
> More highatomic pageblocks are beneficial for the successful allocation
> of high-order page, which is helpful in some devices. Therefore, use
Can you elaborate what is it helpful with? And what kind of values you
expect to be using? Do you find that your existing HIGHATOMIC blocks are
fully depleted as I speculate above? Are your highatomic allocations short
or long lived?
Maybe simply at this point nr_free_highatomic would be a better metric than
nr_reserved_highatomic to decide if we should reserve another pageblock? The
counter didn't exist when highatomic reserves were introduced.
But maybe also some mechanism (compaction maybe?) could also be looking for
highatomic blocks that are fully used and unreserve them?
> highatomic_reserve_ratio to adjust the maximum number of highatomic
> pageblocks.
>
> Signed-off-by: zhongjinji <zhongjinji@...or.com>
Minimally this needs to be documented in e.g.
Documentation/admin-guide/sysctl/vm.rst, it's not obvious that the value is
divided by 1000 and not 100 for example.
> ---
> mm/page_alloc.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 579789600a3c..dbdce6a0f694 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -273,6 +273,7 @@ int min_free_kbytes = 1024;
> int user_min_free_kbytes = -1;
> static int watermark_boost_factor __read_mostly = 15000;
> static int watermark_scale_factor = 10;
> +static int highatomic_reserve_ratio = 10;
>
> /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
> int movable_zone;
> @@ -2046,7 +2047,8 @@ static void reserve_highatomic_pageblock(struct page *page, int order,
> */
> if ((zone_managed_pages(zone) / 100) < pageblock_nr_pages)
> return;
> - max_managed = ALIGN((zone_managed_pages(zone) / 100), pageblock_nr_pages);
> + max_managed = ALIGN((zone_managed_pages(zone) * highatomic_reserve_ratio / 1000),
> + pageblock_nr_pages);
> if (zone->nr_reserved_highatomic >= max_managed)
> return;
>
> @@ -6199,6 +6201,13 @@ static const struct ctl_table page_alloc_sysctl_table[] = {
> .mode = 0644,
> .proc_handler = percpu_pagelist_high_fraction_sysctl_handler,
> .extra1 = SYSCTL_ZERO,
> + },
> + .procname = "highatomic_reserve_ratio",
> + .data = &highatomic_reserve_ratio,
> + .maxlen = sizeof(highatomic_reserve_ratio),
> + .mode = 0644,
> + .proc_handler = proc_dointvec_minmax,
> + .extra1 = SYSCTL_ZERO,
> },
> {
> .procname = "lowmem_reserve_ratio",
Powered by blists - more mailing lists