[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231002144035.00000b36@Huawei.com>
Date: Mon, 2 Oct 2023 14:40:35 +0100
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: Gregory Price <gourry.memverge@...il.com>
CC: <linux-mm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-arch@...r.kernel.org>, <linux-api@...r.kernel.org>,
<linux-cxl@...r.kernel.org>, <luto@...nel.org>,
<tglx@...utronix.de>, <mingo@...hat.com>, <bp@...en8.de>,
<dave.hansen@...ux.intel.com>, <hpa@...or.com>, <arnd@...db.de>,
<akpm@...ux-foundation.org>, <x86@...nel.org>,
Gregory Price <gregory.price@...verge.com>
Subject: Re: [RFC PATCH 3/3] mm/mempolicy: implement a partial-interleave
mempolicy
On Thu, 14 Sep 2023 19:54:57 -0400
Gregory Price <gourry.memverge@...il.com> wrote:
> The partial-interleave mempolicy implements interleave on an
I'm not sure 'partial' really conveys what is going on here.
Weighted, or uneven-interleave maybe?
> allocation interval. The default node is the local node, for
> which N pages will be allocated before an interleave pass occurs.
>
> For example:
> nodes=0,1,2
> interval=3
> cpunode=0
>
> Over 10 consecutive allocations, the following nodes will be selected:
> [0,0,0,1,2,0,0,0,1,2]
>
> In this example, there is a 60%/20%/20% distribution of memory.
>
> Using this mechanism, it becomes possible to define an approximate
> distribution percentage of memory across a set of nodes:
>
> local_node% : interval/((nr_nodes-1)+interval-1)
> other_node% : (1-local_node%)/(nr_nodes-1)
I'd like to see more discussion here of why you would do this...
A few trivial bits inline,
Jonathan
...
> +static unsigned long alloc_pages_bulk_array_partial_interleave(gfp_t gfp,
> + struct mempolicy *pol, unsigned long nr_pages,
> + struct page **page_array)
> +{
> + nodemask_t nodemask = pol->nodes;
> + unsigned long nr_pages_main;
> + unsigned long nr_pages_other;
> + unsigned long total_cycle;
> + unsigned long delta;
> + unsigned long interval;
> + int allocated = 0;
> + int start_nid;
> + int nnodes;
> + int prev, next;
> + int i;
> +
> + /* This stabilizes nodes on the stack incase pol->nodes changes */
> + barrier();
> +
> + nnodes = nodes_weight(nodemask);
> + start_nid = numa_node_id();
> +
> + if (!node_isset(start_nid, nodemask))
> + start_nid = first_node(nodemask);
> +
> + if (nnodes == 1) {
> + allocated = __alloc_pages_bulk(gfp, start_nid,
> + NULL, nr_pages_main,
> + NULL, page_array);
> + return allocated;
return __alloc_pages_bulk(...)
> + }
> + /* We don't want to double-count the main node in calculations */
> + nnodes--;
> +
> + interval = pol->part_int.interval;
> + total_cycle = (interval + nnodes);
excess brackets. Same in various other places.
> + /* Number of pages on main node: (cycles*interval + up to interval) */
> + nr_pages_main = ((nr_pages / total_cycle) * interval);
> + nr_pages_main += (nr_pages % total_cycle % (interval + 1));
> + /* Number of pages on others: (remaining/nodes) + 1 page if delta */
> + nr_pages_other = (nr_pages - nr_pages_main) / nnodes;
> + nr_pages_other /= nnodes;
> + /* Delta is number of pages beyond interval up to full cycle */
> + delta = nr_pages - (nr_pages_main + (nr_pages_other * nnodes));
> +
> + /* start by allocating for the main node, then interleave rest */
> + prev = start_nid;
> + allocated = __alloc_pages_bulk(gfp, start_nid, NULL, nr_pages_main,
> + NULL, page_array);
> + for (i = 0; i < nnodes; i++) {
> + int pages = nr_pages_other + (delta-- ? 1 : 0);
> +
> + next = next_node_in(prev, nodemask);
> + if (next < MAX_NUMNODES)
> + prev = next;
> + allocated += __alloc_pages_bulk(gfp, next, NULL, pages,
> + NULL, page_array);
> + }
> +
> + return allocated;
> +}
> +
Powered by blists - more mailing lists