linux-kernel - Re: [PATCH 3/3] mm/numa_balancing:Allow migrate on protnone reference with MPOL_PREFERRED


Message-ID: <0e633718-2313-4a0f-9907-b0fa5ffa18bc@linux.ibm.com>
Date: Mon, 26 Feb 2024 18:39:16 +0530
From: Donet Tom <donettom@...ux.ibm.com>
To: Michal Hocko <mhocko@...e.com>,
        "Aneesh Kumar K.V"
 <aneesh.kumar@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Huang Ying <ying.huang@...el.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Mel Gorman <mgorman@...e.de>, Ben Widawsky <ben.widawsky@...el.com>,
        Feng Tang <feng.tang@...el.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Peter Zijlstra
 <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
        Rik van Riel <riel@...riel.com>, Johannes Weiner <hannes@...xchg.org>,
        Matthew Wilcox <willy@...radead.org>,
        Mike Kravetz
 <mike.kravetz@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Dan Williams <dan.j.williams@...el.com>,
        Hugh Dickins <hughd@...gle.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>,
        Suren Baghdasaryan <surenb@...gle.com>
Subject: Re: [PATCH 3/3] mm/numa_balancing:Allow migrate on protnone reference
 with MPOL_PREFERRED_MANY policy


On 2/20/24 14:18, Michal Hocko wrote:
> On Tue 20-02-24 09:27:25, Aneesh Kumar K.V wrote:
> [...]
>> 	case MPOL_PREFERRED_MANY:
>> 		if (pol->flags & MPOL_F_MORON) {
>> 			if (!mpol_preferred_should_numa_migrate(thisnid, curnid, pol))
>> 				goto out;
>> 			break;
>> 		}
>>
>> 		/*
>> 		 * use current page if in policy nodemask,
>> 		 * else select nearest allowed node, if any.
>> 		 * If no allowed nodes, use current [!misplaced].
>> 		 */
>> 		if (node_isset(curnid, pol->nodes))
>> 			goto out;
>> 		z = first_zones_zonelist(
>> 				node_zonelist(thisnid, GFP_HIGHUSER),
>> 				gfp_zone(GFP_HIGHUSER),
>> 				&pol->nodes);
>> 		polnid = zone_to_nid(z->zone);
>> 		break;
>>   ....
>> ..
>>         }
>>
>> 	/* Migrate the folio towards the node whose CPU is referencing it */
>> 	if (pol->flags & MPOL_F_MORON) {
>> 		polnid = thisnid;
>>
>> 		if (!should_numa_migrate_memory(current, folio, curnid,
>> 						thiscpu))
>> 			goto out;
>> 	}
>>
>> 	if (curnid != polnid)
>> 		ret = polnid;
>> out:
>> 	mpol_cond_put(pol);
>>
>> 	return ret;
>> }
> Ohh, right this code is confusing as hell. Thanks for the clarification.
> With this in mind. There should be a comment warning about MPOL_F_MOF
> always being unset as the userspace cannot really set it up.
>
> Thanks!
>
Hi Michal,

Sorry for the late reply.

If userspace sets MPOL_F_NUMA_BALANCING, the kernel sets both the MPOL_F_MOF and MPOL_F_MORON flags on the policy:

/* Basic parameter sanity check used by both mbind() and set_mempolicy() */
static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
{
     *flags = *mode & MPOL_MODE_FLAGS;
     *mode &= ~MPOL_MODE_FLAGS;

     if ((unsigned int)(*mode) >=  MPOL_MAX)
         return -EINVAL;

     if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
         return -EINVAL;

     if (*flags & MPOL_F_NUMA_BALANCING) {
         /* MPOL_PREFERRED_MANY is accepted here only with this series applied */
         if (*mode == MPOL_BIND || *mode == MPOL_PREFERRED_MANY)
             *flags |= (MPOL_F_MOF | MPOL_F_MORON);
         else
             return -EINVAL;
     }

     return 0;
}

The current kernel accepts MPOL_F_NUMA_BALANCING only with MPOL_BIND; this series adds support for MPOL_PREFERRED_MANY as well.
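
To make the effect of the check above concrete, here is a minimal userspace model of it. This is a sketch, not kernel code: the demo_ function name is hypothetical, and the numeric values of the MPOL_* constants are written from the kernel's mempolicy headers as an assumption, so please check them against your tree.

```c
#include <errno.h>

/*
 * Assumed values, mirrored from the kernel's mempolicy headers.
 * Newer trees may define additional modes before MPOL_MAX.
 */
enum {
	MPOL_DEFAULT,
	MPOL_PREFERRED,
	MPOL_BIND,
	MPOL_INTERLEAVE,
	MPOL_LOCAL,
	MPOL_PREFERRED_MANY,
	MPOL_MAX,
};

/* Mode flags that userspace ORs into the mode argument. */
#define MPOL_F_STATIC_NODES	(1 << 15)
#define MPOL_F_RELATIVE_NODES	(1 << 14)
#define MPOL_F_NUMA_BALANCING	(1 << 13)
#define MPOL_MODE_FLAGS \
	(MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES | MPOL_F_NUMA_BALANCING)

/* Kernel-internal flags; userspace cannot pass these directly. */
#define MPOL_F_MOF	(1 << 3)
#define MPOL_F_MORON	(1 << 4)

/* Userspace model of the patched sanity check (hypothetical name). */
int demo_sanitize_mpol_flags(int *mode, unsigned short *flags)
{
	/* Split the user-supplied value into mode flags and the bare mode. */
	*flags = (unsigned short)(*mode & MPOL_MODE_FLAGS);
	*mode &= ~MPOL_MODE_FLAGS;

	if ((unsigned int)(*mode) >= MPOL_MAX)
		return -EINVAL;

	if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
		return -EINVAL;

	if (*flags & MPOL_F_NUMA_BALANCING) {
		/* Only these two modes may opt in to NUMA balancing. */
		if (*mode == MPOL_BIND || *mode == MPOL_PREFERRED_MANY)
			*flags |= (MPOL_F_MOF | MPOL_F_MORON);
		else
			return -EINVAL;
	}

	return 0;
}
```

Passing MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING as the mode leaves the mode as MPOL_PREFERRED_MANY and turns on both internal flags, while MPOL_INTERLEAVE | MPOL_F_NUMA_BALANCING is rejected with -EINVAL.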

Why is the MPOL_F_MOF flag required?
------------------------------------
For NUMA balancing, "task_numa_work" periodically unmaps the process memory (it makes the mappings PROT_NONE).
When the task accesses such memory again, a NUMA hinting page fault occurs, and the page fault handler can
migrate the pages.

If MPOL_F_MOF is not set, "task_numa_work" does not unmap the process pages, so neither the NUMA hinting
page fault nor the migration occurs. This behaviour was introduced by commit
fc3147245d193b (mm: numa: Limit NUMA scanning to migrate-on-fault VMAs).
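
The gating described above can be sketched roughly as follows. This is a simplified model only: struct demo_vma and demo_numa_scan() are hypothetical names, the real scanner applies further checks before touching a VMA, and the value of MPOL_F_MOF is an assumption taken from the kernel headers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed value of the kernel-internal migrate-on-fault flag. */
#define MPOL_F_MOF	(1 << 3)

/* Hypothetical stand-in for a VMA and the flags of its memory policy. */
struct demo_vma {
	unsigned short policy_flags;
	bool prot_none;		/* true once marked for NUMA hinting faults */
};

/*
 * Model of one scan pass: only VMAs whose policy has MPOL_F_MOF set are
 * marked, so only those can take a NUMA hinting fault later.
 * Returns the number of VMAs marked in this pass.
 */
size_t demo_numa_scan(struct demo_vma *vmas, size_t n)
{
	size_t marked = 0;

	for (size_t i = 0; i < n; i++) {
		/* Without MPOL_F_MOF the VMA is skipped entirely. */
		if (!(vmas[i].policy_flags & MPOL_F_MOF))
			continue;

		vmas[i].prot_none = true;
		marked++;
	}

	return marked;
}
```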

How the new implementation works
--------------------------------
With this series, MPOL_PREFERRED_MANY can have MPOL_F_MOF and MPOL_F_MORON set through MPOL_F_NUMA_BALANCING,
so NUMA hinting page faults will occur. In mpol_misplaced(), if NUMA migration is allowed, we select the node
of the currently executing CPU as the target node; otherwise we return from the function with ret = NUMA_NO_NODE.
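
As a rough illustration of that decision for a policy with MPOL_F_MORON set, here is a small userspace sketch. The function name is hypothetical, and the results of mpol_preferred_should_numa_migrate() and should_numa_migrate_memory() are passed in as plain booleans rather than reimplemented, so this models only the control flow quoted earlier in the thread.

```c
#include <stdbool.h>

#define NUMA_NO_NODE	(-1)

/*
 * thisnid:         node of the CPU that took the hinting fault
 * curnid:          node the folio currently resides on
 * policy_allows:   stands in for mpol_preferred_should_numa_migrate()
 * balancer_allows: stands in for should_numa_migrate_memory()
 *
 * Returns the target node, or NUMA_NO_NODE if the folio should stay put.
 */
int demo_preferred_many_target(int thisnid, int curnid,
			       bool policy_allows, bool balancer_allows)
{
	int ret = NUMA_NO_NODE;
	int polnid;

	/* The policy vetoes migration: leave the folio where it is. */
	if (!policy_allows)
		return ret;

	/* Migrate towards the node whose CPU is referencing the folio. */
	polnid = thisnid;

	if (!balancer_allows)
		return ret;

	if (curnid != polnid)
		ret = polnid;

	return ret;
}
```

For example, a fault taken on node 1 for a folio on node 0 yields node 1 only when both checks pass; in every other case the result is NUMA_NO_NODE.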

Since MPOL_F_MOF can be set from userspace through MPOL_F_NUMA_BALANCING, there is no need to add this comment, right?

Thanks
Donet Tom