Message-Id: <941c2b32-7bd8-56e7-a8d5-c103cab121d1@linux.vnet.ibm.com>
Date: Wed, 8 Feb 2017 19:43:54 +0530
From: Anshuman Khandual <khandual@...ux.vnet.ibm.com>
To: Dave Hansen <dave.hansen@...el.com>,
Anshuman Khandual <khandual@...ux.vnet.ibm.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Cc: mhocko@...e.com, vbabka@...e.cz, mgorman@...e.de,
minchan@...nel.org, aneesh.kumar@...ux.vnet.ibm.com,
bsingharora@...il.com, srikar@...ux.vnet.ibm.com,
haren@...ux.vnet.ibm.com, jglisse@...hat.com,
dan.j.williams@...el.com
Subject: Re: [RFC V2 12/12] mm: Tag VMA with VM_CDM flag explicitly during
mbind(MPOL_BIND)
On 02/07/2017 11:37 PM, Dave Hansen wrote:
>> On 01/30/2017 11:24 PM, Dave Hansen wrote:
>>> On 01/29/2017 07:35 PM, Anshuman Khandual wrote:
>>>> + if ((new_pol->mode == MPOL_BIND)
>>>> + && nodemask_has_cdm(new_pol->v.nodes))
>>>> + set_vm_cdm(vma);
>>> So, if you did:
>>>
>>> mbind(addr, PAGE_SIZE, MPOL_BIND, all_nodes, ...);
>>> mbind(addr, PAGE_SIZE, MPOL_BIND, one_non_cdm_node, ...);
>>>
>>> You end up with a VMA that can never have KSM done on it, etc... Even
>>> though there's no good reason for it. I guess /proc/$pid/smaps might be
>>> able to help us figure out what was going on here, but that still seems
>>> like an awful lot of damage.
>> Agreed, this VMA should not remain tagged after the second call. It does
>> not make sense. For this kind of scenario we can re-evaluate the VMA tag
>> every time a nodemask change is attempted. But if we also want runtime
>> re-evaluation, we need to steal some cycles at general VMA processing
>> points like merge and split to do the necessary re-evaluation. Should we
>> do both kinds of re-evaluation to be more optimal?
> I'm still unconvinced that you *need* detection like this. Scanning big
> VMAs is going to be really painful.
>
> I thought I asked before but I can't find it in this thread. But, we
> have explicit interfaces for disabling KSM and khugepaged. Why do we
> need implicit ones like this in addition to those?
I think I missed the discussion we had on this last time around. My bad,
sorry about that. IIUC we can disable KSM through a madvise() call; in fact
I believe it is disabled by default and needs to be enabled explicitly (via
MADV_MERGEABLE). We could add a similar interface to disable automatic NUMA
balancing for a specific VMA, or we can handle it on a page-by-page basis
with something like this:
diff --git a/mm/memory.c b/mm/memory.c
index 1099d35..101dfd9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3518,6 +3518,9 @@ static int do_numa_page(struct vm_fault *vmf)
 		goto out;
 	}
 
+	if (is_cdm_node(page_to_nid(page)))
+		goto out;
+
 	/* Migrate to the requested node */
 	migrated = migrate_misplaced_page(page, vma, target_nid);
 	if (migrated) {
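
The madvise() style interface mentioned above would be the per-VMA variant
of the same opt-out. A rough sketch of what the check could look like, by
analogy with how KSM keys off VM_MERGEABLE; note that VM_NONUMA and
MADV_NONUMA are invented names here, they do not exist in the kernel:

/*
 * Sketch only: a hypothetical per-VMA opt-out from automatic NUMA
 * balancing, analogous to VM_MERGEABLE/MADV_UNMERGEABLE for KSM.
 * VM_NONUMA would be set via a new madvise(MADV_NONUMA) behaviour.
 */
static bool vma_skip_numa_balancing(struct vm_area_struct *vma)
{
	/* User opted this VMA out of NUMA balancing entirely. */
	return vma->vm_flags & VM_NONUMA;
}

do_numa_page() could then bail out once per VMA instead of testing the
node of every faulting page.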
I am still looking into these aspects. BTW, I have posted the minimal set
of CDM patches which defines and isolates the CDM node.
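
Coming back to the re-evaluation idea above: on the mbind() path the tag
could be recomputed instead of only ever set, so that the second call in
your example untags the VMA again. A sketch, assuming a clear_vm_cdm()
helper exists as the inverse of set_vm_cdm() from this series:

/*
 * Sketch: re-evaluate VM_CDM on every policy change.  set_vm_cdm()
 * and nodemask_has_cdm() are from this series; clear_vm_cdm() is an
 * assumed inverse helper.
 */
static void vma_reevaluate_cdm(struct vm_area_struct *vma,
			       struct mempolicy *new_pol)
{
	if ((new_pol->mode == MPOL_BIND) &&
	    nodemask_has_cdm(new_pol->v.nodes))
		set_vm_cdm(vma);
	else
		clear_vm_cdm(vma);
}

The same helper could be called from the VMA merge and split paths if we
decide the runtime re-evaluation is worth the cycles.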