linux-kernel - Re: [RFC V2 12/12] mm: Tag VMA with VM_CDM flag explicitly during mbind(MPOL

Message-ID: <1808730857.1637296.1486566279643.JavaMail.zimbra@redhat.com>
Date:   Wed, 8 Feb 2017 10:04:39 -0500 (EST)
From:   Jerome Glisse <jglisse@...hat.com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     Anshuman Khandual <khandual@...ux.vnet.ibm.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org, mhocko@...e.com,
        vbabka@...e.cz, mgorman@...e.de, minchan@...nel.org,
        aneesh kumar <aneesh.kumar@...ux.vnet.ibm.com>,
        bsingharora@...il.com, srikar@...ux.vnet.ibm.com,
        haren@...ux.vnet.ibm.com, dan j williams <dan.j.williams@...el.com>
Subject: Re: [RFC V2 12/12] mm: Tag VMA with VM_CDM flag explicitly during
 mbind(MPOL_BIND)

> On 01/30/2017 08:36 PM, Anshuman Khandual wrote:
> > On 01/30/2017 11:24 PM, Dave Hansen wrote:
> >> On 01/29/2017 07:35 PM, Anshuman Khandual wrote:
> >>> +		if ((new_pol->mode == MPOL_BIND)
> >>> +			&& nodemask_has_cdm(new_pol->v.nodes))
> >>> +			set_vm_cdm(vma);
> >> So, if you did:
> >>
> >> 	mbind(addr, PAGE_SIZE, MPOL_BIND, all_nodes, ...);
> >> 	mbind(addr, PAGE_SIZE, MPOL_BIND, one_non_cdm_node, ...);
> >>
> >> You end up with a VMA that can never have KSM done on it, etc...  Even
> >> though there's no good reason for it.  I guess /proc/$pid/smaps might be
> >> able to help us figure out what was going on here, but that still seems
> >> like an awful lot of damage.
> > 
> > Agreed, this VMA should not remain tagged after the second call. It does
> > not make sense. For this kind of scenarios we can re-evaluate the VMA
> > tag every time the nodemask change is attempted. But if we are looking for
> > some runtime re-evaluation then we need to steal some cycles are during
> > general VMA processing opportunity points like merging and split to do
> > the necessary re-evaluation. Should do we do these kind two kinds of
> > re-evaluation to be more optimal ?
> 
> I'm still unconvinced that you *need* detection like this.  Scanning big
> VMAs is going to be really painful.
> 
> I thought I asked before but I can't find it in this thread.  But, we
> have explicit interfaces for disabling KSM and khugepaged.  Why do we
> need implicit ones like this in addition to those?
> 

As I said in another part of the thread, I think the vma flag is a no-go,
because it tries to set something that is orthogonal to the vma. Wanting some
vma to use device memory for new allocations is a valid policy for a vma to
have. But a flag that tells various kernel subsystems "hey, my memory is
special, skip me" is wrong.

Wanting to exclude device memory from KSM or autonuma is valid, but it should
be done at the struct page level, i.e. KSM or autonuma should check the type
of page before doing anything. For CDM pages they would skip. It could use the
flags idea that was discussed.

The overhead of doing it at the page level is far lower than trying to manage
a vma flag, with all the issues related to vma merging, splitting and the
lifetime of such a flag. Moreover, this flag is all or nothing: it does not
consider the case where you have as many regular pages as CDM pages in a vma.
It would block the regular pages from undergoing the usual KSM/autonuma ...
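
To make the mixed-vma case concrete, here is a rough userspace sketch, not
kernel code. Every name in it (toy_page, toy_vma, PAGE_CDM, the two scan
helpers) is hypothetical and only models the idea; none of it is the real
struct page, vm_area_struct, KSM or autonuma API. It counts how many pages a
scanner would consider under each approach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page types; a stand-in for a per-page type check. */
enum page_type { PAGE_REGULAR, PAGE_CDM };

struct toy_page {
	enum page_type type;
};

/* Hypothetical vma model; vm_cdm stands in for the proposed VM_CDM flag. */
struct toy_vma {
	struct toy_page *pages;
	size_t nr_pages;
	bool vm_cdm;
};

/*
 * vma-flag approach: a tagged vma is skipped as a whole, so its regular
 * pages are never considered. An untagged vma is scanned as a whole.
 */
static size_t scan_with_vma_flag(const struct toy_vma *vma)
{
	assert(vma != NULL);
	if (vma->vm_cdm)
		return 0;
	return vma->nr_pages;
}

/*
 * Page-level approach: the scanner looks at each page's type and skips
 * only the CDM pages, whatever flags the vma carries.
 */
static size_t scan_with_page_check(const struct toy_vma *vma)
{
	size_t scanned = 0;

	assert(vma != NULL);
	for (size_t i = 0; i < vma->nr_pages; i++) {
		if (vma->pages[i].type != PAGE_CDM)
			scanned++;
	}
	return scanned;
}
```

With a tagged vma holding four regular and four CDM pages, the first helper
considers no pages at all, while the second still considers the four regular
ones. The per-page check also has no state to keep in sync across merge and
split.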

I do strongly believe that this vma flag is a bad idea.

Cheers,
Jérôme