linux-kernel - Re: [PATCH 4/8] vrange: Clear volatility on new mmaps

Message-ID: <20130614002132.GC4533@bbox>
Date:	Fri, 14 Jun 2013 09:21:32 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	John Stultz <john.stultz@...aro.org>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Android Kernel Team <kernel-team@...roid.com>,
	Robert Love <rlove@...gle.com>, Mel Gorman <mel@....ul.ie>,
	Hugh Dickins <hughd@...gle.com>,
	Dave Hansen <dave@...ux.vnet.ibm.com>,
	Rik van Riel <riel@...hat.com>,
	Dmitry Adamushko <dmitry.adamushko@...il.com>,
	Dave Chinner <david@...morbit.com>, Neil Brown <neilb@...e.de>,
	Andrea Righi <andrea@...terlinux.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
	Mike Hommey <mh@...ndium.org>, Taras Glek <tglek@...illa.com>,
	Dhaval Giani <dgiani@...illa.com>, Jan Kara <jack@...e.cz>,
	KOSAKI Motohiro <kosaki.motohiro@...il.com>,
	Michel Lespinasse <walken@...gle.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH 4/8] vrange: Clear volatility on new mmaps

Hello John,

On Thu, Jun 13, 2013 at 04:43:58PM -0700, John Stultz wrote:
> On 06/12/2013 11:28 PM, Minchan Kim wrote:
> >Hey John,
> >
> >On Tue, Jun 11, 2013 at 09:22:47PM -0700, John Stultz wrote:
> >>At lsf-mm, the issue was brought up that there is a precedent with
> >>interfaces like mlock, such that new mappings in a pre-existing range
> >>do not inherit the mlock state.
> >>
> >>This is mostly because mlock only modifies the existing vmas, and so
> >>any new mmaps create new vmas, which won't be mlocked.
> >>
> >>Since volatility is not stored in the vma (for good cause, specifically
> >>as we'd have to manage file volatility differently from anonymous
> >>and we're likely to manage volatility on small chunks of memory, which
> >>would cause lots of vma splitting and churn), this patch clears volatility
> >>on new mappings, to ensure that we don't inherit volatility if memory in
> >>an existing volatile range is unmapped and then re-mapped with something
> >>else.
> >>
> >>Thus, this patch forces any volatility to be cleared on mmap.
> >If we have lots of nodes on vroot but none of them fall within the newly
> >mmapped vma range, this is purely unnecessary cost, and that's never what
> >we want.
> >
> >>XXX: We expect this patch to be not well loved by mm folks, and are open
> >>to alternative methods here. It's more of a placeholder to address
> >>the issue from lsf-mm and hopefully will spur some further discussion.
> >Another idea is to add "bool is_vrange" to struct vm_area_struct,
> >protected by vrange_lock. The scenario is as follows:
> >
> >When do_vrange is called with VRANGE_VOLATILE, it iterates over the vmas
> >and sets vma->is_vrange to true. That way we can skip the tree traversal
> >when munmap is called if is_vrange is false, and a newly mmapped vma
> >doesn't need to clear any volatility.
> 
> We could look further into this approach if folks think it's the best
> way to go. Though it has the downside of having to split the vmas
> when we're dealing with a large number of smallish objects. Also

We don't need to split the vma, and I don't really want to either.
What I meant is the following:

1) initial state

0x100000                                        0x10000000
|                       VMA : is_vrange = false |


2) vrange(0x200000, 0x100000, VRANGE_VOLATILE)


0x100000                                        0x10000000
|                       VMA : is_vrange = true  |


        vroot
       /
   node 1

3) vrange(0x400000, 0x100000, VRANGE_VOLATILE)

0x100000                                        0x10000000
|                       VMA : is_vrange = true  |


        vroot
       /     \
   node 1  node 2


4) unmap(0x400000, 0x100000, VRANGE_NOVOLATILE)

sys_munmap:

if (vma->is_vrange) {
        vrange_clear(0x400000, 0x400000 + 0x100000 - 1);
        if (vma_vrange_all_clear(vma))
                vma->is_vrange = false;
}

0x100000                                        0x10000000
|                       VMA : is_vrange = true  |

        vroot
       /
   node 1


5) unmap(0x200000, 0x100000, VRANGE_NOVOLATILE)

sys_munmap:

if (vma->is_vrange) {
        vrange_clear(0x200000, 0x200000 + 0x100000 - 1);
        if (vma_vrange_all_clear(vma))
                vma->is_vrange = false;
}

0x100000                                        0x10000000
|                       VMA : is_vrange = false |

        vroot
       /    \


6) purging path

bool really_vrange_page(struct page *page)
{
        /* Look up the page's address range in the vrange tree. */
        return __vrange_address(vroot, startofpage, endofpage);
}

shrink_page_list
        ..
        ..

        vma = rmap_from_page(page);
        if (vma->is_vrange) {
                /*
                 * vma->is_vrange can be a false positive,
                 * so we have to check the vrange tree itself.
                 */
                if (really_vrange_page(page))
                        purge_page(page);
        }
        ..
        ..

This way we can avoid unnecessary vroot traversals without splitting vmas.
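
To make the flow above concrete, here is a minimal userspace sketch of the
hint-flag idea. It is not kernel code and not the patch itself: the flat
array standing in for vroot, the "walks" counter, the 4096-byte page size,
and the names vroot_overlaps(), mark_volatile(), unmap_range() and
page_is_purgeable() are all assumptions made for illustration. Unlike a
real vrange_clear(), it drops any range that overlaps the unmapped span
instead of trimming it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RANGES       16
#define SKETCH_PAGE_SIZE 4096UL   /* assumed page size */

struct vrange { unsigned long start, end; };   /* inclusive bounds */

/* Hypothetical stand-in for the vrange tree; a flat array for brevity. */
struct vroot {
        struct vrange ranges[MAX_RANGES];
        size_t n;
        unsigned long walks;      /* lookups performed, for illustration */
};

/* Hypothetical stand-in for vm_area_struct with the proposed flag. */
struct vma {
        unsigned long start, end; /* [start, end) */
        bool is_vrange;           /* hint: vma may contain volatile ranges */
        struct vroot *vroot;
};

/* Return true if any stored range overlaps [start, end]. */
bool vroot_overlaps(struct vroot *root, unsigned long start, unsigned long end)
{
        root->walks++;
        for (size_t i = 0; i < root->n; i++)
                if (root->ranges[i].start <= end && start <= root->ranges[i].end)
                        return true;
        return false;
}

/* vrange(start, len, VRANGE_VOLATILE): record the range, set the hint. */
void mark_volatile(struct vma *vma, unsigned long start, unsigned long len)
{
        struct vroot *root = vma->vroot;

        assert(root->n < MAX_RANGES);
        root->ranges[root->n++] = (struct vrange){ start, start + len - 1 };
        vma->is_vrange = true;
}

/* munmap path: touch vroot only when the hint says it may matter. */
void unmap_range(struct vma *vma, unsigned long start, unsigned long len)
{
        struct vroot *root = vma->vroot;
        unsigned long end = start + len - 1;
        size_t kept = 0;

        if (!vma->is_vrange)
                return;           /* fast path: no vroot traversal at all */

        /* Simplification: drop every range overlapping [start, end]. */
        root->walks++;
        for (size_t i = 0; i < root->n; i++)
                if (root->ranges[i].end < start || end < root->ranges[i].start)
                        root->ranges[kept++] = root->ranges[i];
        root->n = kept;

        /* Clear the hint once no volatile range is left inside the vma. */
        if (!vroot_overlaps(root, vma->start, vma->end - 1))
                vma->is_vrange = false;
}

/* Purge path: the flag is only a hint, so confirm against vroot. */
bool page_is_purgeable(struct vma *vma, unsigned long addr)
{
        unsigned long page = addr & ~(SKETCH_PAGE_SIZE - 1);

        if (!vma->is_vrange)
                return false;
        return vroot_overlaps(vma->vroot, page, page + SKETCH_PAGE_SIZE - 1);
}
```

With the two ranges from the example, a page at 0x300000 sits in a vma
whose is_vrange is true but is not itself volatile, which is exactly the
false positive that the extra lookup in the purge path filters out.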

> we'd be increasing the vma_struct size for everyone, even if no one
> is using volatile ranges, which may be a bigger concern.


I think the vma is not that sensitive to size, and historically we have
added fields to it fairly easily. Of course, other ideas that don't
need to increase the vma size are welcome, but IMHO it's a good compromise
between performance and memory footprint.

> 
> Also it means we'd be managing anonymous and file volatility with
> different structures (though that's not the end of the world).

Volatility is still kept in vrange->purged.
Am I missing something?

> 
> thanks
> -john
> 

-- 
Kind regards,
Minchan Kim