Message-ID: <20121024194552.GA24460@otc-wbsnb-06>
Date: Wed, 24 Oct 2012 22:45:52 +0300
From: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill@...temov.name>,
Andrea Arcangeli <aarcange@...hat.com>, linux-mm@...ck.org,
Andi Kleen <ak@...ux.intel.com>,
"H. Peter Anvin" <hpa@...ux.intel.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 10/10] thp: implement refcounting for huge zero page
On Wed, Oct 24, 2012 at 12:22:53PM -0700, Andrew Morton wrote:
> On Wed, 24 Oct 2012 02:38:01 +0300
> "Kirill A. Shutemov" <kirill@...temov.name> wrote:
>
> > On Tue, Oct 23, 2012 at 03:59:15PM -0700, Andrew Morton wrote:
> > > On Tue, 23 Oct 2012 10:00:18 +0300
> > > "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com> wrote:
> > >
> > > > > Well, how hard is it to trigger the bad behavior? One can easily
> > > > > create a situation in which that page's refcount frequently switches
> > > > > from 0 to 1 and back again. And one can easily create a situation in
> > > > > which the shrinkers are being called frequently. Run both at the same
> > > > > time and what happens?
> > > >
> > > > If the goal is to trigger bad behavior then:
> > > >
> > > > 1. read from an area where a huge page can be mapped, to get the huge
> > > > zero page mapped. The hzp is allocated here. refcounter == 2.
> > > > 2. write to the same page. refcounter == 1.
> > > > 3. echo 3 > /proc/sys/vm/drop_caches. refcounter == 0 -> free the hzp.
> > > > 4. goto 1.
> > > >
> > > > But it's unrealistic. /proc/sys/vm/drop_caches is only root-accessible.
> > >
> > > Yes, drop_caches is uninteresting.
> > >
> > > > We can trigger the shrinker only under memory pressure. But in that
> > > > case we will most likely get -ENOMEM on hzp allocation and go down
> > > > the fallback path (4k zero page).
> > >
> > > I disagree. If, for example, there is a large amount of clean
> > > pagecache being generated then the shrinkers will be called frequently
> > > and memory reclaim will be running at a 100% success rate. Will the
> > > hugepage allocation be successful in such a situation?
> >
> > Yes.
> >
> > Shrinker callbacks are called from shrink_slab(), which runs after page
> > cache reclaim, so on the next reclaim round the page cache will be
> > reclaimed first and we will avoid a frequent alloc-free pattern.
>
> I don't understand this. If reclaim is running continuously (which can
> happen pretty easily: "dd if=/fast-disk/large-file") then the zero page
> will be whipped away very shortly after its refcount has fallen to
> zero.
>
> > One more thing we can do: increase shrinker->seeks to something like
> > DEFAULT_SEEKS * 4. In this case shrink_slab() will call our callback after
> > callbacks with DEFAULT_SEEKS.
>
> It would be useful if you could try to make this scenario happen. If
> for some reason it doesn't happen then let's understand *why* it
> doesn't happen.
>
> I'm thinking that such a workload would be the above dd in parallel
> with a small app which touches the huge page and then exits, then gets
> executed again. That "small app" sounds realistic to me. Obviously
> one could exercise the zero page's refcount at higher frequency with a
> tight map/touch/unmap loop, but that sounds less realistic. It's worth
> trying that exercise as well though.
>
> Or do something else. But we should try to probe this code's
> worst-case behaviour, get an understanding of its effects and then
> decide whether any such workload is realistic enough to worry about.
Okay, I'll try a few memory-pressure scenarios.

Meanwhile, could you take patches 01-09? Patch 09 implements a simpler
allocation scheme, and it would be nice to get all the other code tested.

Or do you see any other blocker?
--
Kirill A. Shutemov