Message-ID: <20121025212251.GA31749@shutemov.name>
Date:	Fri, 26 Oct 2012 00:22:51 +0300
From:	"Kirill A. Shutemov" <kirill@...temov.name>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Andrea Arcangeli <aarcange@...hat.com>, linux-mm@...ck.org,
	Andi Kleen <ak@...ux.intel.com>,
	"H. Peter Anvin" <hpa@...ux.intel.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 10/10] thp: implement refcounting for huge zero page

On Thu, Oct 25, 2012 at 02:05:24PM -0700, Andrew Morton wrote:
> On Thu, 25 Oct 2012 23:49:59 +0300
> "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com> wrote:
> 
> > On Wed, Oct 24, 2012 at 01:25:52PM -0700, Andrew Morton wrote:
> > > On Wed, 24 Oct 2012 22:45:52 +0300
> > > "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com> wrote:
> > > 
> > > > On Wed, Oct 24, 2012 at 12:22:53PM -0700, Andrew Morton wrote:
> > > > > 
> > > > > I'm thinking that such a workload would be the above dd in parallel
> > > > > with a small app which touches the huge page and then exits, then gets
> > > > > executed again.  That "small app" sounds realistic to me.  Obviously
> > > > > one could exercise the zero page's refcount at higher frequency with a
> > > > > tight map/touch/unmap loop, but that sounds less realistic.  It's worth
> > > > > trying that exercise as well though.
> > > > > 
> > > > > Or do something else.  But we should try to probe this code's
> > > > > worst-case behaviour, get an understanding of its effects and then
> > > > > decide whether any such workload is realistic enough to worry about.
> > > > 
> > > > Okay, I'll try a few memory pressure scenarios.
> > 
> > A test program:
> > 
> >         #define MB (1024UL * 1024)
> > 
> >         char *p;
> > 
> >         while (1) {
> >                 posix_memalign((void **)&p, 2 * MB, 2 * MB);
> >                 assert(*p == 0);
> >                 free(p);
> >         }
> > 
> > With this code running in the background we have a pretty good chance that
> > the huge zero page is freeable (refcount == 1) when the shrinker callback
> > is called - roughly one time in two.
> > 
> > A pagecache hog (dd if=hugefile of=/dev/null bs=1M) creates enough
> > pressure to get the shrinker callback called, but it was only asked about
> > the cache size (nr_to_scan == 0).
> > I was not able to get it called with nr_to_scan > 0 in this scenario, so
> > the hzp was never freed.
> 
> hm.  It's odd that the kernel didn't try to shrink slabs in this case. 
> Why didn't it??

nr_to_scan == 0 asks for the fast path: the caller only queries the cache
size. The shrinker callback can still shrink if it thinks that's a good idea.

> 
> > I also tried another scenario: usemem -n16 100M -r 1000. It creates real
> > memory pressure - no easily reclaimable memory. This time the callback was
> > called with nr_to_scan > 0 and we freed the hzp. Under pressure we fail to
> > allocate the hzp and the code goes to the fallback path, as it is supposed to.
> > 
> > Do I need to check any other scenario?
> 
> I'm thinking that if we do hit problems in this area, we could avoid
> freeing the hugepage unless the scan_control.priority is high enough. 
> That would involve adding a magic number or a tunable to set the
> threshold.

What about a ratelimit on the alloc path to force fallback if we allocate
too often? Is that a good idea?

> Also, it would be beneficial if we can monitor this easily.  Perhaps
> add a counter to /proc/vmstat which tells us how many times that page
> has been reallocated?  And perhaps how many times we tried to allocate
> it but failed?

Okay, I'll prepare a patch.

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/