Message-Id: <20110512103503.717f4a96.kamezawa.hiroyu@jp.fujitsu.com>
Date:	Thu, 12 May 2011 10:35:03 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Ying Han <yinghan@...gle.com>,
	Johannes Weiner <jweiner@...hat.com>,
	Michal Hocko <mhocko@...e.cz>,
	"balbir@...ux.vnet.ibm.com" <balbir@...ux.vnet.ibm.com>,
	"nishimura@....nes.nec.co.jp" <nishimura@....nes.nec.co.jp>
Subject: Re: [RFC][PATCH 0/7] memcg async reclaim

On Wed, 11 May 2011 18:28:44 -0700
Andrew Morton <akpm@...ux-foundation.org> wrote:

> On Tue, 10 May 2011 19:02:16 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> wrote:
> 
> > Hi, thank you for all comments on previous patches for watermarks for memcg.
> > 
> > This is a new series as 'async reclaim', no watermark.
> > This version is a RFC again and I don't ask anyone to test this...but
> > comments/review are appreciated. 
> > 
> > Major changes are
> >   - no configurable watermark
> >   - hierarchy support
> >   - more fixes for static scan rate, round-robin scanning of memcg.
> > 
> > (assume x86-64 in following.)
> > 
> > 'async reclaim' works when
> >    - usage > limit - 4MB.
> > until
> >    - usage < limit - 8MB.
> > 
> > when the limit is larger than 128MB. This margin below the limit is
> > meant to help reduce page fault latency when using transparent
> > hugepages (THP).
> > 
> > With THP, when we hit the limit we need to reclaim HPAGE_SIZE (2MB) of
> > pages, and that 2MB is consumed immediately. The application then needs to
> > scan 2MB on each page fault and sees a large latency. So, some margin
> > > HPAGE_SIZE is required. I set the margins to 2*HPAGE_SIZE (start) and
> > 4*HPAGE_SIZE (stop) here. The kernel will do async reclaim and reduce
> > usage to limit - 8MB in the background.
> > 
> > BTW, when an application gets a page, it tends to do some action to fill
> > that page. For example, reading data from a file or the network and filling
> > a buffer. This implies the application will wait or consume cpu on something
> > other than reclaiming memory. So, if the kernel can help free memory in the
> > background while the application does other jobs, application latency can be
> > reduced. This kind of asynchronous reclaim will therefore help reduce the
> > memory reclaim latency caused by memcg. But the total amount of cpu time
> > consumed will not be any different.
> > 
> > This patch series implements
> >   - logic for triggering async reclaim
> >   - help functions for async reclaim
> >   - core logic for async reclaim, considering memcg's hierarchy.
> >   - static scan rate memcg reclaim.
> >   - workqueue for async reclaim.
> > 
> > One concern is that I didn't implement code to handle the case where
> > most pages are mlocked, or are anon memory on a swapless system. I need some
> > detection logic to avoid hopeless async reclaim.
> > 
> 
> What (user-visible) problem is this patchset solving?
> 
> IOW, what is the current behaviour, what is wrong with that behaviour
> and what effects does the patchset have upon that behaviour?
> 
> The sole answer from the above is "latency spikes".  Anything else?
> 

I think this set has the potential to fix latency spikes.

For example, with the previous set (which had tuning knobs), I did a file copy
of a 400M file under a 400M limit.
==
1) == hard limit = 400M ==
[root@...l6-test hilow]# time cp ./tmpfile xxx                
real    0m7.353s
user    0m0.009s
sys     0m3.280s

2) == hard limit 500M/ hi_watermark = 400M ==
[root@...l6-test hilow]# time cp ./tmpfile xxx

real    0m6.421s
user    0m0.059s
sys     0m2.707s
==
and in both cases, memory usage after the test was 400M.
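
For reference, the relative improvement between the two runs above can be
worked out with a small helper. This is only an arithmetic sketch; the
function name percent_reduction is hypothetical and the inputs are the
'real' and 'sys' times printed above.

```c
/* Percentage by which 'after' is smaller than 'before'. */
static double percent_reduction(double before, double after)
{
    return (before - after) / before * 100.0;
}

/* Times taken from the two runs above (seconds). */
static const double real_hard_limit = 7.353, real_watermark = 6.421;
static const double sys_hard_limit  = 3.280, sys_watermark  = 2.707;
```

Calling percent_reduction(real_hard_limit, real_watermark) gives roughly a
12.7% reduction in wall-clock time, and the same call on the sys times gives
roughly 17.5%. This is a single run of each case, so the numbers are only
indicative.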

IIUC, this speedup is because memory reclaim runs in the background while 'cp'
reads/writes files. But the above test uses 100MB of margin. I guess we don't
need 100MB of margin as above, but we will not get full speed with an 8MB
margin. There will be a trade-off because users may want to use memory up to
the limit.

So, this set tries to set some 'default' margin, which is not too big, based
on the idea of implementing async reclaim without tuning knobs. I'll measure
some more and report it in the next post.
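
The default margin described in the cover letter (start when usage >
limit - 4MB, stop when usage < limit - 8MB, for limits larger than 128MB)
is a simple hysteresis. A minimal userspace sketch of that decision follows.
It is not the patch code: struct async_state, async_reclaim_update and the
macro names other than HPAGE_SIZE are hypothetical, and doing nothing for
limits of 128MB or less is an assumption, since that case is not described
here.

```c
#include <stdbool.h>
#include <stdint.h>

#define MB            (1024ULL * 1024ULL)
#define HPAGE_SIZE    (2 * MB)           /* x86-64 hugepage size */
#define START_MARGIN  (2 * HPAGE_SIZE)   /* 4MB: start async reclaim */
#define STOP_MARGIN   (4 * HPAGE_SIZE)   /* 8MB: stop async reclaim */
#define MIN_LIMIT     (128 * MB)         /* margins apply above this limit */

/* Hypothetical per-memcg state: is background reclaim currently running? */
struct async_state {
    bool running;
};

/*
 * Update and return whether async reclaim should be running.
 * Starts once usage > limit - 4MB and keeps going until
 * usage < limit - 8MB.
 */
static bool async_reclaim_update(struct async_state *st,
                                 uint64_t usage, uint64_t limit)
{
    if (limit <= MIN_LIMIT) {
        /* Assumption: small limits get no async reclaim in this sketch. */
        st->running = false;
        return false;
    }
    if (!st->running && usage > limit - START_MARGIN)
        st->running = true;
    else if (st->running && usage < limit - STOP_MARGIN)
        st->running = false;
    return st->running;
}
```

With a 400MB limit, reclaim would start once usage goes above 396MB and
continue until usage drops below 392MB. The gap between the two thresholds
keeps the worker from starting and stopping on every charge.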


> Have these spikes been observed and measured?  We should have a
> testcase/workload with quantitative results to demonstrate and measure
> the problem(s), so the effectiveness of the proposed solution can be
> understood.
> 
> 

Yes, you're right, of course.
This set just shows the design changes caused by removing the tuning knobs as
a result of the long discussion.

As an outcome of it, we will
 1. implement async reclaim without tuning knobs.
 2. add some on-demand background reclaim as 'active softlimit', which means
    a mode of softlimit that always shrinks memory even if the system has
    plenty of free memory. The current softlimit, which works only when memory
    is short, will be called 'passive softlimit'.
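
The difference between the two softlimit modes can be sketched as a single
predicate. This is only an illustration of the description above, not
existing kernel code: the enum, the function name and the
system_memory_short flag are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two modes described above. */
enum softlimit_mode {
    SOFTLIMIT_PASSIVE,  /* current behaviour: only when memory is short */
    SOFTLIMIT_ACTIVE    /* proposed: shrink even with plenty of free memory */
};

/* Should this memcg be shrunk toward its soft limit right now? */
static bool softlimit_should_reclaim(enum softlimit_mode mode,
                                     uint64_t usage, uint64_t soft_limit,
                                     bool system_memory_short)
{
    if (usage <= soft_limit)
        return false;               /* nothing above the soft limit */
    if (mode == SOFTLIMIT_ACTIVE)
        return true;                /* always shrink the excess */
    return system_memory_short;     /* passive: only under global pressure */
}
```

In this sketch the only difference between the modes is whether global
memory pressure is required before usage above the soft limit is reclaimed.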

Thanks,
-Kame



