Date:	Mon, 21 May 2012 16:59:52 +0800
From:	Zheng Liu <gnehzuil.liu@...il.com>
To:	Johannes Weiner <hannes@...xchg.org>
Cc:	Michal Hocko <mhocko@...e.cz>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mel Gorman <mel@....ul.ie>, Minchan Kim <minchan@...nel.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Rik van Riel <riel@...hat.com>
Subject: Re: [PATCH] mm: consider all swapped back pages in used-once logic

On Mon, May 21, 2012 at 09:36:32AM +0200, Johannes Weiner wrote:
> On Mon, May 21, 2012 at 10:51:49AM +0800, Zheng Liu wrote:
> > On Thu, May 17, 2012 at 09:54:25PM +0200, Johannes Weiner wrote:
> > > On Thu, May 17, 2012 at 11:13:53AM +0200, Michal Hocko wrote:
> > > > [64574746 vmscan: detect mapped file pages used only once] made mapped pages
> > > > have another round in inactive list because they might be just short
> > > > lived and so we could consider them again next time. This heuristic
> > > > helps to reduce pressure on the active list with streaming IO
> > > > workloads.
> > > > This patch fixes a regression introduced by this commit for heavy shmem
> > > > based workloads because unlike Anon pages, which are excluded from this
> > > > heuristic because they are usually long lived, shmem pages are handled
> > > > as a regular page cache.
> > > > This doesn't work quite well, unfortunately, if the workload is mostly
> > > > backed by shmem (in memory database sitting on 80% of memory) with a
> > > > streaming IO in the background (backup - up to 20% of memory). Anon
> > > > inactive list is full of (dirty) shmem pages when watermarks are
> > > > hit. Shmem pages are kept in the inactive list (they are referenced)
> > > > in the first round and it is hard to reclaim anything else so we reach
> > > > lower scanning priorities very quickly which leads to an excessive swap
> > > > out.
> > > > 
> > > > Let's fix this by excluding all swap backed pages (they tend to be long
> > > > lived wrt. the regular page cache anyway) from used-once heuristic and
> > > > rather activate them if they are referenced.
> > > 
> > > Yes, the algorithm only makes sense for file cache, which is easy to
> > > reclaim.  Thanks for the fix!
> > 
> > Hi Johannes,
> > 
> > Out of curiosity, I notice that, in this patch (64574746), the commit log
> > said that this patch aims to reduce the impact of pages used only once.
> > Could you please tell why you think these pages will flood the active
> > list?  How do you find this problem?
> 
> Applications that use mmap for large, linear used-once IO.  Reclaim
> used to just activate every mapped file page it encountered for the
> first time (activate referenced ones, but they all start referenced).
> This resulted in horrible reclaim latency as most pages in memory
> were active.

Thanks for your explanation. :-)

> 
> > Actually, we met a huge regression in our product system.  This
> > application uses mmap/munmap and read/write simultaneously.  Meanwhile
> > it wants to keep mapped file pages in memory as much as possible.  But
> > this patch causes mapped file pages to be reclaimed frequently.  So I
> > want to know whether or not this patch considers this situation.
> > Thank you.
> 
> Is it because the read()/write() IO is high throughput and pushes
> pages through the LRU lists faster than the mmap pages are referenced?

Yes.  In this application, one query accesses a mapped file page twice
and the file page cache twice, so one query needs 4 disk I/Os.  We have
used fadvise(2) to cut the file page cache accesses down to one.  The
mapped file page is in fact also accessed only once, because both
accesses in a query touch the same data.  Thus one query now causes 2
disk I/Os.  The read/write I/O is much larger than the mmap/munmap I/O.
So, as you can see, if we can keep the mmap'ed file pages in memory as
much as possible, we will get better performance.
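
As a rough illustration of the fadvise(2) approach mentioned above, here
is a minimal sketch.  It assumes the advice used is POSIX_FADV_DONTNEED
applied right after each read; the mail does not say which advice value
the application actually passes, and read_once() is a hypothetical
helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to `len` bytes at offset `off`, then hint to the kernel that
 * the range just read will not be needed again, so streaming data does
 * not linger in the page cache.  Returns the number of bytes read, or
 * -1 on error.
 *
 * ASSUMPTION: POSIX_FADV_DONTNEED is a guess at the advice in use.
 */
ssize_t read_once(int fd, void *buf, size_t len, off_t off)
{
	ssize_t n = pread(fd, buf, len, off);

	if (n < 0)
		return -1;

	/*
	 * The advice is only a hint, so a failure here is ignored.
	 * Skip the call when nothing was read: a length of 0 would
	 * mean "through the end of the file".
	 */
	if (n > 0)
		(void)posix_fadvise(fd, off, (off_t)n, POSIX_FADV_DONTNEED);

	return n;
}
```

The idea is that the read()/write() side then competes less with the
mmap'ed pages for space on the file LRU lists.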

> 
> Are the mmap pages executable or shared between tasks?  If so, does
> the kernel you are using include '34dbc67 vmscan: promote shared file
> mapped pages' and 'c909e99 vmscan: activate executable pages after
> first usage'?

Thanks for your advice.  Our application has only one process, so I
think that 34dbc67 is not useful for it.  We have tried mapping the file
with the PROT_EXEC flag to take advantage of c909e99, but the result is
not as good as we expected.
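
For reference, a minimal sketch of what mapping a data file with
PROT_EXEC looks like.  map_file_exec() is a hypothetical helper, and the
choice of MAP_PRIVATE and a read-only mapping is an assumption; the
mail does not give the actual mmap() arguments.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a whole file readable and executable.  On success the mapping is
 * returned and *len_out holds its length; on failure MAP_FAILED is
 * returned and *len_out is left untouched.
 *
 * ASSUMPTION: MAP_PRIVATE and PROT_READ are illustrative choices.
 */
void *map_file_exec(const char *path, size_t *len_out)
{
	struct stat st;
	void *p;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return MAP_FAILED;

	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return MAP_FAILED;
	}

	p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_EXEC,
		 MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping stays valid after close() */

	if (p != MAP_FAILED)
		*len_out = (size_t)st.st_size;
	return p;
}
```

Note that this mmap() call fails if the file lives on a filesystem
mounted noexec.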

In addition, another factor also affects this application.  The logic
in inactive_file_is_low_global() differs between 2.6.18 and the upstream
kernel.  IMHO, it causes mapped file pages on the active list to be
moved to the inactive list frequently.

Currently, we have added a parameter to inactive_file_is_low_global() to
adjust this ratio.  We also activate every mapped file page the first
time it is encountered.  With these changes the performance gets better,
but it still doesn't reach the performance of 2.6.18.
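
To make the ratio idea concrete, here is a small userspace model of the
check, not kernel code.  As far as I understand the upstream code of
this period, it simply compares the sizes of the two file LRU lists
(active > inactive).  The `ratio` argument is a hypothetical stand-in
for the parameter described above; its real name and semantics are not
given in this mail.

```c
/*
 * Userspace model of the "is the inactive file list low?" check.
 *
 * ASSUMPTION: upstream compares the two list sizes directly, which is
 * what ratio == 1 reproduces.  `ratio` is a HYPOTHETICAL tunable: with
 * ratio == N the active file list may grow to N times the inactive
 * list before pages start being moved from active to inactive.
 * Overflow of inactive * ratio is not handled in this sketch.
 */
static int inactive_file_is_low_model(unsigned long active,
				      unsigned long inactive,
				      unsigned long ratio)
{
	return active > inactive * ratio;
}
```

For example, with 100 active and 50 inactive pages the model reports
"low" for ratio 1 but not for ratio 3, so a larger ratio keeps pages on
the active list longer.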

> 
> All of this is very lame.  I see no way to automatically detect when
> you really want to keep mapped pages over unmapped ones.  And making
> this assumption hurt some loads, while not making it now hurts others.

Yeah, as you said, this kind of change always hurts some loads and not
others. ;-)

Regards,
Zheng