Date:	Sun, 20 Feb 2011 23:43:35 +0900
From:	Minchan Kim <minchan.kim@...il.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	linux-mm <linux-mm@...ck.org>, LKML <linux-kernel@...r.kernel.org>,
	Steven Barrett <damentz@...uorix.net>,
	Ben Gamari <bgamari.foss@...il.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Rik van Riel <riel@...hat.com>, Mel Gorman <mel@....ul.ie>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Wu Fengguang <fengguang.wu@...el.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Nick Piggin <npiggin@...nel.dk>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Minchan Kim <minchan.kim@...il.com>
Subject: [PATCH v6 0/4] fadvise(DONTNEED) support

Recently, a thrashing problem was reported.
(http://marc.info/?l=rsync&m=128885034930933&w=2)
It is triggered by backup workloads (e.g., a nightly rsync).
Such a workload creates use-once pages but touches each page twice,
which promotes the pages to the active list and ends up evicting
the working set.
App developers therefore want POSIX_FADV_NOREUSE, but other OSes, as well as Linux,
don't support it. (http://marc.info/?l=linux-mm&m=128928979512086&w=2)

As an alternative, app developers use POSIX_FADV_DONTNEED.
But it has a problem: if the kernel finds a page that is being written
during invalidate_mapping_pages, it can't invalidate the page.
That makes the hint hard for application programmers to use, since they
always have to sync data before calling fadvise(..POSIX_FADV_DONTNEED) to
make sure the pages can be discarded. As a result, they can't use
the kernel's deferred writes and see a performance loss.
(http://insights.oetiker.ch/linux/fadvise.html)
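
For reference, the sync-then-drop pattern that applications are forced
into today looks roughly like the sketch below. drop_cached_pages() is a
hypothetical helper name for illustration, not code from the rsync patch;
only fdatasync() and posix_fadvise() are real calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the rsync patch): write back the dirty
 * pages of fd, then hint that its cached pages are no longer needed.
 * Returns 0 on success or an errno-style error number.
 */
int drop_cached_pages(int fd)
{
    /* Sync first, so the DONTNEED hint does not meet pages that are
     * still dirty or being written. This is the step that defeats the
     * kernel's deferred writes. */
    if (fdatasync(fd) != 0)
        return errno;

    /* offset 0 with len 0 covers the file from start to end.
     * posix_fadvise() returns the error number directly. */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

A caller would invoke drop_cached_pages(fd) after it has finished reading
or writing a file it does not expect to touch again.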

In fact, invalidation is a very strong hint to the reclaimer:
it means we won't use the page any more. So the idea in this series is
to move pages that were invalidated but could not be freed to the inactive list.
That can help reclaim efficiency a lot and so prevent
eviction of the working set.

My experiment is as follows.

Test Environment :
DRAM : 2G, CPU : Intel(R) Core(TM)2 CPU
Rsync backup directory size : 16G

rsync version is 3.0.7.
The rsync patch is Ben's fadvise patch.
The stress scenario runs the following jobs in parallel.

1. git clone linux-2.6
2. make all -j4 linux-mmotm
3. rsync src dst

nrns : unpatched rsync + no stress
prns : patched rsync + no stress
nrs  : unpatched rsync + stress
prs  : patched rsync + stress

For profiling, I added some vmstat counters.
pginvalidate : the total number of pages moved by invalidate_mapping_pages
pgreclaim : the number of those pginvalidate pages moved to the inactive list's tail by PG_reclaim

                        NRNS        PRNS        NRS         PRS
Elapsed time            36:01.49    37:13.58    01:23:24    01:21:45
nr_vmscan_write         184         1           296         509
pgactivate              76559       84714       445214      463143
pgdeactivate            19360       40184       74302       91423
pginvalidate            0           2240333     0           1769147
pgreclaim               0           1849651     0           1650796
pgfault                 406208      421860      72485217    70334416
pgmajfault              212         334         5149        3688
pgsteal_dma             0           0           0           0
pgsteal_normal          2645174     1545116     2521098     1578651
pgsteal_high            5162080     2562269     6074720     3137294
pgsteal_movable         0           0           0           0
pgscan_kswapd_dma       0           0           0           0
pgscan_kswapd_normal    2641732     1545374     2499894     1557882
pgscan_kswapd_high      5143794     2567981     5999585     3051150
pgscan_kswapd_movable   0           0           0           0
pgscan_direct_dma       0           0           0           0
pgscan_direct_normal    3643        0           21613       21238
pgscan_direct_high      20174       1783        76980       87848
pgscan_direct_movable   0           0           0           0
pginodesteal            130         1029        3510        24100
slabs_scanned           1421824     1648128     1870720     1880320
kswapd_steal            7785153     4105620     8498332     4608372
kswapd_inodesteal       189432      474052      342835      472503
pageoutrun              100687      52282       145712      70946
allocstall              22          1           149         163
pgrotated               0           2231408     2932        1765393
unevictable_pgs_scanned 0           0           0           0

In the stress test (NRS vs PRS), pgsteal_[normal|high] are reduced by 37% and 48%, and
pgscan_kswapd_[normal|high] are reduced by 37% and 49%.
It means that although the VM scans a smaller window, it can reclaim enough pages to work well and
avoids evicting pages unnecessarily.
rsync's elapsed time is reduced by 1.5 minutes, but I think rsync's fadvise
isn't good, because with no stress [NRNS vs PRNS] the patched rsync takes one minute longer.
I think that's because it makes unnecessary fadvise system calls, so if
rsync's fadvise were smarter the effect would be much better than now.
pgmajfault is reduced by 28%. That's good.
What I can't understand is why inode steal increased.
If anyone knows why, please explain it to me.
Anyway, this patch improves reclaim efficiency very much.
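
The percentages above can be rechecked from the NRS and PRS columns of
the table. Below is a small sketch of that arithmetic; reduction_pct()
and print_reductions() are hypothetical helper names, and the counter
values are copied from the table as-is.

```c
#include <stdio.h>

/* Percentage by which `after` is lower than `before`. */
double reduction_pct(double before, double after)
{
    return (before - after) / before * 100.0;
}

/* Print the NRS-to-PRS reduction for the counters quoted in the text. */
void print_reductions(void)
{
    printf("pgsteal_normal       %.1f%%\n", reduction_pct(2521098, 1578651));
    printf("pgsteal_high         %.1f%%\n", reduction_pct(6074720, 3137294));
    printf("pgscan_kswapd_normal %.1f%%\n", reduction_pct(2499894, 1557882));
    printf("pgscan_kswapd_high   %.1f%%\n", reduction_pct(5999585, 3051150));
    printf("pgmajfault           %.1f%%\n", reduction_pct(5149, 3688));
}
```

Calling print_reductions() from a main() gives values that match the
quoted figures once the fractional part is dropped.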

Recently, Steven Barrett applied this series to his project kernel,
the "Liquorix kernel", and reported the following, along with one problem.
(The problem is solved by [3/4]. See the description)

" I've been having really good results with your new patch set that
mitigates the problem where a backup utility or something like that
reads each file once and eventually evicting the original working set
out of the page cache.
...
...
 These patches solved some problems on a friend's desktop.
 He said that his wife wanted to send me kisses and hugs because their
computer was so responsive after the patches were applied.
"
So I think this patch series solves a real problem.

 - [1/3] moves an invalidated page that is dirty or under writeback on the active list
   to the inactive list's head.
 - [2/3] moves a memcg reclaimable page to the inactive list's tail.
 - [3/3] moves an invalidated page to the inactive list's tail once the
   page's writeout has completed, so that it is reclaimed ASAP.
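
The placement rule in [1/3] and [3/3] can be pictured with the toy
user-space model below. It is only a sketch of the behaviour described
in this mail, not the code in the patches: every name in it is
hypothetical, reclaim_hint merely stands in for PG_reclaim, and setting
it at invalidation time is an assumption. Cases this mail does not
describe are returned as NOT_DESCRIBED rather than guessed.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; none of these are kernel symbols. */
enum lru_place {
    INACTIVE_HEAD,  /* still needs writeout, so reclaimed later */
    INACTIVE_TAIL,  /* first in line for reclaim */
    NOT_DESCRIBED   /* case this mail does not spell out */
};

struct toy_page {
    bool dirty;
    bool writeback;
    bool reclaim_hint;  /* stands in for PG_reclaim (assumption) */
};

/* Invalidation could not free the page: pick its new place ([1/3]). */
enum lru_place place_on_invalidate(struct toy_page *p)
{
    if (p->dirty || p->writeback) {
        /* Remember that the page should be reclaimed once it is clean. */
        p->reclaim_hint = true;
        return INACTIVE_HEAD;
    }
    return NOT_DESCRIBED;
}

/* Writeout has completed: a hinted page moves to the tail ([3/3]). */
enum lru_place place_on_writeback_end(struct toy_page *p)
{
    p->writeback = false;
    if (p->reclaim_hint) {
        p->reclaim_hint = false;
        return INACTIVE_TAIL;
    }
    return NOT_DESCRIBED;
}
```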

These patches are based on mmotm-02-04.

Changelog since v5:
 - Remove vmstat patch as profiling for final merge

Changelog since v4:
 - Remove patches related to madvise and clean up patch of swap.c
   (I will separate madvise issue from this series and repost after merging this series)

Minchan Kim (3):
  deactivate invalidated pages
  memcg: move memcg reclaimable page into tail of inactive list
  Reclaim invalidated page ASAP

 include/linux/memcontrol.h |    6 ++
 include/linux/swap.h       |    1 +
 mm/memcontrol.c            |   27 ++++++++++
 mm/page-writeback.c        |   12 ++++-
 mm/swap.c                  |  116 +++++++++++++++++++++++++++++++++++++++++++-
 mm/truncate.c              |   17 +++++--
 6 files changed, 172 insertions(+), 7 deletions(-)
