Message-ID: <CAK-uSPo9Nc-1HaURvwstOGYGuMEx4CXhPRv+cZevYLZX6URzYw@mail.gmail.com>
Date:	Fri, 12 Aug 2016 21:52:20 +0100
From:	Andriy Tkachuk <andriy.tkachuk@...gate.com>
To:	linux-kernel@...r.kernel.org
Cc:	Mel Gorman <mgorman@...e.de>
Subject: mm: kswapd struggles reclaiming the pages on 64GB server

Hi,

our user-space application uses a large amount of anon pages (a private
mapping of a large file, more than 64GB RAM available in the system)
which are rarely accessed and are supposed to be swapped out.
Instead, we see that most of these pages are kept in memory while the
system suffers from a lack of free memory and poor overall performance
(especially disk I/O; vm.swappiness=100 does not help). kswapd
scans millions of pages per second but reclaims only hundreds per second.
Here are snapshots of some counters taken at 5-second intervals:

$ egrep 'Cached|nr_.*active_anon|pgsteal_.*_normal|pgscan_kswapd_normal|pgrefill_normal|nr_vmscan_write|nr_swap|pgact' \
proc-*-0616-1605[345]* | sed 's/:/ /' | sort -sk 2,2
proc-meminfo-0616-160539.txt Cached:           347936 kB
proc-meminfo-0616-160549.txt Cached:           316316 kB
proc-meminfo-0616-160559.txt Cached:           322264 kB
proc-meminfo-0616-160539.txt SwapCached:      2853064 kB
proc-meminfo-0616-160549.txt SwapCached:      2853168 kB
proc-meminfo-0616-160559.txt SwapCached:      2853280 kB
proc-vmstat-0616-160535.txt nr_active_anon 14508616
proc-vmstat-0616-160545.txt nr_active_anon 14513725
proc-vmstat-0616-160555.txt nr_active_anon 14515197
proc-vmstat-0616-160535.txt nr_inactive_anon 747407
proc-vmstat-0616-160545.txt nr_inactive_anon 744846
proc-vmstat-0616-160555.txt nr_inactive_anon 744509
proc-vmstat-0616-160535.txt nr_vmscan_write 5589095
proc-vmstat-0616-160545.txt nr_vmscan_write 5589097
proc-vmstat-0616-160555.txt nr_vmscan_write 5589097
proc-vmstat-0616-160535.txt pgactivate 246016824
proc-vmstat-0616-160545.txt pgactivate 246033242
proc-vmstat-0616-160555.txt pgactivate 246042064
proc-vmstat-0616-160535.txt pgrefill_normal 22763262
proc-vmstat-0616-160545.txt pgrefill_normal 22768020
proc-vmstat-0616-160555.txt pgrefill_normal 22768178
proc-vmstat-0616-160535.txt pgscan_kswapd_normal 111985367420
proc-vmstat-0616-160545.txt pgscan_kswapd_normal 111996845554
proc-vmstat-0616-160555.txt pgscan_kswapd_normal 112028276639
proc-vmstat-0616-160535.txt pgsteal_direct_normal 344064
proc-vmstat-0616-160545.txt pgsteal_direct_normal 344064
proc-vmstat-0616-160555.txt pgsteal_direct_normal 344064
proc-vmstat-0616-160535.txt pgsteal_kswapd_normal 53817848
proc-vmstat-0616-160545.txt pgsteal_kswapd_normal 53818626
proc-vmstat-0616-160555.txt pgsteal_kswapd_normal 53818637
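
To make the rates easier to read, here is a small sketch that turns the first and last proc-vmstat snapshots above into per-second figures. The counter values are copied from the listing; the 20-second elapsed time is an assumption read off the HHMMSS part of the file names (160535 to 160555), and the helper names are only illustrative.

```python
# Counter values copied from the first and last proc-vmstat snapshots above.
first = {
    "pgscan_kswapd_normal": 111985367420,
    "pgsteal_kswapd_normal": 53817848,
    "pgactivate": 246016824,
    "pgrefill_normal": 22763262,
}
last = {
    "pgscan_kswapd_normal": 112028276639,
    "pgsteal_kswapd_normal": 53818637,
    "pgactivate": 246042064,
    "pgrefill_normal": 22768178,
}

# Assumption: elapsed time taken from the file-name timestamps
# (160535 -> 160555), not from any timestamp inside the files.
ELAPSED_SECONDS = 20


def deltas(before, after):
    """Return how much each cumulative counter grew between two snapshots."""
    return {name: after[name] - before[name] for name in before}


def rates(before, after, seconds):
    """Return the average per-second growth of each counter."""
    return {name: d / seconds for name, d in deltas(before, after).items()}


for name, rate in rates(first, last, ELAPSED_SECONDS).items():
    print(f"{name:24s} {rate:14.1f} pages/s")
```

If the real sampling interval differs, only ELAPSED_SECONDS needs to change; the deltas themselves do not depend on it.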

The pgrefill_normal and pgactivate counters show that only a few
hundred pages per second move between the active and inactive lists,
which is comparable with what was reclaimed. So it looks like kswapd
mostly scans the pages on the inactive list in a kind of loop and never
gets a chance to look at the pages on the active list
(where most of the application's anon pages are located).
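
The imbalance between scanning and reclaiming can be expressed as a single number: pages scanned by kswapd per page it actually reclaimed. This is a minimal sketch using the pgscan_kswapd_normal and pgsteal_kswapd_normal values from the first and last snapshots; the function name is illustrative only.

```python
# Deltas between the first and last proc-vmstat snapshots listed above.
scanned = 112028276639 - 111985367420   # pgscan_kswapd_normal
reclaimed = 53818637 - 53817848         # pgsteal_kswapd_normal


def pages_scanned_per_reclaimed(scanned_pages, reclaimed_pages):
    """Return how many pages were scanned for each page reclaimed."""
    if reclaimed_pages == 0:
        return float("inf")  # nothing reclaimed at all
    return scanned_pages / reclaimed_pages


ratio = pages_scanned_per_reclaimed(scanned, reclaimed)
print(f"scanned {scanned} pages, reclaimed {reclaimed} pages")
print(f"about {ratio:.0f} pages scanned per page reclaimed")
```

A healthy reclaim pass would be expected to keep this ratio within a small factor; a value in the tens of thousands matches the impression that kswapd keeps rescanning the same pages.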

The kernel version: linux-3.10.0-229.14.1.el7.

Any ideas? Would it be useful to change inactive_ratio dynamically in
such cases so that more pages could be moved from the active to the
inactive list and get a chance to be reclaimed? (Note: when the
application is restarted, the problem disappears for a while (days)
until the corresponding number of privately mapped pages are dirtied again.)
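
For reference, here is a rough sketch of how the inactive_ratio check would evaluate with the numbers above. It assumes the mainline 3.10 heuristic (ratio = int_sqrt(10 * zone size in GB), and the anon inactive list counts as "low" when inactive * ratio < active); the RHEL kernel in use may differ. It also treats the system-wide nr_active_anon and nr_inactive_anon values as if they belonged to a single 64 GiB zone with 4 KiB pages, which is a simplification, since the real check is per zone.

```python
import math

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def zone_inactive_ratio(zone_pages):
    """Mimic the assumed mainline 3.10 rule: int_sqrt(10 * zone size in GB)."""
    gb = zone_pages >> (30 - PAGE_SHIFT)
    return math.isqrt(10 * gb) if gb else 1


def inactive_anon_is_low(active, inactive, ratio):
    """Return True when the inactive anon list is small enough that pages
    would be moved over from the active list."""
    return inactive * ratio < active


# Assumption: one zone covering 64 GiB of RAM.
zone_pages = 64 << (30 - PAGE_SHIFT)
ratio = zone_inactive_ratio(zone_pages)

# Values from the first proc-vmstat snapshot above (system-wide counters).
active_anon = 14508616
inactive_anon = 747407

print("inactive_ratio:", ratio)
print("inactive * ratio:", inactive_anon * ratio)
print("inactive list considered low:",
      inactive_anon_is_low(active_anon, inactive_anon, ratio))
```

Under these assumptions the inactive list is not considered low, so the active list would not be aged, which would be consistent with the small pgrefill_normal deltas.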

Thank you,
   Andriy
