Message-Id: <20100916152804.1b4155fd.akpm@linux-foundation.org>
Date: Thu, 16 Sep 2010 15:28:04 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Mel Gorman <mel@....ul.ie>
Cc: linux-mm@...ck.org, linux-fsdevel@...r.kernel.org,
Linux Kernel List <linux-kernel@...r.kernel.org>,
Johannes Weiner <hannes@...xchg.org>,
Minchan Kim <minchan.kim@...il.com>,
Wu Fengguang <fengguang.wu@...el.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Subject: Re: [PATCH 0/8] Reduce latencies and improve overall reclaim
efficiency v2
On Wed, 15 Sep 2010 13:27:43 +0100
Mel Gorman <mel@....ul.ie> wrote:
> This is v2 of a series to reduce some of the latencies seen in page reclaim
> and to improve the efficiency a bit.
epic changelog!
>
> ...
>
> The tests run were as follows
>
> kernbench
> compile-based benchmark. Smoke test performance
>
> sysbench
> OLTP read-only benchmark. Will be re-run in the future as read-write
>
> micro-mapped-file-stream
> This is a micro-benchmark from Johannes Weiner that accesses a
> large sparse file through mmap(). It was configured to run in only
> single-CPU mode but can be indicative of how well page reclaim
> identifies suitable pages (a rough sketch of this kind of workload
> follows the test list).
>
> stress-highalloc
> Tries to allocate huge pages under heavy load.
>
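To give an idea of the mmap workload mentioned above, a rough, hypothetical
sketch (this is not Johannes' actual benchmark; the file name and size below
are made up) would be along these lines:

  #!/usr/bin/env python3
  # Stream sequentially through a large sparse file via mmap(), touching one
  # byte per page so the page cache fills with clean, easily-reclaimable pages.
  import mmap
  import os

  FILE = "sparse.dat"                # hypothetical file name
  SIZE = 8 * 1024 * 1024 * 1024      # hypothetical size; the point is "bigger than RAM"
  PAGE = os.sysconf("SC_PAGE_SIZE")

  with open(FILE, "wb") as f:
      f.truncate(SIZE)               # sparse: no disk blocks allocated yet

  with open(FILE, "rb") as f:
      m = mmap.mmap(f.fileno(), SIZE, prot=mmap.PROT_READ)
      checksum = 0
      for offset in range(0, SIZE, PAGE):
          checksum += m[offset]      # fault in each page in order
      m.close()

  os.unlink(FILE)
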
> kernbench, iozone and sysbench did not report any performance regression
> on any machine. sysbench did pressure the system lightly and there was reclaim
> activity but there were no differences of major interest between the kernels.
>
> X86-64 micro-mapped-file-stream
>
> traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3 waitwriteback-v2r4
> pgalloc_dma 1639.00 ( 0.00%) 667.00 (-145.73%) 1167.00 ( -40.45%) 578.00 (-183.56%)
> pgalloc_dma32 2842410.00 ( 0.00%) 2842626.00 ( 0.01%) 2843043.00 ( 0.02%) 2843014.00 ( 0.02%)
> pgalloc_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
> pgsteal_dma 729.00 ( 0.00%) 85.00 (-757.65%) 609.00 ( -19.70%) 125.00 (-483.20%)
> pgsteal_dma32 2338721.00 ( 0.00%) 2447354.00 ( 4.44%) 2429536.00 ( 3.74%) 2436772.00 ( 4.02%)
> pgsteal_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
> pgscan_kswapd_dma 1469.00 ( 0.00%) 532.00 (-176.13%) 1078.00 ( -36.27%) 220.00 (-567.73%)
> pgscan_kswapd_dma32 4597713.00 ( 0.00%) 4503597.00 ( -2.09%) 4295673.00 ( -7.03%) 3891686.00 ( -18.14%)
> pgscan_kswapd_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
> pgscan_direct_dma 71.00 ( 0.00%) 134.00 ( 47.01%) 243.00 ( 70.78%) 352.00 ( 79.83%)
> pgscan_direct_dma32 305820.00 ( 0.00%) 280204.00 ( -9.14%) 600518.00 ( 49.07%) 957485.00 ( 68.06%)
> pgscan_direct_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
> pageoutrun 16296.00 ( 0.00%) 21254.00 ( 23.33%) 18447.00 ( 11.66%) 20067.00 ( 18.79%)
> allocstall 443.00 ( 0.00%) 273.00 ( -62.27%) 513.00 ( 13.65%) 1568.00 ( 71.75%)
>
> These are based on the raw figures taken from /proc/vmstat. It's a rough
> measure of reclaim activity. Note that allocstall counts are higher because
> we are entering direct reclaim more often as a result of not sleeping in
> congestion. In itself, it's not necessarily a bad thing. It's easier to
> get a view of what happened from the vmscan tracepoint report.
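As an aside, the raw figures above come straight from /proc/vmstat, so a
before/after snapshot is enough to reproduce this kind of summary. A minimal
sketch (the counter names are real /proc/vmstat fields; the zone suffix and
the choice of fields are illustrative):

  #!/usr/bin/env python3
  # Diff two /proc/vmstat snapshots and print the reclaim counters used above,
  # plus a rough reclaimed/scanned ("reclaim efficiency") percentage.

  def read_vmstat(path="/proc/vmstat"):
      with open(path) as f:
          return {name: int(value) for name, value in (line.split() for line in f)}

  def report(before, after):
      fields = ["pgsteal_dma32", "pgscan_kswapd_dma32",
                "pgscan_direct_dma32", "pageoutrun", "allocstall"]
      delta = {name: after[name] - before[name] for name in fields}
      for name, value in delta.items():
          print("%-22s %12d" % (name, value))
      scanned = delta["pgscan_kswapd_dma32"] + delta["pgscan_direct_dma32"]
      if scanned:
          print("reclaimed/scanned      %11.2f%%" %
                (100.0 * delta["pgsteal_dma32"] / scanned))

  # usage: before = read_vmstat(); <run the test>; report(before, read_vmstat())
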
>
> FTrace Reclaim Statistics: vmscan
>
> traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3 waitwriteback-v2r4
> Direct reclaims 443 273 513 1568
> Direct reclaim pages scanned 305968 280402 600825 957933
> Direct reclaim pages reclaimed 43503 19005 30327 117191
> Direct reclaim write file async I/O 0 0 0 0
> Direct reclaim write anon async I/O 0 3 4 12
> Direct reclaim write file sync I/O 0 0 0 0
> Direct reclaim write anon sync I/O 0 0 0 0
> Wake kswapd requests 187649 132338 191695 267701
> Kswapd wakeups 3 1 4 1
> Kswapd pages scanned 4599269 4454162 4296815 3891906
> Kswapd pages reclaimed 2295947 2428434 2399818 2319706
> Kswapd reclaim write file async I/O 1 0 1 1
> Kswapd reclaim write anon async I/O 59 187 41 222
> Kswapd reclaim write file sync I/O 0 0 0 0
> Kswapd reclaim write anon sync I/O 0 0 0 0
> Time stalled direct reclaim (seconds) 4.34 2.52 6.63 2.96
> Time kswapd awake (seconds) 11.15 10.25 11.01 10.19
>
> Total pages scanned 4905237 4734564 4897640 4849839
> Total pages reclaimed 2339450 2447439 2430145 2436897
> %age total pages scanned/reclaimed 47.69% 51.69% 49.62% 50.25%
> %age total pages scanned/written 0.00% 0.00% 0.00% 0.00%
> %age file pages scanned/written 0.00% 0.00% 0.00% 0.00%
> Percentage Time Spent Direct Reclaim 29.23% 19.02% 38.48% 20.25%
> Percentage Time kswapd Awake 78.58% 78.85% 76.83% 79.86%
>
> What is interesting here for nocongest in particular is that while direct
> reclaim scans more pages, the overall number of pages scanned remains roughly
> the same and the ratio of pages scanned to pages reclaimed is more or less
> unchanged. In other words, while we are sleeping less, reclaim is not doing
> more work, and as direct reclaim stalls for less time and kswapd is awake for
> less time, it would appear to be doing less work overall.
Yes, I think the reclaimed/scanned ratio (what I call "reclaim
efficiency") is a key metric.
50% is low! What's the testcase here? micro-mapped-file-stream?
It's strange that the "total pages reclaimed" increased a little. Just
a measurement glitch?
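For what it's worth, those figures come straight from the totals above:
2339450 reclaimed / 4905237 scanned ~= 47.7% for the vanilla kernel versus
2436897 / 4849839 ~= 50.2% for the full series, i.e. roughly two pages
scanned per page reclaimed either way, and the increase in total pages
reclaimed is only about 4% (2339450 -> 2436897).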
> FTrace Reclaim Statistics: congestion_wait
> Direct number congest waited 87 196 64 0
> Direct time congest waited 4604ms 4732ms 5420ms 0ms
> Direct full congest waited 72 145 53 0
> Direct number conditional waited 0 0 324 1315
> Direct time conditional waited 0ms 0ms 0ms 0ms
> Direct full conditional waited 0 0 0 0
> KSwapd number congest waited 20 10 15 7
> KSwapd time congest waited 1264ms 536ms 884ms 284ms
> KSwapd full congest waited 10 4 6 2
> KSwapd number conditional waited 0 0 0 0
> KSwapd time conditional waited 0ms 0ms 0ms 0ms
> KSwapd full conditional waited 0 0 0 0
>
> The vanilla kernel spent 8 seconds asleep in direct reclaim; with the patches
> applied, no time at all was spent asleep.
>
> MMTests Statistics: duration
>               traceonly-v2r2  lowlumpy-v2r3  waitcongest-v2r3  waitwriteback-v2r4
> User/Sys Time Running Test (seconds) 10.51 10.73 10.6 11.66
> Total Elapsed Time (seconds) 14.19 13.00 14.33 12.76
Is that user time plus system time? If so, why didn't user+sys equal
elapsed in the we-never-slept-in-congestion-wait() case? Because the
test's CPU got stolen by kswapd perhaps?
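For concreteness, and assuming the User/Sys row is the test's own CPU time,
the gaps in the table are about 3.7 seconds for the vanilla kernel (14.19
elapsed vs 10.51 user+sys) and still about 1.1 seconds for the full series
(12.76 vs 11.66).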
> Overall, the tests completed faster. It is interesting to note that backing
> off further when a zone is congested, and not just a BDI, was more efficient
> overall.
>
> PPC64 micro-mapped-file-stream
>               traceonly-v2r2  lowlumpy-v2r3  waitcongest-v2r3  waitwriteback-v2r4
> pgalloc_dma 3024660.00 ( 0.00%) 3027185.00 ( 0.08%) 3025845.00 ( 0.04%) 3026281.00 ( 0.05%)
> pgalloc_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
> pgsteal_dma 2508073.00 ( 0.00%) 2565351.00 ( 2.23%) 2463577.00 ( -1.81%) 2532263.00 ( 0.96%)
> pgsteal_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
> pgscan_kswapd_dma 4601307.00 ( 0.00%) 4128076.00 ( -11.46%) 3912317.00 ( -17.61%) 3377165.00 ( -36.25%)
> pgscan_kswapd_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
> pgscan_direct_dma 629825.00 ( 0.00%) 971622.00 ( 35.18%) 1063938.00 ( 40.80%) 1711935.00 ( 63.21%)
> pgscan_direct_normal 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
> pageoutrun 27776.00 ( 0.00%) 20458.00 ( -35.77%) 18763.00 ( -48.04%) 18157.00 ( -52.98%)
> allocstall 977.00 ( 0.00%) 2751.00 ( 64.49%) 2098.00 ( 53.43%) 5136.00 ( 80.98%)
>
> ...
>
>
> X86-64 STRESS-HIGHALLOC
> traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3 waitwriteback-v2r4
> Pass 1 82.00 ( 0.00%) 84.00 ( 2.00%) 85.00 ( 3.00%) 85.00 ( 3.00%)
> Pass 2 90.00 ( 0.00%) 87.00 (-3.00%) 88.00 (-2.00%) 89.00 (-1.00%)
> At Rest 92.00 ( 0.00%) 90.00 (-2.00%) 90.00 (-2.00%) 91.00 (-1.00%)
>
> Success figures across the board are broadly similar.
>
> traceonly-v2r2 lowlumpy-v2r3 waitcongest-v2r3 waitwriteback-v2r4
> Direct reclaims 1045 944 886 887
> Direct reclaim pages scanned 135091 119604 109382 101019
> Direct reclaim pages reclaimed 88599 47535 47863 46671
> Direct reclaim write file async I/O 494 283 465 280
> Direct reclaim write anon async I/O 29357 13710 16656 13462
> Direct reclaim write file sync I/O 154 2 2 3
> Direct reclaim write anon sync I/O 14594 571 509 561
> Wake kswapd requests 7491 933 872 892
> Kswapd wakeups 814 778 731 780
> Kswapd pages scanned 7290822 15341158 11916436 13703442
> Kswapd pages reclaimed 3587336 3142496 3094392 3187151
> Kswapd reclaim write file async I/O 91975 32317 28022 29628
> Kswapd reclaim write anon async I/O 1992022 789307 829745 849769
> Kswapd reclaim write file sync I/O 0 0 0 0
> Kswapd reclaim write anon sync I/O 0 0 0 0
> Time stalled direct reclaim (seconds) 4588.93 2467.16 2495.41 2547.07
> Time kswapd awake (seconds) 2497.66 1020.16 1098.06 1176.82
>
> Total pages scanned 7425913 15460762 12025818 13804461
> Total pages reclaimed 3675935 3190031 3142255 3233822
> %age total pages scanned/reclaimed 49.50% 20.63% 26.13% 23.43%
> %age total pages scanned/written 28.66% 5.41% 7.28% 6.47%
> %age file pages scanned/written 1.25% 0.21% 0.24% 0.22%
> Percentage Time Spent Direct Reclaim 57.33% 42.15% 42.41% 42.99%
> Percentage Time kswapd Awake 43.56% 27.87% 29.76% 31.25%
>
> Scanned/reclaimed ratios again look good with big improvements in
> efficiency. The scanned/written ratios also look much improved. With a
> better scanned/written ratio, there is an expectation that IO would be more
> efficient and indeed, the time spent in direct reclaim is much reduced by
> the full series and kswapd spends a little less time awake.
Wait. The reclaim efficiency got *worse*, didn't it? To reclaim
3,xxx,xxx pages, the number of pages we had to scan went from 7,xxx,xxx
up to 13,xxx,xxx?
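(Reading the table above both ways: total pages scanned went from 7,425,913
to 13,804,461 while pages reclaimed fell from 3,675,935 to 3,233,822, so the
reclaimed/scanned figure did drop, from 49.50% to 23.43%; the scanned/written
ratio moved the other way, from 28.66% down to 6.47%, and time stalled in
direct reclaim fell from 4588.93s to 2547.07s.)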
>
> ...
>
> I think this series is ready for much wider testing. The lowlumpy patches in
> particular should be relatively uncontroversial. While their largest impact
> can be seen in the high-order stress tests, they would also have an impact
> if SLUB was configured (these tests are based on slab), and stalls in lumpy
> reclaim could be partially responsible for some desktop stalling reports.
slub sucks :(
Is this patchset likely to have any impact on the "hey my net driver
couldn't do an order 3 allocation" reports? I guess not.
>
> ...
>