Date:   Wed, 31 Aug 2016 14:27:21 +0100
From:   Andriy Tkachuk <andriy.tkachuk@...gate.com>
To:     linux-kernel@...r.kernel.org
Subject: Re: mm: kswapd struggles reclaiming the pages on 64GB server

Alright - after disabling the memory cgroup, everything works perfectly
with the patch, even with the default vm parameters.

Here are some vmstat results for comparison. Now:

# vmstat 60
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  0 67606176 375196  38708 1385896    0   74    23 1266751 198073 103648  6  7 86  1  0
 3  0 67647904 394872  38612 1371200    0  695    18 1371067 212143 93917  7  8 85  1  0
 2  0 67648016 375796  38676 1382812    1    2    13 1356271 215123 115987  6  7 85  1  0
 3  0 67657392 378336  38744 1383468    1  157    15 1383591 213694 102457  6  7 86  1  0
 6  0 67659088 367856  38796 1388696    1   28    26 1330238 208377 111469  6  7 86  1  0
 2  0 67701344 407320  38680 1371004    0  704    34 1255911 203308 126458  8  8 82  3  0
 4  0 67711920 402296  38776 1380836    0  176     8 1308525 201451 93053  6  7 86  1  0
 8  0 67721264 376676  38872 1394816    0  156    14 1409726 218269 108127  7  8 85  1  0
18  0 67753872 395568  38896 1397144    0  544    16 1288576 201680 105980  6  7 86  1  0
 2  0 67755544 362960  38992 1411744    0   28    17 1458453 232544 127088  6  7 85  1  0
 4  0 67784056 376684  39088 1410924    0  475    25 1385831 218800 110344  6  7 85  1  0
 2  0 67816104 393108  38800 1384108    1  535    17 1336857 208551 105872  6  7 85  1  0
 7  0 67816104 399492  38820 1387096    0    0    17 1280630 205478 109499  6  7 86  1  0
 1  0 67821648 375284  38908 1397132    1   93    15 1343042 208363 98031  6  7 85  1  0
 1  0 67823512 363828  38924 1402388    0   31    15 1366995 212606 101328  6  7 85  1  0
 5  0 67864264 416720  38784 1374480    1  680    21 1372581 210256 95369  7  8 83  3  0

Swapping works smoothly, there is more than enough memory available
for caching, and CPU wait is around 1%.

Before:

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  2 13755748 334968   2140  63780 6684    0  7644    21 3122 7704  0  9 83  8  0
 2  2 13760380 333628   2140  62468 4572 7764  4764  9129 3326 8678  0 10 83  7  0
 2  2 13761072 332888   2140  62608 4576 4256  4616  4470 3377 8906  0 10 82  7  0
 2  2 13760812 341532   2148  62644 5388 3532  5996  3996 3451 7521  0 10 83  7  0
 3  3 13757648 335116   2148  62944 6176    0  6480   238 3412 8905  0 10 83  7  0
 2  2 13752936 331908   2148  62336 7488    0  7628   201 3433 7483  0 10 83  7  0
 2  2 13752520 344428   2148  69412 5292 2160 15820  2324 7254 15960 0 11 82  7  0
 3  2 13750856 338056   2148  69864 5576    0  5984    28 3384 8060  0 10 84  6  0
 2  2 13748836 331516   2156  70116 6076    0  6376    44 3683 6941  2 10 82  6  0
 2  2 13750184 335732   2148  70764 3544 2664  4252  2692 3682 8435  3 10 83  4  0
 2  4 13747528 338492   2144  70872 9520 3152  9688  3176 4846 7013  1 10 82  7  0
 3  2 13756580 341752   2144  71060 9020 14740  9148 14764 4167 8024 1 10 80  9  0
 2  2 13749484 336900   2144  71504 6444    0  6916    24 3613 8472  1 10 82  7  0
 2  2 13740560 333148   2152  72480 6932    0  7952    44 3891 6819  1 10 82  7  0
 2  2 13734456 330896   2148  72920 12228 1736 12488  1764 3454 9321  2  9 82  8  0

The system got into classic thrashing from which it never came out.

Now:

# cat /proc/vmstat | egrep 'nr_.*active_|pg(steal|scan|refill).*_normal|nr_vmscan_write|nr_swap|pgact'
nr_inactive_anon 7546598
nr_active_anon 7547226
nr_inactive_file 175973
nr_active_file 179439
nr_vmscan_write 17862257
pgactivate 213529452
pgrefill_normal 50400148
pgsteal_kswapd_normal 55904846
pgsteal_direct_normal 2417827
pgscan_kswapd_normal 76263257
pgscan_direct_normal 3213568

Before:

# cat /proc/vmstat | egrep 'nr_.*active_|pg(steal|scan|refill).*_normal|nr_vmscan_write|nr_swap|pgact'
nr_inactive_anon 695534
nr_active_anon 14427464
nr_inactive_file 2786
nr_active_file 2698
nr_vmscan_write 1740097
pgactivate 115697891
pgrefill_normal 33345818
pgsteal_kswapd_normal 367908859
pgsteal_direct_normal 681266
pgscan_kswapd_normal 10255454426
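
For a quick comparison of the two dumps above: before the change the
active anon list was about 20x the size of the inactive one, and kswapd
scanned roughly 10 billion pages to steal about 368 million (~3.6%
reclaim efficiency); after it the two anon lists are nearly equal and
kswapd efficiency is around 73%. A trivial C snippet that redoes this
arithmetic, in case someone wants to repeat it on their own
/proc/vmstat numbers (the figures below are just copied from the dumps
above):

/* vmstat_ratio.c - redo the arithmetic from the two dumps above */
#include <stdio.h>

static void report(const char *tag, double active, double inactive,
                   double pgsteal, double pgscan)
{
        printf("%s: active/inactive anon = %.1f, kswapd efficiency = %.1f%%\n",
               tag, active / inactive, 100.0 * pgsteal / pgscan);
}

int main(void)
{
        report("before", 14427464, 695534, 367908859, 10255454426.0);
        report("after",   7547226, 7546598, 55904846, 76263257);
        return 0;
}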

Here is the patch again for convenience:

--- linux-3.10.0-229.20.1.el7.x86_64.orig/mm/page_alloc.c    2015-09-24 15:47:25.000000000 +0000
+++ linux-3.10.0-229.20.1.el7.x86_64/mm/page_alloc.c    2016-08-15 09:49:46.922240569 +0000
@@ -5592,16 +5592,7 @@
  */
 static void __meminit calculate_zone_inactive_ratio(struct zone *zone)
 {
-       unsigned int gb, ratio;
-
-       /* Zone size in gigabytes */
-       gb = zone->managed_pages >> (30 - PAGE_SHIFT);
-       if (gb)
-               ratio = int_sqrt(10 * gb);
-       else
-               ratio = 1;
-
-       zone->inactive_ratio = ratio;
+       zone->inactive_ratio = 1;
 }
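
For the record, this box has roughly 64GB in the zone, so the stock
formula gives int_sqrt(10 * 64) = int_sqrt(640) = 25, i.e. the kernel
only considers the inactive anon list "low" once the active list is
more than 25x bigger, which is why barely any anon pages ever sit on
the inactive list here. Forcing the ratio to 1 keeps the two lists
roughly balanced. A small userspace sketch of that arithmetic (it just
mimics int_sqrt(); the 64GB zone size and 4KB pages are assumptions
for illustration):

/* inactive_ratio_demo.c - what the stock heuristic computes for ~64GB */
#include <stdio.h>

/* integer square root, same idea as the kernel's int_sqrt() */
static unsigned long int_sqrt(unsigned long x)
{
        unsigned long r = 0;

        while ((r + 1) * (r + 1) <= x)
                r++;
        return r;
}

int main(void)
{
        unsigned long managed_pages = 64UL << (30 - 12); /* ~64GB of 4KB pages */
        unsigned long gb = managed_pages >> (30 - 12);   /* zone size in GB */
        unsigned long ratio = gb ? int_sqrt(10 * gb) : 1;

        /* prints: gb=64 ratio=25 */
        printf("gb=%lu ratio=%lu\n", gb, ratio);
        return 0;
}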

Hope this helps someone facing similar problems.

Regards,
  Andriy

On Tue, Aug 23, 2016 at 4:14 PM, Andriy Tkachuk
<andriy.tkachuk@...gate.com> wrote:
> Well, as it turned out, the patch did not affect the problem at all,
> since the memory cgroup was enabled (in which case the zone's
> inactive_ratio is not used; the ratio is calculated directly in
> mem_cgroup_inactive_anon_is_low()). So the patch will be retested
> with the memory cgroup off.
>
>   Andriy
>
> On Mon, Aug 22, 2016 at 11:46 PM, Andriy Tkachuk
> <andriy.tkachuk@...gate.com> wrote:
>> On Mon, Aug 22, 2016 at 7:37 PM, Andriy Tkachuk
>> <andriy.tkachuk@...gate.com> wrote:
>>>
>>> The following patch resolved the problem:
>>> ...
>>
>> Sorry, I was too hasty in sending good news. As it turned out, the
>> problem is still there:
>>
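
Regarding mem_cgroup_inactive_anon_is_low() mentioned in the quoted
mail above: as I understand it, with the memory cgroup enabled the
inactive-anon check recomputes the ratio from the cgroup's own LRU
sizes instead of reading zone->inactive_ratio (check mm/memcontrol.c
for the exact code), which is why the page_alloc.c patch alone had no
effect. A paraphrased userspace sketch of that logic, with made-up
names and 4KB pages assumed, not the actual kernel code:

/* memcg_ratio_sketch.c - paraphrase of the per-cgroup check described above */
#include <stdio.h>

static unsigned long int_sqrt(unsigned long x)
{
        unsigned long r = 0;

        while ((r + 1) * (r + 1) <= x)
                r++;
        return r;
}

/* hypothetical stand-in for the per-cgroup "is inactive anon low?" check */
static int inactive_anon_is_low(unsigned long inactive, unsigned long active)
{
        unsigned long gb = (inactive + active) >> (30 - 12); /* LRU size in GB */
        unsigned long ratio = gb ? int_sqrt(10 * gb) : 1;

        return inactive * ratio < active;
}

int main(void)
{
        /* the "before" numbers from the dump above */
        printf("is_low = %d\n", inactive_anon_is_low(695534, 14427464));
        /* prints 0: even with a 20x imbalance the check does not trigger,
         * so nothing gets moved from the active to the inactive list */
        return 0;
}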
