Message-ID: <20160531100848.GR2527@techsingularity.net>
Date:	Tue, 31 May 2016 11:08:48 +0100
From:	Mel Gorman <mgorman@...hsingularity.net>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Geert Uytterhoeven <geert@...ux-m68k.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Linux MM <linux-mm@...ck.org>,
	linux-m68k <linux-m68k@...ts.linux-m68k.org>
Subject: [PATCH] mm, page_alloc: Reset zonelist iterator after resetting fair
 zone allocation policy

Geert Uytterhoeven reported the following problem on m68k/ARAnyM, which he
bisected to commit c33d6c06f60f ("mm, page_alloc: avoid looking up the first
zone in a zonelist twice"):

    BUG: scheduling while atomic: cron/668/0x10c9a0c0
    Modules linked in:
    CPU: 0 PID: 668 Comm: cron Not tainted 4.6.0-atari-05133-gc33d6c06f60f710f #364
    Stack from 10c9a074:
            10c9a074 003763ca 0003d7d0 00361a58 00bcf834 0000029c 10c9a0c0 10c9a0c0
            002f0f42 00bcf5e0 00000000 00000082 0048e018 00000000 00000000 002f0c30
            000410de 00000000 00000000 10c9a0e0 002f112c 00000000 7fffffff 10c9a180
            003b1490 00bcf60c 10c9a1f0 10c9a118 002f2d30 00000000 10c9a174 10c9a180
            0003ef56 003b1490 00bcf60c 003b1490 00bcf60c 0003eff6 003b1490 00bcf60c
            003b1490 10c9a128 002f118e 7fffffff 00000082 002f1612 002f1624 7fffffff
    Call Trace: [<0003d7d0>] __schedule_bug+0x40/0x54
     [<002f0f42>] __schedule+0x312/0x388
     [<002f0c30>] __schedule+0x0/0x388
     [<000410de>] prepare_to_wait+0x0/0x52
     [<002f112c>] schedule+0x64/0x82
     [<002f2d30>] schedule_timeout+0xda/0x104
     [<0003ef56>] set_next_entity+0x18/0x40
     [<0003eff6>] pick_next_task_fair+0x78/0xda
     [<002f118e>] io_schedule_timeout+0x36/0x4a
     [<002f1612>] bit_wait_io+0x0/0x40
     [<002f1624>] bit_wait_io+0x12/0x40
     [<002f12c4>] __wait_on_bit+0x46/0x76
     [<0006a06a>] wait_on_page_bit_killable+0x64/0x6c
     [<002f1612>] bit_wait_io+0x0/0x40
     [<000411fe>] wake_bit_function+0x0/0x4e
     [<0006a1b8>] __lock_page_or_retry+0xde/0x124
     [<00217000>] do_scan_async+0x114/0x17c
     [<00098856>] lookup_swap_cache+0x24/0x4e
     [<0008b7c8>] handle_mm_fault+0x626/0x7de
     [<0008ef46>] find_vma+0x0/0x66
     [<002f2612>] down_read+0x0/0xe
     [<0006a001>] wait_on_page_bit_killable_timeout+0x77/0x7c
     [<0008ef5c>] find_vma+0x16/0x66
     [<00006b44>] do_page_fault+0xe6/0x23a
     [<0000c350>] res_func+0xa3c/0x141a
     [<00005bb8>] buserr_c+0x190/0x6d4
     [<0000c350>] res_func+0xa3c/0x141a
     [<000028ec>] buserr+0x20/0x28
     [<0000c350>] res_func+0xa3c/0x141a
     [<000028ec>] buserr+0x20/0x28

The relationship is not obvious, but it is due to a failure to rescan the full
zonelist after the fair zone allocation policy exhausts the batch count; a
simplified sketch of the failure mode follows the results below. While this is
a functional problem, it is also a performance issue. A page allocator
microbenchmark showed the following:

                                 4.7.0-rc1                  4.7.0-rc1
                                   vanilla                 reset-v1r2
Min      alloc-odr0-1     327.00 (  0.00%)           326.00 (  0.31%)
Min      alloc-odr0-2     235.00 (  0.00%)           235.00 (  0.00%)
Min      alloc-odr0-4     198.00 (  0.00%)           198.00 (  0.00%)
Min      alloc-odr0-8     170.00 (  0.00%)           170.00 (  0.00%)
Min      alloc-odr0-16    156.00 (  0.00%)           156.00 (  0.00%)
Min      alloc-odr0-32    150.00 (  0.00%)           150.00 (  0.00%)
Min      alloc-odr0-64    146.00 (  0.00%)           146.00 (  0.00%)
Min      alloc-odr0-128   145.00 (  0.00%)           145.00 (  0.00%)
Min      alloc-odr0-256   155.00 (  0.00%)           155.00 (  0.00%)
Min      alloc-odr0-512   168.00 (  0.00%)           165.00 (  1.79%)
Min      alloc-odr0-1024  175.00 (  0.00%)           174.00 (  0.57%)
Min      alloc-odr0-2048  180.00 (  0.00%)           180.00 (  0.00%)
Min      alloc-odr0-4096  187.00 (  0.00%)           186.00 (  0.53%)
Min      alloc-odr0-8192  190.00 (  0.00%)           190.00 (  0.00%)
Min      alloc-odr0-16384 191.00 (  0.00%)           191.00 (  0.00%)
Min      alloc-odr1-1     736.00 (  0.00%)           445.00 ( 39.54%)
Min      alloc-odr1-2     343.00 (  0.00%)           335.00 (  2.33%)
Min      alloc-odr1-4     277.00 (  0.00%)           270.00 (  2.53%)
Min      alloc-odr1-8     238.00 (  0.00%)           233.00 (  2.10%)
Min      alloc-odr1-16    224.00 (  0.00%)           218.00 (  2.68%)
Min      alloc-odr1-32    210.00 (  0.00%)           208.00 (  0.95%)
Min      alloc-odr1-64    207.00 (  0.00%)           203.00 (  1.93%)
Min      alloc-odr1-128   276.00 (  0.00%)           202.00 ( 26.81%)
Min      alloc-odr1-256   206.00 (  0.00%)           202.00 (  1.94%)
Min      alloc-odr1-512   207.00 (  0.00%)           202.00 (  2.42%)
Min      alloc-odr1-1024  208.00 (  0.00%)           205.00 (  1.44%)
Min      alloc-odr1-2048  213.00 (  0.00%)           212.00 (  0.47%)
Min      alloc-odr1-4096  218.00 (  0.00%)           216.00 (  0.92%)
Min      alloc-odr1-8192  341.00 (  0.00%)           219.00 ( 35.78%)
Note that order-0 allocations are unaffected, but higher orders get a small
boost from this patch and system CPU usage overall drops substantially, as can
be seen here:

           4.7.0-rc1   4.7.0-rc1
             vanilla  reset-v1r2
User           85.32       86.31
System       2221.39     2053.36
Elapsed      2368.89     2202.47
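
For reference, here is a rough sketch of the affected retry path in
get_page_from_freelist(). It is a simplified illustration, not the literal
mm/page_alloc.c code: the identifiers in the retry block match the hunk
below, while the zonelist-walk macro and alloc_context fields are shown from
memory of that code and are only meant to convey the shape of the loop:

        /* Simplified fragment of get_page_from_freelist(), ~4.7-rc1 */
        struct zoneref *z = ac->preferred_zoneref;      /* zonelist iterator */

zonelist_scan:
        /* Walk the zonelist starting from wherever 'z' currently points */
        for_next_zone_zonelist_nodemask(zone, z, ac->zonelist,
                                        ac->high_zoneidx, ac->nodemask) {
                /* try to allocate from 'zone', honouring fairness batches */
        }

        if (fair_skipped) {
                /* Fair pass found nothing: drop fairness and rescan */
                apply_fair = false;
                fair_skipped = false;
                reset_alloc_batches(ac->preferred_zoneref->zone);
                /*
                 * Without the fix below, 'z' still points at the end of the
                 * zonelist here, so the goto rescans nothing instead of the
                 * full zonelist.
                 */
                goto zonelist_scan;
        }

The one-liner below simply rewinds the iterator to ac->preferred_zoneref
before retrying, so that the second, fairness-free pass considers every
eligible zone again.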

Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Reported-and-tested-by: Geert Uytterhoeven <geert@...ux-m68k.org>
Signed-off-by: Mel Gorman <mgorman@...hsingularity.net>
---
 mm/page_alloc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bb320cde4d6d..557549c81083 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3024,6 +3024,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 		apply_fair = false;
 		fair_skipped = false;
 		reset_alloc_batches(ac->preferred_zoneref->zone);
+		z = ac->preferred_zoneref;
 		goto zonelist_scan;
 	}
