Message-Id: <20081203163740.1D4D.KOSAKI.MOTOHIRO@jp.fujitsu.com>
Date:	Wed,  3 Dec 2008 18:20:46 +0900 (JST)
From:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To:	"wassim dagash" <wassim.dagash@...il.com>
Cc:	kosaki.motohiro@...fujitsu.com, linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Nick Piggin <npiggin@...e.de>
Subject: Re: KSWAPD Algorithm - 100% CPU

(CC to Nick Piggin and Andrew Morton.)

Hi

First, could you post a program that reproduces the problem?
If nobody can reproduce it, fixing it is difficult.

Obviously, we need to validate any patch with a reproducer program.


> Hi All,
> Description:
> I encountered a weird problem with kswapd:
> it runs in some infinite loop trying to swap until order 10 of zone
> highmem is OK, while zone highmem (as I understand) has nothing to do
> with contiguous memory (because there is no 1-1 mapping), which means
> kswapd will continue to try to balance order 10 of zone highmem
> forever (or until someone releases a very large chunk of highmem).
> Can anyone please explain to me the algorithm of kswapd and why it tries
> to balance order 10 of zone highmem?

Second, I'd like to talk about the background and algorithm of kswapd.

The first kswapd balancing logic was introduced by the following commit.

--------------------------------------------------------
commit 6cbd719443491404f63f9ff79ead9eba256511ee
Author: akpm <akpm>
Date:   Fri Mar 12 16:24:40 2004 +0000

    [PATCH] kswapd: fix lumpy page reclaim

    As kswapd is now scanning zones in the highmem->normal->dma direction it can
    get into competition with the page allocator: kswapd keep on trying to free
    pages from highmem, then kswapd moves onto lowmem.  By the time kswapd has
    done proportional scanning in lowmem, someone has come in and allocated a few
    pages from highmem.  So kswapd goes back and frees some highmem, then some
    lowmem again.  But nobody has allocated any lowmem yet.  So we keep on and on
    scanning lowmem in response to highmem page allocations.

    With a simple `dd' on a 1G box we get:

     r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa id
     0  3      0  59340   4628 922348    0    0     4 28188 1072   808  0 10 46 44
     0  3      0  29932   4660 951760    0    0     0 30752 1078   441  1  6 30 64
     0  3      0  57568   4556 924052    0    0     0 30748 1075   478  0  8 43 49
     0  3      0  29664   4584 952176    0    0     0 30752 1075   472  0  6 34 60
     0  3      0   5304   4620 976280    0    0     4 40484 1073   456  1  7 52 41
     0  3      0 104856   4508 877112    0    0     0 18452 1074    97  0  7 67 26
     0  3      0  70768   4540 911488    0    0     0 35876 1078   746  0  7 34 59
     1  2      0  42544   4568 939680    0    0     0 21524 1073   556  0  5 43 51
     0  3      0   5520   4608 976428    0    0     4 37924 1076   836  0  7 41 51
     0  2      0   4848   4632 976812    0    0    32 12308 1092    94  0  1 33 66

    Simple fix: go back to scanning the zones in the dma->normal->highmem
    direction so we meet the page allocator in the middle somewhere.

     r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa id
     1  3      0   5152   3468 976548    0    0     4 37924 1071   650  0  8 64 28
     1  2      0   4888   3496 976588    0    0     0 23576 1075   726  0  6 66 27
     0  3      0   5336   3532 976348    0    0     0 31264 1072   708  0  8 60 32
     0  3      0   6168   3560 975504    0    0     0 40992 1072   683  0  6 63 31
     0  3      0   4560   3580 976844    0    0     0 18448 1073   233  0  4 59 37
     0  3      0   5840   3624 975712    0    0     4 26660 1072   800  1  8 46 45
     0  3      0   4816   3648 976640    0    0     0 40992 1073   526  0  6 47 47
     0  3      0   5456   3672 976072    0    0     0 19984 1070   320  0  5 60 35

    BKrev: 4051e448CiuO4KIoyJ6pqIVrkhuNnw
--------------------------------------------------------

At that time, kswapd didn't check memory contiguity at all.
It had the following code.

		------------------------------------------------------------
+                               if (zone->free_pages <= zone->pages_high) {
+                                       end_zone = i;
+                                       goto scan;
+                               }
		-----------------------------------------------------------------
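
For illustration, the order-0 check above can be modeled in userspace roughly
as follows. This is only a sketch, not kernel code: struct zone_counts and
find_end_zone() are hypothetical names, and the high-to-low scan direction of
the loop around the quoted condition is an assumption.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's struct zone. */
struct zone_counts {
    unsigned long free_pages;  /* total free pages, regardless of order */
    unsigned long pages_high;  /* high watermark for the zone */
};

/* Same condition as the quoted code: only the total page count matters. */
static bool zone_needs_reclaim(const struct zone_counts *z)
{
    return z->free_pages <= z->pages_high;
}

/*
 * Return the index of the highest zone that is at or below its high
 * watermark, or -1 if every zone is balanced. The scan direction
 * (highest zone first) is an assumption about the surrounding loop.
 */
static int find_end_zone(const struct zone_counts *zones, int nr_zones)
{
    for (int i = nr_zones - 1; i >= 0; i--) {
        if (zone_needs_reclaim(&zones[i]))
            return i;
    }
    return -1;
}
```

With this check, a zone whose free memory is badly fragmented still counts as
balanced as long as its total free page count is above pages_high.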



The second commit improved the check to account for memory contiguity.

--------------------------------------------------------
commit e0e1723229b6f96922d10bb932f94d899132b462
Author: nickpiggin <nickpiggin>
Date:   Tue Jan 4 04:14:42 2005 +0000

    [PATCH] mm: teach kswapd about higher order areas

    Teach kswapd to free memory on behalf of higher order allocators.  This
    could be important for higher order atomic allocations because they
    otherwise have no means to free the memory themselves.

    Signed-off-by: Nick Piggin <nickpiggin@...oo.com.au>
    Signed-off-by: Andrew Morton <akpm@...l.org>
    Signed-off-by: Linus Torvalds <torvalds@...l.org>

    BKrev: 41da1832E5flzqtNXq5m70WxihpcMw
--------------------------------------------------------

After that commit, kswapd had the following code.

		--------------------------------------------------------
-                               if (zone->free_pages <= zone->pages_high) {
+                               if (!zone_watermark_ok(zone, order,
+                                               zone->pages_high, 0, 0, 0)) {
                                        end_zone = i;
                                        goto scan;
                                }
		--------------------------------------------------------
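
To show why an order-aware check can keep failing even when plenty of pages
are free, here is a simplified userspace model of such a watermark test. It
is a sketch under assumptions: struct zone_freelists, MODEL_MAX_ORDER and
watermark_ok_model() are hypothetical names, and the model ignores
lowmem_reserve and allocation flags. It only follows the general idea of
discounting free blocks that are too small for the requested order.

```c
#include <stdbool.h>

#define MODEL_MAX_ORDER 11  /* orders 0..10, assumed for this model */

/* Hypothetical stand-in for per-order buddy free lists. */
struct zone_freelists {
    unsigned long nr_free[MODEL_MAX_ORDER]; /* free blocks of each order */
};

static long total_free_pages(const struct zone_freelists *z)
{
    long total = 0;

    for (int o = 0; o < MODEL_MAX_ORDER; o++)
        total += (long)(z->nr_free[o] << o);
    return total;
}

/* Return true if the zone can satisfy an allocation of 'order' above 'mark'. */
static bool watermark_ok_model(const struct zone_freelists *z, int order,
                               long mark)
{
    long min = mark;
    long free_pages = total_free_pages(z) - (1L << order) + 1;

    if (free_pages <= min)
        return false;

    for (int o = 0; o < order; o++) {
        /* Blocks smaller than the request cannot satisfy it. */
        free_pages -= (long)(z->nr_free[o] << o);
        /* Require fewer free pages at each higher order. */
        min >>= 1;
        if (free_pages <= min)
            return false;
    }
    return true;
}
```

In this model, a zone with 10000 free order-0 pages passes for order 0 but
fails for order 10, however far the total is above the watermark. Reclaim
that frees only scattered single pages never changes that result.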

The problem is that alloc_pages(GFP_KERNEL, 10) needs contiguous order-10 memory,
but it doesn't need highmem to be contiguous.

However, alloc_pages() passes along the order==10 information,
but doesn't pass along the information that contiguous highmem is unnecessary.

Oops, that is a bug, I think.


So, I'd like to fix this bug.
However, I'd like to check first whether my guess is right.
Please post a reproducer program.



> Details:
> I build an instrumented kernel with debug messages in
> "zone_watermark_ok" function, and from the code and debug messages I
> see that "zone_watermark_ok" returns 0 when kswapd invokes it (through
> balance_pgdat) in order to decide if zone highmem is balanced or not,
> which leads in some configurations to an infinite loop of kswapd (if no
> large chunks of highmem are released). I added a condition to
> "balance_pgdat" so it doesn't try to balance order higher than 1 in
> zone highmem and this condition solved the problem. What are the risks
> with such a solution? Isn't it a bug that kswapd is looking for
> contiguous memory in zone highmem (as I understand there is no 1-1
> mapping in zone highmem which is meaningless in kswapd)?


Simply removing the check doesn't seem good,
because hugepages in highmem need contiguous highmem.



