linux-kernel - Re: [3.7-rc6] capture_free_page() frees page without accounting for them??

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20121121143303.GD8218@suse.de>
Date:	Wed, 21 Nov 2012 14:33:03 +0000
From:	Mel Gorman <mgorman@...e.de>
To:	Dave Hansen <dave@...ux.vnet.ibm.com>
Cc:	linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [3.7-rc6] capture_free_page() frees page without accounting for
 them??

On Tue, Nov 20, 2012 at 12:25:37PM -0800, Dave Hansen wrote:
> Hi Mel,
> 
> I'm chasing an apparent memory leak introduced post-3.6. 

An accounting leak could also contribute to the kswapd bugs we've been
seeing recently.

Andrew, this is quite important and might be worth wedging in before 3.7
comes out because it'll cause serious problems if Dave is right.

> The
> interesting thing is that it appears that the pages are in the
> allocator, but not being accounted for:
> 
> 	http://www.spinics.net/lists/linux-mm/msg46187.html
> 	https://bugzilla.kernel.org/show_bug.cgi?id=50181
> 

Differences in the buddy allocator and reported free figures almost
always point to either per-cpu drift or NR_FREE_PAGES accounting bugs.
Usually the drift is not too bad and the drift is always within a
margin related to the number of CPUs. NR_FREE_PAGES accounting bugs get
progressively worse until the machine starts OOM killing or locks up.

> I started auditing anything that might be messing with NR_FREE_PAGES,
> and came across commit 1fb3f8ca. 

It could certainly affect NR_FREE_PAGES due to its manipulating of buddy
pages. It will not result in happy and the system would potentially need
to be running a long time before it's spotted.

> It does something curious with
> capture_free_page() (previously known as split_free_page()).
> 
> int capture_free_page(struct page *page, int alloc_order,
> ...
>         __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
> 
> -       /* Split into individual pages */
> -       set_page_refcounted(page);
> -       split_page(page, order);
> +       if (alloc_order != order)
> +               expand(zone, page, alloc_order, order,
> +                       &zone->free_area[order], migratetype);
> 
> Note that expand() puts the pages _back_ in the allocator, but it does
> not bump NR_FREE_PAGES.  We "return" alloc_order' worth of pages, but we
> accounted for removing 'order'.
> 
> I _think_ the correct fix is to just:
> 
> -     __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
> +     __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << alloc_order));
> 

This looks correct to me but it will collide with other patches. You'll
need something like the below. If it works for you, stick a changelog on
it, feel free to put my Acked on it and get it to Andrew for ASAP
because I really think this needs to be in before 3.7 comes out or we'll
be swamped with a maze of kswapd-goes-mental bugs, all similar with
different root causes.

Thanks a million Dave!

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fd6a073..ad99f0f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1406,7 +1406,7 @@ int capture_free_page(struct page *page, int alloc_order, int migratetype)

 	mt = get_pageblock_migratetype(page);
 	if (unlikely(mt != MIGRATE_ISOLATE))
-		__mod_zone_freepage_state(zone, -(1UL << order), mt);
+		__mod_zone_freepage_state(zone, -(1UL << alloc_order), mt);

 	if (alloc_order != order)
 		expand(zone, page, alloc_order, order,

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/