linux-kernel - Re: [PATCH] mm: compaction: Abort compaction if too many pages are isolated and caller is asynchronous

Date:	Tue, 31 May 2011 21:16:20 +0900
From:	Minchan Kim <minchan.kim@...il.com>
To:	Andrea Arcangeli <aarcange@...hat.com>
Cc:	Mel Gorman <mgorman@...e.de>, Mel Gorman <mel@....ul.ie>,
	akpm@...ux-foundation.org, Ury Stankevich <urykhy@...il.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org, stable@...nel.org
Subject: Re: [PATCH] mm: compaction: Abort compaction if too many pages are
 isolated and caller is asynchronous

Hi Andrea,

On Mon, May 30, 2011 at 07:53:34PM +0200, Andrea Arcangeli wrote:
> On Mon, May 30, 2011 at 05:55:46PM +0100, Mel Gorman wrote:
> > Even with drift issues, -1 there should be "impossible". Assuming this
> > is a zoneinfo file, that figure is based on global_page_state() which
> > looks like
> 
> The two cases reproducing this long hang in D state both had SMP=n
> PREEMPT=y. Clearly not a common config these days. Also, it didn't seem
> apparent that any task was running in a code path that kept pages
> isolated.
> 
> > unsigned long, and callers are using unsigned long, is there any
> > possibility the "if (x < 0)" is being optimised out? If you are aware
> 
> It was eliminated by cpp.
> 
> > of users reporting this problem (like the users in thread "iotop:
> > khugepaged at 99.99% (2.6.38.3)"), do you know if they had a particular
> > compiler in common?
> 
> I had no reason to worry about the compiler yet but that's always a good
> idea to keep in mind. The thread where the bug is reported is the
> "iotop" one you mentioned, and there's a tarball attached to one of
> the last emails of the thread with the debug data I grepped. It was
> the /proc/zoneinfo file, yes. That's the file I asked for when I noticed
> something had to be wrong with too_many_isolated and I expected either
> nr_isolated or nr_inactive going wrong; it turned out it was
> nr_isolated (apparently, I don't have the full picture of the problem
> yet). I added you in CC to a few emails but you weren't in all
> replies.
> 
> The debug data you can find on lkml in this email: Message-Id:
> <201105232005.56840.johannes.hirte@....tu-ilmenau.de>.
> 
> The other relevant sysrq+t here http://pastebin.com/raw.php?i=VG28YRbi
> 
> better save the latter (I did) as I'm worried it has a timeout on it.
> 
> Your patch was for reports with CONFIG_SMP=y? I'd prefer to clear out
> this error before improving the too_many_isolated, in fact while
> reviewing this code I was not impressed by too_many_isolated. For
> vmscan.c if there's a huge nr_active* list and a tiny nr_inactive
> (like after a truncate of filebacked pages or munmap of anon memory)
> there's no reason to stall, it's better to go ahead and let it refile
> more active pages. The too_many_isolated in compaction.c looks a whole
> lot better than the vmscan.c one as that takes into account the active
> pages too... But I refrained from making any change in this area as I
> don't think the bug is in too_many_isolated itself.
> 
> I noticed the count[] array is unsigned int, but it looks ok
> (especially for 32bit ;) because the isolation is limited.
> 
> Both bugs were reported on 32bit x86 UP builds with PREEMPT=y. The
> stat accounting seems to use atomics on UP so irqs on or off or
> PREEMPT=y/n shouldn't matter if the increment is 1 insn long (plus no
> irq code should ever mess with nr_isolated)... If it wasn't atomic and
> irqs or preempt aren't disabled it could be preempt. To avoid
> confusion: it's not proven that PREEMPT is related, it may be an
> accident both .config had it on. I'm also unsure why it moves from
> -1,0,1 I wouldn't expect a single page to be isolated like -1 pages to
> be isolated, it just looks weird...

I am not sure this is related to the problem you have seen.
If he used hwpoison via madvise, it is possible.
Anyway, a count mismatch can produce a negative value in a UP build.
Let's fix it.
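
To make the failure mode concrete, here is a minimal userspace sketch, not
kernel code. It assumes, as discussed above, that the "if (x < 0)" clamp in
the counter read is removed by the preprocessor on UP builds, and it reduces
the too_many_isolated check to a plain "isolated > inactive" comparison. All
model_* names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical model of reading a zone counter. The "smp" flag stands in
 * for CONFIG_SMP: when it is false, the negative clamp is skipped, as it
 * would be if the preprocessor had removed it.
 */
static unsigned long model_page_state(long raw, bool smp)
{
	if (smp && raw < 0)
		raw = 0;
	/* A negative long converted to unsigned long wraps to a huge value. */
	return (unsigned long)raw;
}

/* Simplified stand-in for the too_many_isolated check (assumption). */
static bool model_too_many_isolated(unsigned long inactive,
				    unsigned long isolated)
{
	return isolated > inactive;
}

/* One decrement without a matching increment, then read the counter. */
static void check_model(void)
{
	long raw = 0;

	raw--;	/* putback decrements a counter that was never incremented */

	/* With the clamp, the reader sees 0 and nothing stalls. */
	assert(model_page_state(raw, true) == 0);
	assert(!model_too_many_isolated(1000, model_page_state(raw, true)));

	/* Without the clamp, the reader sees ULONG_MAX and always stalls. */
	assert(model_page_state(raw, false) == ULONG_MAX);
	assert(model_too_many_isolated(1000, model_page_state(raw, false)));
}
```

In this model a single unbalanced decrement is enough to make the check
return true for any realistic inactive count, which matches a caller that
keeps looping in congestion_wait.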

From 1d3ebce2e8aa79dcc912da16b7a8d0611b6f9f1a Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan.kim@...il.com>
Date: Tue, 31 May 2011 21:11:58 +0900
Subject: [PATCH] Fix page isolated count mismatch

If migration fails, we normally call putback_lru_pages, which
decreases NR_ISOLATED_[ANON|FILE].
This means we should increase NR_ISOLATED_[ANON|FILE] before calling
putback_lru_pages, but soft_offline_page doesn't do that.

This can drive NR_ISOLATED_[ANON|FILE] negative, and in a UP build
zone_page_state will then report a huge number of isolated pages, so the
too_many_isolated functions are deceived completely. As a result, some
processes get stuck in D state because they expect the while loop to end
with congestion_wait, but it never does.

If this is right, it would be -stable material.

Cc: Mel Gorman <mel@....ul.ie>
Cc: Andrea Arcangeli <aarcange@...hat.com>
Signed-off-by: Minchan Kim <minchan.kim@...il.com>
---
 mm/memory-failure.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5c8f7e0..eac0ba5 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -52,6 +52,7 @@
 #include <linux/swapops.h>
 #include <linux/hugetlb.h>
 #include <linux/memory_hotplug.h>
+#include <linux/mm_inline.h>
 #include "internal.h"
 
 int sysctl_memory_failure_early_kill __read_mostly = 0;
@@ -1468,7 +1469,8 @@ int soft_offline_page(struct page *page, int flags)
 	put_page(page);
 	if (!ret) {
 		LIST_HEAD(pagelist);
-
+		inc_zone_page_state(page, NR_ISOLATED_ANON +
+					    page_is_file_cache(page));
 		list_add(&page->lru, &pagelist);
 		ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
 								0, true);
-- 
1.7.0.4

-- 
Kind regards
Minchan Kim