Message-ID: <20110530153748.GS5044@csn.ul.ie>
Date:	Mon, 30 May 2011 16:37:49 +0100
From:	Mel Gorman <mel@....ul.ie>
To:	Andrea Arcangeli <aarcange@...hat.com>
Cc:	akpm@...ux-foundation.org, Ury Stankevich <urykhy@...il.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org, stable@...nel.org
Subject: Re: [PATCH] mm: compaction: Abort compaction if too many pages are
 isolated and caller is asynchronous

On Mon, May 30, 2011 at 04:31:09PM +0200, Andrea Arcangeli wrote:
> Hi Mel and everyone,
> 
> On Mon, May 30, 2011 at 02:13:00PM +0100, Mel Gorman wrote:
> > Asynchronous compaction is used when promoting to huge pages. This is
> > all very nice but if a number of processes are compacting memory, a
> > large number of pages can be isolated. An "asynchronous" process can
> > stall for long periods of time as a result, with a user reporting that
> > firefox can stall for tens of seconds. This patch aborts asynchronous
> > compaction if too many pages are isolated, as it's better to fail a
> > hugepage promotion than stall a process.
> > 
> > If accepted, this should also be considered for 2.6.39-stable. It should
> > also be considered for 2.6.38-stable but ideally [11bc82d6: mm:
> > compaction: Use async migration for __GFP_NO_KSWAPD and enforce no
> > writeback] would be applied to 2.6.38 before consideration.
> 
> Is this supposed to fix the stall with khugepaged in D state and other
> processes in D state?
> 

Other processes. khugepaged might be getting stuck in the same loop but
I do not have a specific case in mind.

> zoneinfo showed nr_isolated_file = -1. I don't think that meant
> compaction really had 4G pages isolated, considering the value moves
> between -1, 0 and 1. So I'm unsure this fix can be right if the problem
> is the reported hang with khugepaged in D state. So far that looked
> more like a bug with PREEMPT in the vmstat accounting of
> nr_isolated_file that trips too_many_isolated in both vmscan.c and
> compaction.c with PREEMPT=y. Or are you fixing a different problem?
> 

I'm not familiar with this problem. I either missed it or forgot about
it entirely. I was considering only Ury's report whereby firefox was
getting stalled for tens of seconds in congestion_wait. It's possible
the root cause was the isolated counters being broken but I didn't pick
up on it.
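
To make the behaviour under discussion concrete, here is a rough
user-space sketch of the loop the patch changes. It is not the patch
itself: struct zone_model, too_many_isolated_model() and isolate_model()
are hypothetical names, the "more than half of the LRU" threshold is an
assumption, and the wait counter only stands in for congestion_wait().

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel API. */
struct zone_model {
	long nr_isolated;	/* pages currently isolated from the LRU */
	long nr_lru;		/* pages still on the LRU lists */
};

/* Assumed threshold: more pages isolated than half of what is left on the LRU. */
static bool too_many_isolated_model(const struct zone_model *z)
{
	return z->nr_isolated > z->nr_lru / 2;
}

enum isolate_result { ISOLATE_OK, ISOLATE_ABORTED };

/*
 * A synchronous caller keeps waiting until the zone is no longer over the
 * threshold. An asynchronous caller gives up immediately instead of
 * stalling, which fails the hugepage promotion.
 */
static enum isolate_result isolate_model(struct zone_model *z, bool sync,
					 int *waits)
{
	while (too_many_isolated_model(z)) {
		if (!sync)
			return ISOLATE_ABORTED;

		(*waits)++;		/* stand-in for congestion_wait() */
		z->nr_isolated -= 10;	/* pretend other compactors put pages back */
	}
	return ISOLATE_OK;
}
```

With nr_isolated = 100 and nr_lru = 100, an asynchronous call returns
ISOLATE_ABORTED without waiting, while a synchronous call waits five
times before the model drops back to the threshold.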

> Or how do you explain this -1 value out of nr_isolated_file? Clearly
> when that value goes to -1, compaction.c:too_many_isolated will hang,
> I think we should fix the -1 value before worrying about the rest...
> 
> grep nr_isolated_file zoneinfo-khugepaged 
>     nr_isolated_file 1
>     nr_isolated_file 4294967295

Can you point me at the thread that this file appears in and what the
conditions were? If vmstat is going to -1, it is indeed a problem
because it implies an imbalance in increments and decrements to the
isolated counters. Even with that fixed though, this patch still makes
sense: why would an asynchronous user of compaction stall on
congestion_wait?
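
For illustration, a small sketch of why one unmatched decrement would
show up as 4294967295 and keep a threshold check true indefinitely. The
helper names are hypothetical, a 32-bit unsigned long is assumed (as the
printed value suggests), and the threshold is the same assumed "more
than half of the LRU" rule. Any clamping of negative values the kernel
may do when reading the counter is not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an isolated-pages counter; not the vmstat code. */
static int32_t counter_after(int32_t start, int increments, int decrements)
{
	return start + increments - decrements;
}

/* A negative counter printed as a 32-bit unsigned value. */
static uint32_t as_unsigned32(int32_t v)
{
	return (uint32_t)v;
}

/* Assumed threshold check, carried out in unsigned arithmetic. */
static bool too_many_isolated_u(uint32_t isolated, uint32_t lru_pages)
{
	return isolated > lru_pages / 2;
}
```

One increment paired with two decrements leaves the counter at -1, which
as_unsigned32() reports as 4294967295. too_many_isolated_u() is then
true for any realistic LRU size, so a caller that waits for the check to
clear would never make progress.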

-- 
Mel Gorman
SUSE Labs