Date:	Wed, 5 May 2010 14:55:38 +0100
From:	Mel Gorman <mel@....ul.ie>
To:	Andrea Arcangeli <aarcange@...hat.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Lameter <cl@...ux-foundation.org>,
	Adam Litke <agl@...ibm.com>, Avi Kivity <avi@...hat.com>,
	David Rientjes <rientjes@...gle.com>,
	Minchan Kim <minchan.kim@...il.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Rik van Riel <riel@...hat.com>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: Re: [PATCH] fix count_vm_event preempt in memory compaction direct
	reclaim

On Wed, May 05, 2010 at 03:11:12PM +0200, Andrea Arcangeli wrote:
> On Wed, May 05, 2010 at 01:51:56PM +0100, Mel Gorman wrote:
> > On Wed, May 05, 2010 at 02:19:08PM +0200, Andrea Arcangeli wrote:
> > > On Tue, Apr 20, 2010 at 10:01:14PM +0100, Mel Gorman wrote:
> > > > +		if (page) {
> > > > +			__count_vm_event(COMPACTSUCCESS);
> > > > +			return page;
> > > 
> > > ==
> > > From: Andrea Arcangeli <aarcange@...hat.com>
> > > 
> > > Preempt is enabled so it must use count_vm_event.
> > > 
> > > Signed-off-by: Andrea Arcangeli <aarcange@...hat.com>
> > 
> > Reviewed-by: Mel Gorman <mel@....ul.ie>
> > 
> > Andrew, this is a fix to the patch
> > mmcompaction-direct-compact-when-a-high-order-allocation-fails.patch
> 
> for Andrew: I'll generate a trivial reject to the exponential backoff.
> 
> > Thanks Andrea, well spotted.
> 
> You're welcome.
> 
> I updated current aa.git origin/master and origin/anon_vma_chain
> branches (post THP-23*).
> 

Ok.
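
On the count_vm_event fix itself: the double-underscore variant assumes the
caller has already prevented preemption, while the plain variant protects the
update on its own. Below is a minimal userspace sketch of why that matters. It
is not kernel code. Every model_* name is hypothetical, the counter is a plain
array slot rather than a real per-CPU variable, and the "preemption" is a
deterministic hook placed between the load and the store of the increment.

```c
#include <assert.h>

/* Hypothetical stand-ins for a single CPU's event counters. */
enum model_event { MODEL_COMPACTSUCCESS, MODEL_NR_EVENTS };

static unsigned long model_counter[MODEL_NR_EVENTS];
static int model_preempt_depth;    /* > 0 means preemption is disabled */
static int model_resched_pending;  /* a preemption was deferred */

/* Another task on the same CPU counting the same event. */
static void model_other_task(enum model_event e)
{
	model_counter[e]++;
}

/* A point where the scheduler could switch to the other task. */
static void model_preemption_point(enum model_event e)
{
	if (model_preempt_depth > 0) {
		model_resched_pending = 1;  /* defer until preemption is re-enabled */
		return;
	}
	model_other_task(e);
}

static void model_preempt_disable(void)
{
	model_preempt_depth++;
}

static void model_preempt_enable(enum model_event e)
{
	if (--model_preempt_depth == 0 && model_resched_pending) {
		model_resched_pending = 0;
		model_other_task(e);        /* the deferred switch happens now */
	}
}

/* Models the __ variant: a bare load/modify/store with no protection. */
static void model_raw_count(enum model_event e)
{
	unsigned long tmp = model_counter[e];  /* load */
	model_preemption_point(e);             /* may run the other task */
	model_counter[e] = tmp + 1;            /* store, possibly a stale value */
}

/* Models the plain variant: same update, with preemption held off. */
static void model_safe_count(enum model_event e)
{
	model_preempt_disable();
	model_raw_count(e);
	model_preempt_enable(e);
}

/* Two events happen in each run; returns how many were actually recorded. */
static unsigned long model_run(int use_safe)
{
	model_counter[MODEL_COMPACTSUCCESS] = 0;
	model_preempt_depth = 0;
	model_resched_pending = 0;
	if (use_safe)
		model_safe_count(MODEL_COMPACTSUCCESS);
	else
		model_raw_count(MODEL_COMPACTSUCCESS);
	return model_counter[MODEL_COMPACTSUCCESS];
}
```

In this model, model_run(0) records only one of the two events because the
other task's increment is overwritten, while model_run(1) records both. A lost
statistics update is harmless in itself, but it is the reason the unprotected
variant is only meant for callers that already run with preemption disabled.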

> There's also another patch I have in my tree that you didn't pick up,
> and I wonder what the issue is here.

Simple, I didn't spot it. If you pointed it out to me, I didn't take
note of it and it got lost. Sorry if you did.

> This is less of a bugfix because it
> seems to only affect lockdep; I don't know why lockdep forbids calling
> migrate_prep with any lock held (in this case the mmap_sem).

I haven't seen this problem. The testing I have been doing with compaction
was stress tests that allocate huge pages, but not from the fault path.

> migrate.c
> is careful to comply with it, compaction.c isn't. It's not mandatory
> to succeed for compaction, so in doubt I just commented it out.

It's not mandatory but the LRU lists should be drained so they can be properly
isolated. It'd make a slight difference to success rates as there will be
pages that cannot be isolated because they are on some pagevec.
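
To illustrate the point about pagevecs, here is a small userspace sketch. It
is only a model under stated assumptions: the model_* names, the two CPUs and
the pagevec size of 4 are all made up for the example, and a page is reduced
to a single "on the LRU" flag. The idea is that pages sitting in a per-CPU
batch are not yet on the LRU, so an isolation pass cannot see them until the
batches are drained.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS       2
#define MODEL_PAGEVEC_SIZE  4   /* hypothetical batch size */
#define MODEL_NR_PAGES      6

struct model_page {
	int on_lru;                 /* 1 once the page is visible on the LRU */
};

struct model_pagevec {
	size_t nr;
	struct model_page *pages[MODEL_PAGEVEC_SIZE];
};

static struct model_pagevec model_pvecs[MODEL_NR_CPUS];

/* Move every page batched on one CPU onto the (modelled) LRU. */
static void model_drain_cpu(int cpu)
{
	struct model_pagevec *pvec = &model_pvecs[cpu];

	for (size_t i = 0; i < pvec->nr; i++)
		pvec->pages[i]->on_lru = 1;
	pvec->nr = 0;
}

/* Batch a page on a CPU; the batch is flushed only when it fills up. */
static void model_lru_cache_add(int cpu, struct model_page *page)
{
	struct model_pagevec *pvec = &model_pvecs[cpu];

	pvec->pages[pvec->nr++] = page;
	if (pvec->nr == MODEL_PAGEVEC_SIZE)
		model_drain_cpu(cpu);
}

/* Models the drain that runs on every CPU before migration starts. */
static void model_drain_all(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_drain_cpu(cpu);
}

/* Isolation can only take pages that are actually on the LRU. */
static size_t model_isolate(struct model_page *pages, size_t n)
{
	size_t isolated = 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].on_lru) {
			pages[i].on_lru = 0;
			isolated++;
		}
	}
	return isolated;
}

/* Add three pages per CPU, optionally drain, then try to isolate them. */
static size_t model_isolated_count(int drain_first)
{
	struct model_page pages[MODEL_NR_PAGES] = { { 0 } };

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_pvecs[cpu].nr = 0;
	for (size_t i = 0; i < MODEL_NR_PAGES; i++)
		model_lru_cache_add((int)(i % MODEL_NR_CPUS), &pages[i]);
	if (drain_first)
		model_drain_all();
	return model_isolate(pages, MODEL_NR_PAGES);
}
```

With these numbers neither batch fills up, so model_isolated_count(0) finds
nothing to isolate and model_isolated_count(1) finds all six pages. In a real
system only a fraction of pages would be stuck in a batch at any moment, which
is why the effect on success rates would be slight rather than total.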

> It'll
> also decrease the IPI load so I wasn't very concerned to re-enable it.
> 

While true, is compaction density that high under normal workloads? I guess
it would be if a scanner was constantly trying to promote pages. If the
IPI load is out of hand, I'm ok with disabling it in some cases. For example,
I'd be ok with it being skipped when compaction is part of a daemon doing
speculative promotion, but I'd prefer it to still be used, if possible, when
the static hugetlbfs pool is being resized.

> -----
> Subject: disable migrate_prep()
> 
> From: Andrea Arcangeli <aarcange@...hat.com>
> 
> I get trouble from lockdep if I leave it enabled:
> 
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.34-rc3 #50
> -------------------------------------------------------
> largepages/4965 is trying to acquire lock:
>  (events){+.+.+.}, at: [<ffffffff8105b788>] flush_work+0x38/0x130
> 
>  but task is already holding lock:
>   (&mm->mmap_sem){++++++}, at: [<ffffffff8141b022>] do_page_fault+0xd2/0x430
> 

Hmm, I'm not seeing where in the fault path flush_work is getting called
from. Can you point it out to me please?

We already do some IPI work in the page allocator although it happens after
direct reclaim and only for high-order pages. What happens there and what
happens in migrate_prep are very similar so if there was a problem with IPI
and fault paths, I'd have expected to see it from hugetlbfs at some stage.
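
For reference, the kind of check behind a "possible circular locking
dependency" report can be sketched as a small lock-order graph. This is a
simplified userspace model and not lockdep's implementation. The model_*
names are hypothetical, and because the report quoted above is truncated, the
first ordering recorded in the usage note below (mmap_sem taken while the
workqueue's lock class is held) is purely an assumption made to close the
cycle.

```c
#include <assert.h>
#include <string.h>

#define MODEL_MAX_LOCKS 8

/* Hypothetical lock classes named after the two in the report. */
enum model_lock { MODEL_MMAP_SEM, MODEL_EVENTS_WQ, MODEL_NR_LOCKS };

/* model_dep[a][b] != 0 means "b was acquired while a was held". */
static int model_dep[MODEL_MAX_LOCKS][MODEL_MAX_LOCKS];

static void model_reset(void)
{
	memset(model_dep, 0, sizeof(model_dep));
}

/* Depth-first search: is `to` reachable from `from` via recorded orderings? */
static int model_reachable(int from, int to, int *visited)
{
	if (from == to)
		return 1;
	visited[from] = 1;
	for (int next = 0; next < MODEL_MAX_LOCKS; next++) {
		if (model_dep[from][next] && !visited[next] &&
		    model_reachable(next, to, visited))
			return 1;
	}
	return 0;
}

/*
 * Record "acquire `taken` while holding `held`".
 * Returns 1 if an opposite ordering is already known, which means the new
 * edge would close a cycle; returns 0 and records the edge otherwise.
 */
static int model_acquire_while_holding(int held, int taken)
{
	int visited[MODEL_MAX_LOCKS] = { 0 };

	if (model_reachable(taken, held, visited))
		return 1;
	model_dep[held][taken] = 1;
	return 0;
}
```

As a usage example, after model_reset(), recording
model_acquire_while_holding(MODEL_EVENTS_WQ, MODEL_MMAP_SEM) returns 0, and a
later model_acquire_while_holding(MODEL_MMAP_SEM, MODEL_EVENTS_WQ) returns 1.
The point is that the warning fires on the ordering alone: no deadlock has to
occur, and the two orderings can come from unrelated code paths.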

> flush_work apparently wants to run free from lock and it bugs in:
> 
> 	lock_map_acquire(&cwq->wq->lockdep_map);
> 
> Signed-off-by: Andrea Arcangeli <aarcange@...hat.com>
> ---
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -383,7 +383,9 @@ static int compact_zone(struct zone *zon
>  	cc->free_pfn = cc->migrate_pfn + zone->spanned_pages;
>  	cc->free_pfn &= ~(pageblock_nr_pages-1);
>  
> +#if 0
>  	migrate_prep();
> +#endif
>  
>  	while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
>  		unsigned long nr_migrate, nr_remaining;
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab