Message-ID: <alpine.DEB.2.00.1211131507370.17623@chino.kir.corp.google.com>
Date:	Tue, 13 Nov 2012 15:11:44 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	Andy Lutomirski <luto@...capital.net>
cc:	Marc Duponcheel <marc@...line.be>, Mel Gorman <mgorman@...e.de>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [3.6 regression?] THP + migration/compaction livelock (I
 think)

On Tue, 13 Nov 2012, Andy Lutomirski wrote:

> I've seen an odd problem three times in the past two weeks.  I suspect
> a Linux 3.6 regression.  I'm on 3.6.3-1.fc17.x86_64.  I run a parallel
> compilation, and no progress is made.  All cpus are pegged at 100%
> system time by the respective cc1plus processes.  Reading
> /proc/<pid>/stack shows either
> 
> [<ffffffff8108e01a>] __cond_resched+0x2a/0x40
> [<ffffffff8114e432>] isolate_migratepages_range+0xb2/0x620
> [<ffffffff8114eba4>] compact_zone+0x144/0x410
> [<ffffffff8114f152>] compact_zone_order+0x82/0xc0
> [<ffffffff8114f271>] try_to_compact_pages+0xe1/0x130
> [<ffffffff816143db>] __alloc_pages_direct_compact+0xaa/0x190
> [<ffffffff81133d26>] __alloc_pages_nodemask+0x526/0x990
> [<ffffffff81171496>] alloc_pages_vma+0xb6/0x190
> [<ffffffff81182683>] do_huge_pmd_anonymous_page+0x143/0x340
> [<ffffffff811549fd>] handle_mm_fault+0x27d/0x320
> [<ffffffff81620adc>] do_page_fault+0x15c/0x4b0
> [<ffffffff8161d625>] page_fault+0x25/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> or
> 
> [<ffffffffffffffff>] 0xffffffffffffffff
> 

This reminds me of the thread at http://marc.info/?t=135102111800004, in 
which Marc's system reportedly went unresponsive, as in your report, but 
in his case it also rebooted.  If your system is still running (or, even 
better, if you're able to catch this happening in real time), could you 
capture

	grep -E "compact_|thp_" /proc/vmstat

as well while it is in progress?  (Even if it's not happening right now, 
the data might still be useful if you have knowledge that it has occurred 
since the last reboot.)
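
Since these counters are cumulative since boot, several samples taken a
short time apart show which ones are actually moving during the stall.
A minimal sketch of that, assuming a POSIX shell; the function name
sample_vmstat and its three arguments (file, sample count, interval in
seconds) are hypothetical and not something from this thread:

```sh
#!/bin/sh
# Hypothetical helper (not from the thread): print the compaction and THP
# counters several times so that consecutive samples can be compared.
#   $1  file to read   (default: /proc/vmstat)
#   $2  sample count   (default: 5)
#   $3  seconds between samples (default: 1)
sample_vmstat() {
    file="${1:-/proc/vmstat}"
    count="${2:-5}"
    interval="${3:-1}"
    i=0
    while [ "$i" -lt "$count" ]; do
        # Header line so each sample can be told apart in the output.
        echo "--- sample $((i + 1)) at $(date '+%H:%M:%S')"
        # Same filter as the one-off command above.
        grep -E "compact_|thp_" "$file"
        i=$((i + 1))
        # Do not sleep after the final sample.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `sample_vmstat /proc/vmstat 10 2 > vmstat.log` would record
ten samples two seconds apart; counters that keep climbing between
samples are the ones still active.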