Date: Wed, 11 Dec 2013 01:02:24 +0000
From: Mel Gorman <mgorman@...e.de>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
    Dave Jones <davej@...hat.com>,
    Darren Hart <dvhart@...ux.intel.com>,
    Andrea Arcangeli <aarcange@...hat.com>,
    Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
    Peter Zijlstra <peterz@...radead.org>
Subject: Re: process 'stuck' at exit.

On Tue, Dec 10, 2013 at 08:18:29PM +0100, Thomas Gleixner wrote:
> On Tue, 10 Dec 2013, Linus Torvalds wrote:
>
> > Hmm. Looks like the futex code is somehow stuck in a loop, calling
> > get_user_pages_fast().
> >
> > The futex code itself is apparently so low-overhead that it doesn't
> > show up in your 'perf top' report (which is dominated by all the
> > expensive debug things that get_user_pages_fast() etc ends up doing),
> > but that's the only looping I can see. Perhaps the "goto again" case
> > for transparent huge pages in get_futex_key()? Or the
>
> Cc'ng more folks on that.
>

I just saw this before heading to bed and have not read the thread. I'll
read it in the morning, but in the meantime the following might ring a bell
for someone else's investigation or for someone more familiar with how
futexes work from end to end.

Was NUMA balancing enabled and was this a NUMA machine? I ask because of
these two patches that are currently in flight:

  mm: numa: Serialise parallel get_user_page against THP migration
  mm fix TLB flush race between migration, and change_protection_range

There are related patches, but these two are the most important for what I
have in mind. In combination, the two address a problem whereby a write
from one thread can be lost due to a THP migration, but it's specific to
automatic NUMA balancing. If the lost update was for a page containing a
futex, then the lost write could confuse waiters.

The downside is that this is a bad fit for the problem description in the
first mail. A lost update might result in processes waiting forever on a
value that never changes, but offhand it's less clear why it might result
in a loop. Unless, of course, there is a combination of events that allows
for a busy wait on a value that will never change due to the lost write.

-- 
Mel Gorman
SUSE Labs
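
For anyone wanting to answer the two questions above on their own machine, the following is a minimal sketch and not part of the original message. It assumes the kernel exposes the automatic NUMA balancing switch at /proc/sys/kernel/numa_balancing (present only on kernels built with that feature) and lists NUMA nodes as nodeN directories under /sys/devices/system/node. The function names are hypothetical, chosen for illustration.

```python
import os
import re


def numa_balancing_state(path="/proc/sys/kernel/numa_balancing"):
    """Return the raw sysctl value as a string, or None if it cannot be read.

    Assumption: the file exists only when the kernel supports automatic
    NUMA balancing; "0" is taken to mean disabled.
    """
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def count_numa_nodes(base="/sys/devices/system/node"):
    """Count nodeN directories under base; return 0 if base is unreadable."""
    try:
        entries = os.listdir(base)
    except OSError:
        return 0
    return sum(1 for name in entries if re.fullmatch(r"node\d+", name))


if __name__ == "__main__":
    state = numa_balancing_state()
    nodes = count_numa_nodes()
    print("numa_balancing sysctl:", state if state is not None else "not available")
    print("NUMA nodes found:", nodes)
    if state not in (None, "0") and nodes > 1:
        print("Automatic NUMA balancing appears active on a multi-node machine.")
```

More than one node together with a non-zero sysctl value would suggest the THP migration race described above is at least possible on that system; it does not show that the race occurred.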