Message-ID: <1417705049.21214.3@mail.thefacebook.com>
Date:	Thu, 4 Dec 2014 09:57:29 -0500
From:	Chris Mason <clm@...com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	Thomas Gleixner <tglx@...utronix.de>,
	John Stultz <john.stultz@...aro.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Dave Jones <davej@...hat.com>,
	Mike Galbraith <umgwanakikbuti@...il.com>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Dâniel Fraga <fragabr@...il.com>,
	Sasha Levin <sasha.levin@...cle.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: frequent lockups in 3.18rc4



On Thu, Dec 4, 2014 at 12:49 AM, Linus Torvalds 
<torvalds@...ux-foundation.org> wrote:
> On Wed, Dec 3, 2014 at 7:15 PM, Chris Mason <clm@...com> wrote:
>> 
>>  One guess is that trinity is generating a huge number of tlb
>>  invalidations over sparse and horrible ranges.  Perhaps the old code
>>  was falling back to full tlb flushes before Dave Hansen's string of
>>  fixes?
> 
> Hmm. I agree that we've had some of the backtraces look like TLB
> flushing might be involved. Not all, though. And I'm not seeing where
> a loop over up to 33 pages should matter over doing a full TLB flush.
> 
> What *might* matter is if we somehow get that number wrong, and a
> loop like
> 
>                         addr = f->flush_start;
>                         while (addr < f->flush_end) {
>                                 __flush_tlb_single(addr);
>                                 addr += PAGE_SIZE;
>                         }
> 
> ends up looping a *lot* due to some bug, and then the IPI itself would
> take so long that the watchdog could trigger.
> 
> But I do not see how that could actually happen. As far as I can tell,
> either the number of pages is limited to less than 33, or we have that
> TLB_FLUSH_ALL case.
> 
> Do you see something I don't?

Sadly not.  Looking harder, I'm pretty sure all of the flushes coming
through from this path are single-page flushes anyway.  So the most
likely explanation is that we're waiting on the remote CPU, which is
stuck somewhere secret.
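
To make the bound discussed above concrete, here is a minimal user-space
sketch of the two halves of the decision: the sender either keeps a small
page range or collapses it to a full flush, and the receiver either walks
the range one page at a time or flushes everything. It is not the kernel
code. Every sketch_* name, the 4096-byte page size, the all-ones sentinel,
and the exact "more than 33 pages" comparison are assumptions for
illustration; only the loop shape and the figure 33 come from the quoted
message.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE     4096UL  /* assumed page size */
#define SKETCH_FLUSH_ALL     (~0UL)  /* hypothetical "flush everything" sentinel */
#define SKETCH_CEILING_PAGES 33UL    /* bound taken from the quoted message */

struct sketch_flush_info {
	unsigned long flush_start;
	unsigned long flush_end;
};

/* Sender side: keep small ranges, collapse anything larger to a full flush. */
struct sketch_flush_info sketch_pick_flush(unsigned long start, unsigned long end)
{
	struct sketch_flush_info f = { 0, SKETCH_FLUSH_ALL };

	if (end == SKETCH_FLUSH_ALL)
		return f;

	/* Page-aligned inputs guarantee the receiver loop lands exactly on end. */
	assert(start <= end);
	assert(start % SKETCH_PAGE_SIZE == 0 && end % SKETCH_PAGE_SIZE == 0);

	if ((end - start) / SKETCH_PAGE_SIZE > SKETCH_CEILING_PAGES)
		return f;

	f.flush_start = start;
	f.flush_end = end;
	return f;
}

/*
 * Receiver side: returns how many single-page flushes would be issued.
 * Sets *did_full to 1 and returns 0 when a full flush is done instead.
 */
unsigned long sketch_handle_flush(const struct sketch_flush_info *f, int *did_full)
{
	unsigned long addr, count = 0;

	if (f->flush_end == SKETCH_FLUSH_ALL) {
		*did_full = 1;  /* stand-in for a full TLB flush */
		return 0;
	}

	*did_full = 0;
	addr = f->flush_start;
	while (addr < f->flush_end) {
		count++;        /* stand-in for __flush_tlb_single(addr) */
		addr += SKETCH_PAGE_SIZE;
	}
	return count;
}
```

Under these assumptions the receiver loop can never run more than 33
times, so for the IPI handler to take long enough to trip the watchdog,
the range would have to reach the receiver without going through the
sender-side check.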

-chris


