Message-ID: <20141204031553.GA20193@ret.masoncoding.com>
Date:	Wed, 3 Dec 2014 22:15:53 -0500
From:	Chris Mason <clm@...com>
To:	Thomas Gleixner <tglx@...utronix.de>
CC:	John Stultz <john.stultz@...aro.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Dave Jones <davej@...hat.com>,
	Mike Galbraith <umgwanakikbuti@...il.com>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Dâniel Fraga <fragabr@...il.com>,
	Sasha Levin <sasha.levin@...cle.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: frequent lockups in 3.18rc4

I asked Dave for his lockups from 3.17-rc1, and they were in the
flush_tlb code waiting for remote CPUs to finish flushing.  It feels
like that's a common theme, and there are a few commits in that code
between 3.16 and 3.17.

One guess is that trinity is generating a huge number of tlb
invalidations over sparse and horrible ranges.  Perhaps the old code was
falling back to full tlb flushes before Dave Hansen's string of fixes?

commit a5102476a24bce364b74f1110005542a2c964103
Author: Dave Hansen <dave.hansen@...ux.intel.com>

    x86/mm: Set TLB flush tunable to sane value (33)

This entirely untested diff forces full tlb flushes on the remote CPUs.
It adds a few parens for good luck, but the nr_pages var is only sent to
ftrace, so it's not the bug we're looking for.
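
To illustrate the precedence point: in C, division binds tighter than
subtraction, so the unparenthesized expression subtracts a page count
from an address instead of dividing the range length.  A minimal
standalone sketch, not kernel code; the 4096-byte PAGE_SIZE and the
helper names are assumptions made up for illustration:

```c
#define PAGE_SIZE 4096UL	/* assumed 4 KiB pages, for illustration only */

/* Old expression: parsed as end - (start / PAGE_SIZE). */
static unsigned long nr_pages_unparenthesized(unsigned long start,
					      unsigned long end)
{
	return end - start / PAGE_SIZE;
}

/* Parenthesized expression: number of pages spanned by [start, end). */
static unsigned long nr_pages_parenthesized(unsigned long start,
					    unsigned long end)
{
	return (end - start) / PAGE_SIZE;
}
```

For start = 0x10000 and end = 0x14000, the first helper returns 81904
(0x14000 minus 16) and the second returns 4.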

I'm only changing the flushes done on remote CPUs.  The local CPU is
still doing up to 33 fine-grained flushes.  That may or may not be a
good idea, but my hand waving only makes sense to me if we've got a
long string of fine-grained flushes from tons of procs fanning out to
the remote CPUs.
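
A simplified model of what changes for the remote CPUs, as a
standalone sketch rather than the real flush_tlb_mm_range().  The
struct, the function names, the 4096-byte PAGE_SIZE and the
FLUSH_CEILING constant (standing in for the tunable set to 33) are all
assumptions for illustration:

```c
#define PAGE_SIZE	4096UL		/* assumed page size */
#define TLB_FLUSH_ALL	(~0UL)		/* sentinel meaning "flush everything" */
#define FLUSH_CEILING	33UL		/* stand-in for the tunable value */

/* Hypothetical container for the range handed to the remote CPUs. */
struct remote_range {
	unsigned long start;
	unsigned long end;
};

/* Before the diff: remotes get a full flush only above the ceiling. */
static struct remote_range remote_range_before(unsigned long start,
					       unsigned long end)
{
	struct remote_range r = { start, end };

	if ((end - start) / PAGE_SIZE > FLUSH_CEILING) {
		r.start = 0UL;
		r.end = TLB_FLUSH_ALL;
	}
	return r;
}

/* After the diff: remotes always get a full flush. */
static struct remote_range remote_range_after(unsigned long start,
					      unsigned long end)
{
	struct remote_range r = { 0UL, TLB_FLUSH_ALL };

	(void)start;	/* the requested range no longer matters remotely */
	(void)end;
	return r;
}
```

With a 4-page range, remote_range_before() passes the range through
unchanged, while remote_range_after() always returns the full-flush
sentinel.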

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index ee61c36..72c4ff0 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -120,7 +120,7 @@ static void flush_tlb_func(void *info)
 		} else {
 			unsigned long addr;
 			unsigned long nr_pages =
-				f->flush_end - f->flush_start / PAGE_SIZE;
+				(f->flush_end - f->flush_start) / PAGE_SIZE;
 			addr = f->flush_start;
 			while (addr < f->flush_end) {
 				__flush_tlb_single(addr);
@@ -214,10 +214,8 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	}
 	trace_tlb_flush(TLB_LOCAL_MM_SHOOTDOWN, base_pages_to_flush);
 out:
-	if (base_pages_to_flush == TLB_FLUSH_ALL) {
-		start = 0UL;
-		end = TLB_FLUSH_ALL;
-	}
+	start = 0UL;
+	end = TLB_FLUSH_ALL;
 	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
 		flush_tlb_others(mm_cpumask(mm), mm, start, end);
 	preempt_enable();