linux-kernel - Re: [PATCH 0/3] TLB flush multiple pages per IPI v5

Message-ID: <CA+55aFz6b5pG9tRNazk8ynTCXS3whzWJ_737dt1xxAHDf1jASQ@mail.gmail.com>
Date:	Tue, 9 Jun 2015 14:54:01 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Dave Hansen <dave.hansen@...el.com>
Cc:	Ingo Molnar <mingo@...nel.org>, Mel Gorman <mgorman@...e.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>,
	Hugh Dickins <hughd@...gle.com>,
	Minchan Kim <minchan@...nel.org>,
	Andi Kleen <andi@...stfloor.org>,
	H Peter Anvin <hpa@...or.com>, Linux-MM <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 0/3] TLB flush multiple pages per IPI v5

On Tue, Jun 9, 2015 at 2:14 PM, Dave Hansen <dave.hansen@...el.com> wrote:
>
> The 0 cycle TLB miss was also interesting.  It goes back up to something
> reasonable if I put the mb()/mfence's back.

So I've said it before, and I'll say it again: Intel does really well
on TLB fills.

The reason is partly historical, with Win95 doing a ton of TLB
invalidation (I think every single GDI call ended up invalidating the
TLB, so under some important Windows benchmarks of the time, you
literally had a TLB flush every 10k instructions!).

But partly it is because people are wrong in thinking that TLB fills
have to be slow. There are a lot of complete garbage RISC machines where
the TLB fill took forever, because in the name of simplicity it would
stop the pipeline and often be done in SW.

The zero-cycle TLB fill is obviously a bit optimistic, but at the same
time it's not completely insane. TLB fills can be prefetched, and the
table walker can hit the cache, if you do them right. And Intel mostly
does.

So the normal full (non-global) TLB flush really is fairly cheap. It's
been optimized for, and the TLB gets re-filled fairly efficiently. I
suspect that doing more than just a couple of single-page TLB flushes
is a complete waste of time: the flushing takes longer than simply
re-filling the TLB.
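
The trade-off described above can be sketched as a simple threshold
decision. This is only an illustrative userspace sketch, not kernel
code: the names flush_strategy and choose_flush_strategy, and the
ceiling parameter, are hypothetical, and the right ceiling value would
have to be measured on real hardware.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is a sketch, not kernel code. */
enum flush_strategy {
    FLUSH_SINGLE_PAGES, /* invalidate each page individually */
    FLUSH_FULL          /* drop all non-global entries and let the TLB refill */
};

/*
 * Pick a strategy for invalidating `npages` pages.
 * `ceiling` is an assumed tunable: the largest page count for which
 * per-page invalidation is still expected to beat a full flush.
 * If refills are cheap, as the message argues, the ceiling is small.
 */
static enum flush_strategy choose_flush_strategy(size_t npages, size_t ceiling)
{
    return (npages <= ceiling) ? FLUSH_SINGLE_PAGES : FLUSH_FULL;
}
```

With a ceiling of 2, for instance, a request covering 1 or 2 pages
would be flushed page by page, and anything larger would fall back to
a full flush.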

                         Linus