Date:	Tue, 28 Oct 2014 09:25:35 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Will Deacon <will.deacon@....com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Russell King - ARM Linux <linux@....linux.org.uk>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>
Subject: Re: [RFC PATCH 1/2] zap_pte_range: update addr when forcing flush
 after TLB batching faiure

On Tue, Oct 28, 2014 at 9:07 AM, Will Deacon <will.deacon@....com> wrote:
>
> Ok, that's useful, thanks. Out of curiosity, what *is* the current intention
> of __tlb_remove_tlb_entry, if start/end shouldn't be touched by
> architectures? Is it just for the PPC hash thing?

I think it's both the PPC hash, and for "legacy reasons" (ie
architectures that don't use the generic code, and were converted from
the "invalidate as you walk the tables" without ever really fixing the
"you have to flush the TLB before you free the page, and do
batching").

It would be lovely if we could just drop it entirely, although
changing it to actively do the minimal range is fine too.

> I was certainly seeing this issue trigger regularly when running firefox,
> but I'll need to dig and find out the differences in range size.

I'm wondering whether that was perhaps because of the mix-up with
initialization of the range. Afaik, that would always break your
min/max thing for the first batch (and since the batches are fairly
large, "first" may be "only")
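
As a rough illustration of that initialization problem (a sketch under
assumptions, not the actual kernel code: the struct and the range_*
helpers below are hypothetical names), min/max tracking only narrows the
range if it starts out empty. If it is seeded with the whole region
being unmapped, every later min/max update is a no-op:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the start/end fields of a TLB batch. */
struct flush_range {
	unsigned long start;
	unsigned long end;
};

/* Empty range: the first range_add() takes the entry's own bounds. */
static void range_init(struct flush_range *r)
{
	r->start = ULONG_MAX;
	r->end = 0;
}

/*
 * Assumed problematic variant: seed the range with the full region
 * being unmapped. min/max can then never shrink it.
 */
static void range_init_full(struct flush_range *r,
			    unsigned long start, unsigned long end)
{
	r->start = start;
	r->end = end;
}

/* Grow the range to cover [addr, addr + size). */
static void range_add(struct flush_range *r,
		      unsigned long addr, unsigned long size)
{
	assert(size != 0);
	if (addr < r->start)
		r->start = addr;
	if (addr + size > r->end)
		r->end = addr + size;
}
```

With range_init(), adding two pages at 0x5000 and 0x2000 yields the range
0x2000-0x6000. With range_init_full() over a 1MB region, the same calls
leave the range at the full 1MB.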

But hey, it's possible that firefox does some big mappings but only
populates the beginning. Most architectures don't tend to have
excessive glass jaws in this area: invalidating things page-by-page is
invariably so slow that at some point you just go "just do the whole
range".

> Since we have hardware broadcasting of TLB invalidations on ARM, it is
> in our interest to keep the number of outstanding operations as small as
> possible, particularly on large systems where we don't get the targeted
> shootdown with a single message that you can perform using IPIs (i.e.
> you can only broadcast to all or no CPUs, and that happens for each pte).

Do you seriously *have* to broadcast for each pte?

Because that is quite frankly moronic.  We batch things up in software
for a real good reason: doing things one entry at a time just cannot
ever scale. At some point (and that point is usually not even very far
away), it's much better to do a single invalidate over a range. The
cost of having to refill the TLBs is *much* smaller than the cost of
doing tons of cross-CPU invalidates.

That's true even for the cases where we track the CPUs involved in
that mapping, and only invalidate a small subset. With an "all CPUs
broadcast", the cross-over point must be even smaller. Doing thousands
of CPU broadcasts is just crazy, even if they are hw-accelerated.

Can't you just do a full invalidate and a SW IPI for larger ranges?
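
A minimal sketch of that kind of cut-over (the threshold value, the page
size, and all names are assumptions for illustration; the real cross-over
point depends on the hardware and would need measuring):

```c
enum flush_kind {
	FLUSH_NONE,	/* nothing to invalidate */
	FLUSH_PER_PAGE,	/* one invalidate per page in the range */
	FLUSH_ALL	/* single full invalidate */
};

#define SKETCH_PAGE_SHIFT	12	/* assumed 4K pages */
#define SKETCH_PAGE_SIZE	(1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_MAX_PER_PAGE	32UL	/* assumed cross-over, not measured */

/* Pick a flush strategy for [start, end) based on how many pages it spans. */
static enum flush_kind choose_flush(unsigned long start, unsigned long end)
{
	unsigned long pages;

	if (end <= start)
		return FLUSH_NONE;

	/* Round up so a partial trailing page still counts. */
	pages = (end - start + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;

	return pages > SKETCH_MAX_PER_PAGE ? FLUSH_ALL : FLUSH_PER_PAGE;
}
```

The idea is just that small ranges keep the per-page invalidates, and
anything past the threshold falls back to one full invalidate instead of
thousands of broadcasts.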

And as mentioned, true sparse mappings are actually fairly rare, so
making extra effort (and data structures) to have individual ranges
sounds crazy.

Is this some hw-enforced thing? You really can't turn off the
cross-cpu-for-each-pte braindamage?

                         Linus