Date:	Fri, 13 Mar 2015 09:07:43 +0100
From:	Ingo Molnar <mingo@...nel.org>
To:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Michael Sullivan <sully@...lly.net>, lttng-dev@...ts.lttng.org,
	LKML <linux-kernel@...r.kernel.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: Alternative to signals/sys_membarrier() in liburcu


* Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:

> ----- Original Message -----
> > From: "Linus Torvalds" <torvalds@...ux-foundation.org>
> > To: "Mathieu Desnoyers" <mathieu.desnoyers@...icios.com>
> > Cc: "Michael Sullivan" <sully@...lly.net>, lttng-dev@...ts.lttng.org, "LKML" <linux-kernel@...r.kernel.org>, "Paul E.
> > McKenney" <paulmck@...ux.vnet.ibm.com>, "Peter Zijlstra" <peterz@...radead.org>, "Ingo Molnar" <mingo@...nel.org>,
> > "Thomas Gleixner" <tglx@...utronix.de>, "Steven Rostedt" <rostedt@...dmis.org>
> > Sent: Thursday, March 12, 2015 5:47:05 PM
> > Subject: Re: Alternative to signals/sys_membarrier() in liburcu
> > 
> > On Thu, Mar 12, 2015 at 1:53 PM, Mathieu Desnoyers
> > <mathieu.desnoyers@...icios.com> wrote:
> > >
> > > So the question as it stands appears to be: would you be comfortable
> > > having users abuse mprotect(), relying on its side-effect of issuing
> > > a smp_mb() on each targeted CPU for the TLB shootdown, as
> > > an effective implementation of process-wide memory barrier ?
> > 
> > Be *very* careful.
> > 
> > Just yesterday, in another thread (discussing the auto-numa TLB 
> > performance regression), we were discussing skipping the TLB 
> > invalidates entirely if the mprotect relaxes the protections.

We have such code already in mm/mprotect.c, introduced in:

  10c1045f28e8 mm: numa: avoid unnecessary TLB flushes when setting NUMA hinting entries

which does:

                                /* Avoid TLB flush if possible */
                                if (pte_protnone(oldpte))
                                        continue;

> > Because if you *used* to be read-only, and then mprotect() 
> > something so that it is read-write, there really is no need to 
> > send a TLB invalidate, at least on x86. You can just change the 
> > page tables, and *if* any entries are stale in the TLB they'll 
> > take a microfault on access and then just reload the TLB.
> > 
> > So mprotect() to a more permissive mode is not necessarily 
> > serializing.
> 
> The idea here is to always mprotect() to a more restrictive mode, 
> which should trigger the TLB shootdown.
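
For reference, a minimal userspace sketch of the trick being discussed, 
assuming Linux with anonymous mmap(). The helper names 
barrier_page_init() and mprotect_barrier() are hypothetical and are not 
liburcu API. The sketch only shows the call sequence: whether the 
restricting mprotect() really sends IPIs or implies smp_mb() on the 
other CPUs is a kernel and hardware side-effect, not a documented 
guarantee, which is exactly the concern raised in this thread.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical state: one private dummy page used only for the trick. */
static void *dummy_page = MAP_FAILED;
static size_t dummy_page_size;

/* Map the dummy page once. Returns 0 on success, -1 on failure. */
static int barrier_page_init(void)
{
        long sz = sysconf(_SC_PAGESIZE);

        if (sz <= 0)
                return -1;
        dummy_page_size = (size_t)sz;
        dummy_page = mmap(NULL, dummy_page_size, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return dummy_page == MAP_FAILED ? -1 : 0;
}

/*
 * Attempted process-wide barrier. Returns 0 on success, -1 on failure.
 * Assumption (not guaranteed): the restricting mprotect() triggers a
 * TLB shootdown on every CPU running a thread of this process.
 */
static int mprotect_barrier(void)
{
        if (dummy_page == MAP_FAILED)
                return -1;
        /* Relax first, then write so a present, writable pte exists. */
        if (mprotect(dummy_page, dummy_page_size, PROT_READ | PROT_WRITE))
                return -1;
        *(volatile char *)dummy_page = 1;
        /* The restricting transition is the one hoped to shoot down TLBs. */
        if (mprotect(dummy_page, dummy_page_size, PROT_READ))
                return -1;
        return 0;
}
```

An updater would call barrier_page_init() once at startup and then 
mprotect_barrier() where it would otherwise send signals or call 
sys_membarrier(). A return value of 0 only means the system calls 
succeeded, not that any remote CPU executed a barrier.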

So what happens if a CPU comes around that integrates TLB shootdown 
management into its cache coherency protocol? In such a case IPI 
traffic can be skipped: the memory bus messages take care of TLB 
flushes in most cases.

It's a natural optimization IMHO, because TLB flushes are conceptually 
pretty close to the synchronization mechanisms inherent in data cache 
coherency protocols:

This could be implemented for example by a CPU that knows about ptes 
and handles their modification differently: when a pte is modified it 
will broadcast a MESI invalidation message not just for the cacheline 
belonging to the pte's physical address, but also an 'invalidate TLB' 
MESI message for the pte value's page.

The TLB shootdown would either be guaranteed within the MESI 
transaction, or there would be a deterministic timing 
guarantee, or some explicit synchronization mechanism (new 
instruction) to make sure the remote TLB(s) got shot down.

Every form of this would be way faster than sending interrupts. New 
OSs could support this by the hardware telling them in which cases the 
TLBs are 'auto-flushed', while old OSs would still be compatible by 
sending (now pointless) TLB shootdown IPIs.

So it's a relatively straightforward hardware optimization IMHO: 
assuming TLB flushes are considered important enough to complicate the 
cacheline state machine (which I think they currently aren't).

So in this case there's no interrupt and no other interruption of the 
remote CPU's flow of execution in any fashion that could advance the 
RCU state machine.

What do you think?

Thanks,

	Ingo