linux-kernel - Re: [patch] my mmu notifiers

Message-ID: <20080220005206.GP7128@v2.random>
Date:	Wed, 20 Feb 2008 01:52:06 +0100
From:	Andrea Arcangeli <andrea@...ranet.com>
To:	Nick Piggin <npiggin@...e.de>
Cc:	Jack Steiner <steiner@....com>, akpm@...ux-foundation.org,
	Robin Holt <holt@....com>, Avi Kivity <avi@...ranet.com>,
	Izik Eidus <izike@...ranet.com>,
	kvm-devel@...ts.sourceforge.net,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	general@...ts.openfabrics.org,
	Steve Wise <swise@...ngridcomputing.com>,
	Roland Dreier <rdreier@...co.com>,
	Kanoj Sarcar <kanojsarcar@...oo.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	daniel.blueman@...drics.com, Christoph Lameter <clameter@....com>
Subject: Re: [patch] my mmu notifiers

On Wed, Feb 20, 2008 at 12:04:27AM +0100, Nick Piggin wrote:
> On Tue, Feb 19, 2008 at 08:27:25AM -0600, Jack Steiner wrote:
> > > On Tue, Feb 19, 2008 at 02:58:51PM +0100, Andrea Arcangeli wrote:
> > > > understand the need for invalidate_begin/invalidate_end pairs at all.
> > > 
> > > The need of the pairs is crystal clear to me: range_begin is needed
> > > for GRU _but_only_if_ range_end is called after releasing the
> > > reference that the VM holds on the page. _begin will flush the GRU tlb
> > > and at the same time it will take a mutex that will block further GRU
> > > tlb-miss-interrupts (no idea how they manage that nightmare locking,
> > > I didn't even try to add more locking to KVM and I get away with the
> > > fact KVM takes the pin on the page itself).
> > 
> > As it turns out, no actual mutex is required. _begin_ simply increments a
> > count of active range invalidates, _end_ decrements the count. New TLB
> > dropins are deferred while range callouts are active.
> > 
> > This would appear to be racy but the GRU has special hardware that
> > simplifies locking. When the GRU sees a TLB invalidate, all outstanding
> > misses & potentially inflight TLB dropins are marked by the GRU with a
> > "kill" bit. When the dropin finally occurs, the dropin is ignored & the
> > instruction is simply restarted. The instruction will fault again & the TLB
> > dropin will be repeated.  This is optimized for the case where invalidates
> > are rare - true for users of the GRU.
> 
> OK (thanks to Robin as well). Now I understand why you are using it,
> but I don't understand why you don't defer new TLBs after the point
> where the linux pte changes. If you can do that, then you look and
> act much more like a TLB from the point of view of the Linux vm.

Christoph was forced to put the invalidate_range callback _after_
dropping the PT lock because xpmem has to wait for I/O there. But
invalidate_range is then called after the VM reference on the pages has
been freed, so GRU needed a _range_begin too: GRU has to flush the tlb
before the VM reference on the page is released (xpmem and KVM pin the
pages mapped by the secondary mmu, GRU doesn't). That is why
invalidate_range was renamed to invalidate_range_end.
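
To make the begin/end pairing concrete, here is a rough userspace sketch
of the counting scheme Jack describes above: _begin bumps a count of
active range invalidates and flushes the secondary tlb, _end drops the
count, and new dropins are deferred while the count is nonzero. All the
names (struct sec_mmu, sec_range_begin, sec_range_end, sec_try_dropin)
are made up for illustration and are not the GRU driver's code or the
mmu notifier API. The check in sec_try_dropin is racy on its own; per
Jack's description the GRU hardware "kill" bit is what covers the window
between the check and the actual dropin, and that part is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-mm state of a secondary MMU (illustrative only). */
struct sec_mmu {
	atomic_int active_invalidates;	/* ranges between _begin and _end */
	int tlb_flushes;		/* stand-in for real secondary tlb flushes */
};

/* Called before the VM drops its reference on the pages in the range. */
static void sec_range_begin(struct sec_mmu *m)
{
	atomic_fetch_add(&m->active_invalidates, 1);
	m->tlb_flushes++;	/* assumed: flush the secondary tlb here */
}

/* Called after the PT lock is dropped and the pages have been released. */
static void sec_range_end(struct sec_mmu *m)
{
	int prev = atomic_fetch_sub(&m->active_invalidates, 1);

	assert(prev > 0);	/* every _end must pair with a _begin */
}

/*
 * tlb miss path: returns true if a new dropin may be installed now,
 * false if it has to be deferred because an invalidate is in flight.
 */
static bool sec_try_dropin(struct sec_mmu *m)
{
	return atomic_load(&m->active_invalidates) == 0;
}
```

With a zero-initialized struct sec_mmu, sec_try_dropin() succeeds until
the first sec_range_begin() and succeeds again only once every
outstanding begin has been matched by a sec_range_end(), so nested or
overlapping range invalidates keep dropins deferred until the last one
finishes.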