Date:	Mon, 18 Feb 2008 13:17:15 +0100
From:	Andrea Arcangeli <andrea@...ranet.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Christoph Lameter <clameter@....com>, Robin Holt <holt@....com>,
	Avi Kivity <avi@...ranet.com>, Izik Eidus <izike@...ranet.com>,
	kvm-devel@...ts.sourceforge.net,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	general@...ts.openfabrics.org,
	Steve Wise <swise@...ngridcomputing.com>,
	Roland Dreier <rdreier@...co.com>,
	Kanoj Sarcar <kanojsarcar@...oo.com>, steiner@....com,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	daniel.blueman@...drics.com
Subject: Re: [PATCH] KVM swapping with MMU Notifiers V7

On Sat, Feb 16, 2008 at 03:08:17AM -0800, Andrew Morton wrote:
> On Sat, 16 Feb 2008 11:48:27 +0100 Andrea Arcangeli <andrea@...ranet.com> wrote:
> 
> > +void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
> > +					   struct mm_struct *mm,
> > +					   unsigned long start, unsigned long end,
> > +					   int lock)
> > +{
> > +	for (; start < end; start += PAGE_SIZE)
> > +		kvm_mmu_notifier_invalidate_page(mn, mm, start);
> > +}
> > +
> > +static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
> > +	.invalidate_page	= kvm_mmu_notifier_invalidate_page,
> > +	.age_page		= kvm_mmu_notifier_age_page,
> > +	.invalidate_range_end	= kvm_mmu_notifier_invalidate_range_end,
> > +};
> 
> So this doesn't implement ->invalidate_range_start().

Correct. range_start is needed by subsystems that don't pin the pages
(so they have to drop the secondary MMU mappings on the physical page
before the page is released by the Linux VM).

> By what means does it prevent new mappings from being established in the
> range after core mm has tried to call ->invalidate_range_start()?
> mmap_sem, I assume?

No: populating the range only takes the mmap_sem in read mode, and the
KVM page fault of course also takes it only in read mode.

What makes it safe is that invalidate_range_end is called _after_ the
Linux pte is cleared. The KVM page fault, if it triggers, will call
into get_user_pages again to re-establish the Linux pte _before_
establishing the spte.

It's the same reason it's safe to flush the tlb after clearing the
Linux pte: sptes are like a secondary tlb.

> > +			/* set userspace_addr atomically for kvm_hva_to_rmapp */
> > +			spin_lock(&kvm->mmu_lock);
> > +			memslot->userspace_addr = userspace_addr;
> > +			spin_unlock(&kvm->mmu_lock);
> 
> are you sure?  kvm_unmap_hva() and kvm_age_hva() read ->userspace_addr a
> single time and it doesn't immediately look like there's a need to take the
> lock here?

gcc will always write it with a single movq, but the lock is there to
stay compliant with the C specification; and since this is by far not a
performance-critical path, I thought it was simpler than some other way
of guaranteeing an atomic move in a single insn.
--
