linux-kernel - Re: [RFC PATCH 0/4] KVM: x86/mmu: Rework marking folios dirty/accessed


Message-ID: <Zg3V-M3iospVUEDU@google.com>
Date: Wed, 3 Apr 2024 15:19:36 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: David Hildenbrand <david@...hat.com>
Cc: David Matlack <dmatlack@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, David Stevens <stevensd@...omium.org>, 
	Matthew Wilcox <willy@...radead.org>
Subject: Re: [RFC PATCH 0/4] KVM: x86/mmu: Rework marking folios dirty/accessed

On Wed, Apr 03, 2024, David Hildenbrand wrote:
> On 03.04.24 02:17, Sean Christopherson wrote:
> > On Tue, Apr 02, 2024, David Hildenbrand wrote:
> > Aha!  But try_to_unmap_one() also checks that refcount==mapcount+1, i.e. will
> > also keep the folio if it has been GUP'd.  And __remove_mapping() explicitly states
> > that it needs to play nice with a GUP'd page being marked dirty before the
> > reference is dropped.
> 
> > 
> > 	 * Must be careful with the order of the tests. When someone has
> > 	 * a ref to the folio, it may be possible that they dirty it then
> > 	 * drop the reference. So if the dirty flag is tested before the
> > 	 * refcount here, then the following race may occur:
> > 
> > So while it's totally possible for KVM to get a W=1,D=0 PTE, if I'm reading the
> > code correctly it's safe/legal so long as KVM either (a) marks the folio dirty
> > while holding a reference or (b) marks the folio dirty before returning from its
> > mmu_notifier_invalidate_range_start() hook, *AND* obviously if KVM drops its
> > mappings in response to mmu_notifier_invalidate_range_start().
> > 
> 
> Yes, I agree that it should work in the context of vmscan. But (b) is
> certainly a bit harder to swallow than "ordinary" (a) :)

Heh, all the more reason to switch KVM x86 from (b) => (a).
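
For readers following along, here is a minimal userspace sketch of why option (a)
is safe against the test ordering quoted from __remove_mapping() above.  It is a
single-threaded model using C11 atomics, not kernel code: struct model_folio,
model_try_reclaim() and model_dirty_then_put() are hypothetical names made up for
illustration, and the "expected_refs" argument stands in for whatever reference
count reclaim expects when nobody else holds the folio.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio: just a refcount and a dirty flag. */
struct model_folio {
	atomic_int  refcount;
	atomic_bool dirty;
};

/*
 * Model of the reclaim side.  The refcount is tested (and frozen to 0)
 * BEFORE the dirty flag is looked at, mirroring the ordering the quoted
 * comment asks for.  Returns true only if the folio may be freed.
 */
static bool model_try_reclaim(struct model_folio *f, int expected_refs)
{
	int expected = expected_refs;

	/* Someone else (e.g. a GUP user) still holds a reference: keep it. */
	if (!atomic_compare_exchange_strong(&f->refcount, &expected, 0))
		return false;

	/* No extra references, but the folio was dirtied: unfreeze, keep it. */
	if (atomic_load(&f->dirty)) {
		atomic_store(&f->refcount, expected_refs);
		return false;
	}

	return true;
}

/*
 * Model of option (a): the reference holder marks the folio dirty while
 * it still holds its reference, and only then drops the reference.
 */
static void model_dirty_then_put(struct model_folio *f)
{
	atomic_store(&f->dirty, true);
	atomic_fetch_sub(&f->refcount, 1);
}
```

In this model, reclaim either sees the elevated refcount and backs off, or sees
the refcount already dropped, in which case the dirty flag was set earlier and is
visible to the second test.  Testing the dirty flag first would leave a window
where the holder dirties the folio and drops its reference between the two checks.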

> As raised, if having a writable SPTE would imply having a writable+dirty
> PTE, then KVM MMU code wouldn't have to worry about syncing any dirty bits
> ever back to core-mm, so patch #2 would not be required. ... well, it would
> be replaced by an MMU notifier that notifies about clearing the PTE dirty
> bit :)

Hmm, we essentially already have an mmu_notifier today, since secondary MMUs need
to be invalidated before consuming dirty status.  Isn't the end result essentially
a sane FOLL_TOUCH?

> ... because, then, there is also a subtle difference between
> folio_set_dirty() and folio_mark_dirty(), and I am still confused about the
> difference and not competent enough to explain the difference ... and KVM
> always does the former, while zapping code of pagecache folios does the
> latter ... hm

Ugh, just when I thought I finally had my head wrapped around this.

> Related note: IIRC, we usually expect most anon folios to be dirty.
> 
> kvm_set_pfn_dirty()->kvm_set_page_dirty() does an unconditional
> SetPageDirty()->folio_set_dirty(). Doing a test-before-set might frequently
> avoid atomic ops.

Noted, definitely worth poking at.
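
As a rough illustration of the test-before-set idea, here is a userspace sketch
using C11 atomics.  It is not the kernel's page-flag implementation: struct
model_folio, MODEL_PG_DIRTY and the model_*() helpers are hypothetical names, and
the flag word is modeled as a plain atomic_ulong.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit and folio stand-in, for illustration only. */
#define MODEL_PG_DIRTY (1UL << 0)

struct model_folio {
	atomic_ulong flags;
};

/* Unconditional set: always performs an atomic read-modify-write. */
static inline void model_set_dirty(struct model_folio *f)
{
	atomic_fetch_or_explicit(&f->flags, MODEL_PG_DIRTY,
				 memory_order_relaxed);
}

/* Plain load of the flag word, no read-modify-write. */
static inline bool model_test_dirty(struct model_folio *f)
{
	return atomic_load_explicit(&f->flags, memory_order_relaxed) &
	       MODEL_PG_DIRTY;
}

/*
 * Test-before-set: skip the atomic RMW when the flag is already set.
 * Returns true if the RMW was actually issued.
 */
static inline bool model_set_dirty_if_clean(struct model_folio *f)
{
	if (model_test_dirty(f))
		return false;

	model_set_dirty(f);
	return true;
}
```

If most anon folios are already dirty, as suggested above, the common case in
this model is a single load and a predictable branch instead of a locked
operation on the flag word.  Whether that is a measurable win in KVM would need
to be checked.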