Date:   Tue, 10 Dec 2019 11:07:31 +0100
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Peter Xu <peterx@...hat.com>
Cc:     Sean Christopherson <sean.j.christopherson@...el.com>,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        "Dr . David Alan Gilbert" <dgilbert@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>
Subject: Re: [PATCH RFC 04/15] KVM: Implement ring-based dirty memory tracking

On 09/12/19 22:54, Peter Xu wrote:
> Just recently I noticed that kvm_get_running_vcpu() actually has a
> real benefit, in that it gives a very solid answer on whether we're
> in vcpu context, even more accurate than passing vcpu pointers
> around (because sometimes we pass only the kvm pointer down the
> stack even when we are in vcpu context, just as we did with
> mark_page_dirty_in_slot).

Right, that's the point.
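
kvm_get_running_vcpu() reads a per-CPU pointer that vcpu_load() and
vcpu_put() maintain, so even a helper that was only handed a struct kvm
pointer can recover the vcpu context.  A minimal sketch of such a check
(an illustration, not code from the series):

        struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

        if (vcpu && vcpu->kvm == kvm) {
                /* vcpu context: e.g. push to this vcpu's dirty ring */
        } else {
                /* no vcpu (or another VM's): e.g. an ioctl context */
        }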

> I'm wondering whether I can start to use this information in the
> next post to solve an issue I encountered with the waitqueue.
> 
> The current waitqueue is still problematic in that it can end up
> waiting with the mmu lock held when in vcpu context.

I think the idea of the soft limit is that the waiting just cannot
happen.  That is, the number of dirtied pages _outside_ the guest (guest
accesses are taken care of by PML, and are subtracted from the soft
limit) cannot exceed hard_limit - (soft_limit + pml_size).
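
Concretely, the point is that a vcpu is bounced out to userspace as
soon as the soft limit is reached, so a push can never find the ring
completely full.  A sketch of that check (the names and the exit
plumbing are assumptions for illustration, not code from the series):

        /*
         * Ask userspace to reset the rings once soft_limit entries
         * are in use; the remaining hard_limit - soft_limit slots
         * then absorb a PML flush plus whatever the host dirties
         * before the reset, so no one ever waits on a full ring.
         */
        if (ring->used >= ring->soft_limit)
                vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;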

> The issue is that KVM_RESET_DIRTY_RINGS needs the mmu lock to
> manipulate the write bits, while it is also the only interface that
> wakes up the dirty ring sleepers.  They could deadlock like this:
> 
>       main thread                            vcpu thread
>       ===========                            ===========
>                                              kvm page fault
>                                                mark_page_dirty_in_slot
>                                                mmu lock taken
>                                                mark dirty, ring full
>                                                queue on waitqueue
>                                                (with mmu lock)
>       KVM_RESET_DIRTY_RINGS
>         take mmu lock               <------------ deadlock here
>         reset ring gfns
>         wakeup dirty ring sleepers
> 
> And if mark_page_dirty_in_slot() is called not from a vcpu context
> (e.g. kvm_mmu_page_fault) but from an ioctl context (in those cases
> we'll use the per-vm dirty ring), then it's probably fine.
> 
> My planned solution:
> 
> - When kvm_get_running_vcpu() != NULL, we postpone the waitqueue waits
>   until we have finished handling this page fault, probably somewhere
>   around vcpu_enter_guest, so that we can do wait_event() after the
>   mmu lock has been released
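
For concreteness, that deferral might look roughly like this (the
request flag and the helpers are invented for illustration):

        /* in mark_page_dirty_in_slot(), possibly under the mmu lock */
        if (kvm_get_running_vcpu()) {
                kvm_dirty_ring_push(ring, gfn);
                if (dirty_ring_full(ring))
                        /* flag it rather than sleep under the lock */
                        kvm_make_request(KVM_REQ_DIRTY_RING_FULL, vcpu);
        }

        /* later, around vcpu_enter_guest(), with no locks held */
        if (kvm_check_request(KVM_REQ_DIRTY_RING_FULL, vcpu))
                wait_event(vcpu->kvm->dirty_ring_waitq,
                           !dirty_ring_full(ring));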

I think this can cause a race:

	vCPU 1			vCPU 2		host
	---------------------------------------------------------------
	mark page dirty
				write to page
						treat page as not dirty
	add page to ring

where vCPU 2 sees the page already marked dirty and so skips the
clean-page slow path entirely.

Paolo
