Message-ID: <c61f6089-57b7-e00f-d5ed-68e62237eab0@redhat.com>
Date: Fri, 28 Oct 2022 13:36:20 +0200
From: Paolo Bonzini <pbonzini@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>,
"Woodhouse, David" <dwmw@...zon.co.uk>
Cc: linux-kernel@...r.kernel.org, kvm@...r.kernel.org, mhal@...x.co
Subject: Re: [PATCH 03/16] KVM: x86: set gfn-to-pfn cache length consistently
with VM word size
On 10/27/22 19:22, Sean Christopherson wrote:
> On Thu, Oct 27, 2022, Paolo Bonzini wrote:
>> So, use the short size at activation time as well. This means
>> re-activating the cache if the guest requests the hypercall page
>> multiple times with different word sizes (this can happen when
>> kexec-ing, for example).
>
> I don't understand the motivation for allowing a conditionally valid GPA. I see
> a lot of complexity and sub-optimal error handling for a use case that no one
> cares about. Realistically, userspace is never going to provide a GPA that only
> works some of the time, because doing otherwise is just asking for a dead guest.
We _should_ be following the Xen API, which does not even say that the
areas have to fit in a single page. In fact, even Linux's
     struct vcpu_register_runstate_memory_area area;

     area.addr.v = &per_cpu(xen_runstate, cpu);

     if (HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area,
                            xen_vcpu_nr(cpu), &area))
             BUG();
could fail or not, depending just on the linker's whims, if I'm not
very confused.
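Whether the area fits is purely a page-offset question, so the linker's
placement alone decides the outcome; as a throwaway sketch (the helper
name is made up):

     static inline bool straddles_page(unsigned long addr, size_t len)
     {
             /* offset within the page plus length overflows the page */
             return (addr & ~PAGE_MASK) + len > PAGE_SIZE;
     }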
Other data structures *do* have to fit in a page, but the runstate area
does not, and it's exactly the one where the cache comes in most handy.
For this I'm going to wait for David to answer.
That said, the whole gpc API is really messy and needs to be cleaned
up beyond what this series does. For example,
     read_lock_irqsave(&gpc->lock, flags);
     while (!kvm_gfn_to_pfn_cache_check(v->kvm, gpc, gpc->gpa,
                                        sizeof(x))) {
             read_unlock_irqrestore(&gpc->lock, flags);
             if (kvm_gfn_to_pfn_cache_refresh(v->kvm, gpc, gpc->gpa,
                                              sizeof(x)))
                     return;
             read_lock_irqsave(&gpc->lock, flags);
     }
     ...
     read_unlock_irqrestore(&gpc->lock, flags);
should really be simplified to
     khva = kvm_gpc_lock(gpc);
     if (IS_ERR(khva))
             return;
     ...
     kvm_gpc_unlock(gpc);
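A minimal sketch of what those helpers could look like on top of the
existing check/refresh primitives; the names come from the proposal
above, but the rest is assumption: gpc->kvm and gpc->len are not fields
today (they'd have to be recorded at init/activation time), and the
irqsave flags are handled with an out-parameter for simplicity:

     /* Sketch only, needs <linux/err.h> for ERR_PTR(). */
     static void *kvm_gpc_lock(struct gfn_to_pfn_cache *gpc,
                               unsigned long *flags)
     {
             read_lock_irqsave(&gpc->lock, *flags);
             while (!kvm_gfn_to_pfn_cache_check(gpc->kvm, gpc, gpc->gpa,
                                                gpc->len)) {
                     read_unlock_irqrestore(&gpc->lock, *flags);
                     if (kvm_gfn_to_pfn_cache_refresh(gpc->kvm, gpc,
                                                      gpc->gpa, gpc->len))
                             return ERR_PTR(-EFAULT); /* already unlocked */
                     read_lock_irqsave(&gpc->lock, *flags);
             }
             return gpc->khva;
     }

     static void kvm_gpc_unlock(struct gfn_to_pfn_cache *gpc,
                                unsigned long flags)
     {
             read_unlock_irqrestore(&gpc->lock, flags);
     }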
Only the special preempt-notifier cases would have to use check/refresh
by hand. If needed, they could even pass whatever length they want to
__kvm_gpc_refresh, with the double underscore explicitly marking it as a
here-be-dragons API.
Also, because we're using the gpc from non-preemptible regions, the rwlock
critical sections should be enclosed in preempt_disable()/preempt_enable().
Fortunately they're pretty small.
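i.e. something like this (illustration only, not code from the series):

     preempt_disable();
     read_lock(&gpc->lock);
     /* ... use the mapping at gpc->khva ... */
     read_unlock(&gpc->lock);
     preempt_enable();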
For now I think the best course of action is to quickly get the bugfix
patches to Linus, and for 6.2 drop this one but otherwise keep the length
in kvm_gpc_activate().
> Aren't we just making up behavior at this point? Injecting a #GP into the guest
> for what is a completely legal operation from the guest's perspective seems all
> kinds of wrong.
Yeah, it is, and this patch was a steaming pile... Though, while this
one is particularly egregious because it comes from an MSR write, a lot
of error returns in xen.c invite userspace to make up error conditions
that aren't there in Xen. In fact, activate should probably ignore a
refresh -EFAULT altogether.
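Roughly like this sketch (arguments trimmed, and the body is not code
from the series):

     int kvm_gpc_activate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
                          gpa_t gpa, unsigned long len)
     {
             int r;

             /* ... existing setup: mark the cache active, record gpa ... */

             r = kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpa, len);
             return r == -EFAULT ? 0 : r; /* currently-unmapped gpa is ok */
     }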
Paolo