linux-kernel - Re: [PATCH 2/4] KVM: selftests: Setup ucall after loading program into guest memory


Message-ID: <Y5I/xiFMLVbpAZj+@google.com>
Date:   Thu, 8 Dec 2022 11:49:26 -0800
From:   Ricardo Koller <ricarkol@...gle.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     Oliver Upton <oliver.upton@...ux.dev>,
        Marc Zyngier <maz@...nel.org>,
        James Morse <james.morse@....com>,
        Alexandru Elisei <alexandru.elisei@....com>,
        Suzuki K Poulose <suzuki.poulose@....com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Shuah Khan <shuah@...nel.org>,
        linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.cs.columbia.edu,
        kvm@...r.kernel.org, kvmarm@...ts.linux.dev,
        linux-kselftest@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/4] KVM: selftests: Setup ucall after loading program
 into guest memory

On Thu, Dec 08, 2022 at 07:01:57PM +0000, Sean Christopherson wrote:
> On Thu, Dec 08, 2022, Ricardo Koller wrote:
> > On Thu, Dec 08, 2022 at 12:37:23AM +0000, Oliver Upton wrote:
> > > On Thu, Dec 08, 2022 at 12:24:20AM +0000, Sean Christopherson wrote:
> > > > > Even still, that's just a kludge to make ucalls work. We have other
> > > > > MMIO devices (GIC distributor, for example) that work by chance since
> > > > > nothing conflicts with the constant GPAs we've selected in the tests.
> > > > > 
> > > > > I'd rather we go down the route of having an address allocator for
> > > > > both the VA and PA spaces to provide carveouts at runtime.
> > > > 
> > > > Aren't those two separate issues?  The PA, a.k.a. memslots space, can be solved
> > > > by allocating a dedicated memslot, i.e. doesn't need a carve.  At worst, collisions
> > > > will yield very explicit asserts, which IMO is better than whatever might go wrong
> > > > with a carve out.
> > > 
> > > Perhaps the use of the term 'carveout' wasn't right here.
> > > 
> > > What I'm suggesting is we cannot rely on KVM memslots alone to act as an
> > > allocator for the PA space. KVM can provide devices to the guest that
> > > aren't represented as memslots. If we're trying to fix PA allocations
> > > anyway, why not make it generic enough to suit the needs of things
> > > beyond ucalls?
> > 
> > One extra bit of information: in arm, IO is any access to an address (within
> > bounds) not backed by a memslot. Not the same as x86 where MMIO are writes to
> > read-only memslots.  No idea what other arches do.
> 
> I don't think that's correct, doesn't this code turn a write abort on a RO memslot
> into an io_mem_abort()?  Specifically, the "(write_fault && !writable)" check will
> match, and assuming none of the edge cases in the if-statement fire, KVM will
> send the write down io_mem_abort().

You are right. In fact, page_fault_test checks precisely that: writes to
RO memslots are sent to userspace as an MMIO exit. I was only referring
to the MMIO that ucall does today, which targets an address not backed
by any memslot.

Having said that, we could implement ucall as writes to read-only
memslots, as x86 does.

> 
> 	gfn = fault_ipa >> PAGE_SHIFT;
> 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
> 	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
> 	write_fault = kvm_is_write_fault(vcpu);
> 	if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
> 		/*
> 		 * The guest has put either its instructions or its page-tables
> 		 * somewhere it shouldn't have. Userspace won't be able to do
> 		 * anything about this (there's no syndrome for a start), so
> 		 * re-inject the abort back into the guest.
> 		 */
> 		if (is_iabt) {
> 			ret = -ENOEXEC;
> 			goto out;
> 		}
> 
> 		if (kvm_vcpu_abt_iss1tw(vcpu)) {
> 			kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
> 			ret = 1;
> 			goto out_unlock;
> 		}
> 
> 		/*
> 		 * Check for a cache maintenance operation. Since we
> 		 * ended-up here, we know it is outside of any memory
> 		 * slot. But we can't find out if that is for a device,
> 		 * or if the guest is just being stupid. The only thing
> 		 * we know for sure is that this range cannot be cached.
> 		 *
> 		 * So let's assume that the guest is just being
> 		 * cautious, and skip the instruction.
> 		 */
> 		if (kvm_is_error_hva(hva) && kvm_vcpu_dabt_is_cm(vcpu)) {
> 			kvm_incr_pc(vcpu);
> 			ret = 1;
> 			goto out_unlock;
> 		}
> 
> 		/*
> 		 * The IPA is reported as [MAX:12], so we need to
> 		 * complement it with the bottom 12 bits from the
> 		 * faulting VA. This is always 12 bits, irrespective
> 		 * of the page size.
> 		 */
> 		fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
> 		ret = io_mem_abort(vcpu, fault_ipa);
> 		goto out_unlock;
> 	}
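
To make the decision in the quoted block easier to follow, here is a
minimal standalone sketch of it as a pure function. This is only a model,
not kernel code: struct fault, enum fault_action and classify() are
hypothetical names, and each boolean stands in for the kernel helper
named in its comment.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch; not KVM identifiers. */
enum fault_action {
	ACT_MAP_PAGE,		/* falls through to normal stage-2 handling */
	ACT_NOEXEC,		/* instruction abort: ret = -ENOEXEC */
	ACT_INJECT_DABT,	/* abort on a stage-1 page table walk */
	ACT_SKIP_INSN,		/* cache maintenance outside any memslot */
	ACT_MMIO_EXIT,		/* io_mem_abort() */
};

/* Hypothetical summary of the state the quoted code inspects. */
struct fault {
	bool has_memslot;	/* !kvm_is_error_hva(hva) */
	bool writable;		/* memslot is not read-only */
	bool is_write;		/* kvm_is_write_fault(vcpu) */
	bool is_iabt;		/* instruction abort */
	bool is_s1ptw;		/* kvm_vcpu_abt_iss1tw(vcpu) */
	bool is_cm;		/* kvm_vcpu_dabt_is_cm(vcpu) */
};

static enum fault_action classify(const struct fault *f)
{
	/* Mirrors: kvm_is_error_hva(hva) || (write_fault && !writable) */
	bool special = !f->has_memslot || (f->is_write && !f->writable);

	if (!special)
		return ACT_MAP_PAGE;
	if (f->is_iabt)
		return ACT_NOEXEC;
	if (f->is_s1ptw)
		return ACT_INJECT_DABT;
	/* Only skipped when the address is outside every memslot. */
	if (!f->has_memslot && f->is_cm)
		return ACT_SKIP_INSN;
	return ACT_MMIO_EXIT;
}
```

In this model, both a plain data access with no memslot and a write to a
read-only memslot end up at ACT_MMIO_EXIT, which is the point being made
above.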