Message-ID: <4A0428FC.8080304@novell.com>
Date: Fri, 08 May 2009 08:43:40 -0400
From: Gregory Haskins <ghaskins@...ell.com>
To: Marcelo Tosatti <mtosatti@...hat.com>
CC: Avi Kivity <avi@...hat.com>, Chris Wright <chrisw@...s-sol.org>,
Gregory Haskins <gregory.haskins@...il.com>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
Anthony Liguori <anthony@...emonkey.ws>,
paulmck@...ux.vnet.ibm.com
Subject: Re: [RFC PATCH 0/3] generic hypercall support

Marcelo Tosatti wrote:
> On Fri, May 08, 2009 at 10:59:00AM +0300, Avi Kivity wrote:
>
>> Marcelo Tosatti wrote:
>>
>>> I think the comparison is not entirely fair. You're using
>>> KVM_HC_VAPIC_POLL_IRQ ("null" hypercall) and the compiler optimizes that
>>> (on Intel) to only one register read:
>>>
>>> nr = kvm_register_read(vcpu, VCPU_REGS_RAX);
>>>
>>> Whereas in a real hypercall for (say) PIO you would need the address,
>>> size, direction and data.
>>>
>>>
>> Well, that's probably one of the reasons pio is slower, as the cpu has
>> to set these up, and the kernel has to read them.
>>
>>
>>> Also for PIO/MMIO you're adding this unoptimized lookup to the
>>> measurement:
>>>
>>> pio_dev = vcpu_find_pio_dev(vcpu, port, size, !in);
>>> if (pio_dev) {
>>>         kernel_pio(pio_dev, vcpu, vcpu->arch.pio_data);
>>>         complete_pio(vcpu);
>>>         return 1;
>>> }
>>>
>>>
>> Since there are only one or two elements in the list, I don't see how it
>> could be optimized.
>>
>
> speaker_ioport, pit_ioport, and pic_ioport, plus the nulldev ioport.
> The nulldev is probably the last in the io_bus list.
>
> Not sure if this one matters very much. The point is that you should
> measure the exit time only, not the PIO path vs. the hypercall path in KVM.
>
The problem is that the exit time in and of itself isn't all that
interesting to me.  What I am interested in measuring is how long it
takes KVM to process the request and realize that I want to execute
function "X".  Ultimately that is what matters in terms of execution
latency, and it is thus the more interesting data.  I think the exit
time is possibly an interesting fifth data point, but it's more of a
side-bar, IMO.  In any case, I suspect that both exits will be
approximately the same at the VT/SVM level.
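
To be concrete, the measurement I care about is the round-trip as seen
by the guest-side caller.  A rough sketch of what I mean (the port
number and iteration count here are arbitrary assumptions, and it
presumes the nulldev is registered at that port):

#include <linux/kernel.h>
#include <linux/timex.h>        /* get_cycles() */
#include <linux/kvm_para.h>     /* kvm_hypercall0(), KVM_HC_VAPIC_POLL_IRQ */
#include <asm/io.h>             /* outb() */

#define NULLDEV_PORT    0xf0    /* assumed: port claimed by the nulldev */
#define ITERS           100000

static void measure_exit_paths(void)
{
        cycles_t t0, t1;
        int i;

        /* round-trip for the "null" hypercall */
        t0 = get_cycles();
        for (i = 0; i < ITERS; i++)
                kvm_hypercall0(KVM_HC_VAPIC_POLL_IRQ);
        t1 = get_cycles();
        printk(KERN_INFO "HC:  %llu cycles/exit\n",
               (unsigned long long)(t1 - t0) / ITERS);

        /* round-trip for a PIO exit to the null device */
        t0 = get_cycles();
        for (i = 0; i < ITERS; i++)
                outb(0, NULLDEV_PORT);
        t1 = get_cycles();
        printk(KERN_INFO "PIO: %llu cycles/exit\n",
               (unsigned long long)(t1 - t0) / ITERS);
}

Either way you are timing the whole dispatch path, which is the number
that actually matters to the caller.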
OTOH: if there is a patch out there to improve KVM's code (say,
specifically the PIO handling logic), that is fair game here and we
should benchmark it, for instance if you have ideas on ways to improve
the find_pio_dev performance.  One item may be to replace the
kvm->lock on the bus scan with RCU or something similar (though PIOs
are very frequent, and the constant re-entry into an RCU read-side
critical section may effectively cause a perpetual grace period and
may be too prohibitive).  CC'ing paulmck.
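
For the RCU idea, something like the following is what I am hand-waving
at (sketch only: the entry struct and list name are made up and do not
reflect the actual kvm_io_bus layout):

#include <linux/list.h>
#include <linux/rcupdate.h>

struct pio_dev_entry {
        struct list_head      link;
        unsigned              port;
        int                   size;
        struct kvm_io_device *dev;
};

static struct kvm_io_device *
find_pio_dev_rcu(struct list_head *bus, unsigned port, int size)
{
        struct pio_dev_entry *e;
        struct kvm_io_device *dev = NULL;

        rcu_read_lock();        /* replaces the kvm->lock acquisition */
        list_for_each_entry_rcu(e, bus, link) {
                if (e->port == port && e->size == size) {
                        dev = e->dev;
                        break;
                }
        }
        rcu_read_unlock();

        return dev;
}

Registration-side updates (rare) would pair list_add_rcu() /
list_del_rcu() with synchronize_rcu() before freeing an entry; only
the per-PIO read side needs to be fast.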
FWIW: the PIO-over-hypercalls (PIOoHC) were about 140ns slower than the
pure HC, so some of that 140ns can possibly be recouped.  I currently
suspect the lock acquisition in the io_bus scan is the bulk of that
time, but that is admittedly a guess.  The remaining 200-250ns is
elsewhere in the PIO decode.
-Greg