Date:	Tue, 13 Dec 2011 12:58:19 +0200
From:	Avi Kivity <avi@...hat.com>
To:	Thomas Gleixner <tglx@...utronix.de>
CC:	Ingo Molnar <mingo@...e.hu>, Pekka Enberg <penberg@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Sasha Levin <levinsasha928@...il.com>
Subject: Re: [patch 0/3] kvm tool: Serial emulation overhaul

On 12/13/2011 02:59 AM, Thomas Gleixner wrote:

<snip trace>
> Why the heck is a paravirtualized guest using a local APIC timer
> emulation, instead of a paravirtualized clock event device?
>
> Just look at the trace. That's insane. We enter the guest for 2us to
> come back and handle the APIC_EOI for 11us. Then we go back to the
> guest for 9us and spend again 11us for handling a write to APIC_TMICT.
>
> That's 11us guest vs. 22us host time.

Run your guest with x2apic enabled; the timing will be very different.
You'll still have an exit for APIC_TMICT and APIC_EOI, but they'll be
much faster, since x2apic registers are accessed through MSRs rather
than emulated MMIO.  It's possible to avoid the EOI exit with some
paravirt magic, but that has its own issues.

> Aside from that, when looking at the bootup, the guest "calibrates" the
> local APIC timer emulation against an emulated legacy device to figure
> out the APIC timer clock rate, which is totally irrelevant for a
> paravirtualized guest, if done right.
>
> Look how a guest timer is programmed:
>
>      hrtimer_start();
>         ...
>         clock_events_programm_event(dev, expires, now);
>           ns_delta = expires - now;
>           delta = convert_ns_to_dev(ns_delta, dev);
>           dev->set_next_event(delta, dev);
>             lapic_next_event(delta, dev);
>               apic_write(APIC_TMICT, delta);
>                 |
>                 ---> traps into host
>                   kvm_mmu_pagetable_walk();
>                   kvm_mmio_emulation();
>                     kvm_apic_emulation();
>                       start_apic_timer();
>                         now = get_host_time();
>                         delta = convert_apic_to_ns(APIC_TMICT);
>                         hrtimer_start(apic_timer, now + delta, HRTIMER_MODE_ABS);
>
> Oh well, we 
>
>    - convert from nsec to a "calibrated" APIC delta
>    - "program" the APIC timer
>    - trap into the host
>    - convert the "calibrated" delta back to nsec
>    - add it to the current host time
>    - arm the timer
>
> Why the heck don't we use a paravirt device, which just provides a
> nsec based interface? The host knows the time delta between the guest's
> notion of CLOCK_MONOTONIC and its own.

We do have a paravirt clocksource, just not clockevents.

>  That would reduce the whole
> procedure to:
>
>      hrtimer_start();
>         ...
>         clock_events_programm_event(dev, expires, now);
>           dev->set_next_ktime(expires, dev);
>             kvm_clock_event_set_next(expires, dev);
>                 |
>                 ---> traps into host with a paravirt call
>                 kvm_handle_guest_clkev_dev();
>                   hrtimer_start(apic_timer, expires + host_guest_delta, HRTIMER_MODE_ABS);
>
> That would save tons of time on a hot path. Even if the
> host_guest_delta approach does not work, a 1:1 nsec mapping as a
> relative timer on the host would be way faster than the current
> solution.
>
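
To make the comparison concrete, here is a minimal sketch in plain C
(user-space, not kernel code) of the two expiry computations quoted
above.  Everything in it is an assumption for illustration: the function
names, the 16 ns tick period standing in for the "calibrated" APIC rate,
and a constant host_guest_delta.  The real clockevents code uses
per-device mult/shift factors rather than a plain divide.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
/* Assumed calibration result: 62.5 MHz APIC timer, i.e. 16 ns per tick. */
#define APIC_TICKS_PER_SEC 62500000ULL

/* Guest side: nsec delta -> APIC timer ticks (rounds down).
 * No overflow handling; fine for deltas well below ~295 seconds. */
static uint64_t ns_to_apic_ticks(uint64_t ns_delta)
{
	return ns_delta * APIC_TICKS_PER_SEC / NSEC_PER_SEC;
}

/* Host side: APIC timer ticks -> nsec delta. */
static uint64_t apic_ticks_to_ns(uint64_t ticks)
{
	return ticks * NSEC_PER_SEC / APIC_TICKS_PER_SEC;
}

/* Emulated APIC path: two conversions plus a read of the host clock. */
static uint64_t emulated_host_expiry(uint64_t guest_now,
				     uint64_t guest_expires,
				     uint64_t host_now)
{
	uint64_t ticks;

	assert(guest_expires >= guest_now);
	/* This is the value the guest would write to APIC_TMICT. */
	ticks = ns_to_apic_ticks(guest_expires - guest_now);
	return host_now + apic_ticks_to_ns(ticks);
}

/* Proposed paravirt path: the guest passes an absolute nsec expiry and
 * the host only adds its (assumed known) offset to the guest clock. */
static uint64_t paravirt_host_expiry(uint64_t guest_expires,
				     int64_t host_guest_delta)
{
	return guest_expires + (uint64_t)host_guest_delta;
}
```

Under these assumptions both paths agree when the delta is a multiple of
the tick period (guest_now = 1000, guest_expires = 1001000, host_now =
5000 gives 1005000 either way), while a delta of 1000007 ns comes back
from the emulated path as 1000000 ns because of the round trip through
ticks.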

The problem with paravirt clockevents is that if/when the APIC becomes
virtualized in hardware, guests which were started with paravirt
clockevents don't get accelerated when they are migrated onto newer
hardware.  This problem has bitten us several times in the past; if you
want to see how it looks when applied on a large scale, look at Xen:
they have a paravirt-the-fsck-out-of-everything mode and a full virt
mode (which should be way faster these days); the two aren't
compatible.  Of course back when they started, they didn't have a
choice, but we do.

-- 
error compiling committee.c: too many arguments to function
