lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 21 May 2012 23:04:25 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	"Michael S. Tsirkin" <mst@...hat.com>
cc:	kvm@...r.kernel.org, Avi Kivity <avi@...hat.com>,
	Marcelo Tosatti <mtosatti@...hat.com>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] kvm: optimize ISR lookups

On Mon, 21 May 2012, Michael S. Tsirkin wrote:

> We perform ISR lookups twice: during interrupt
> injection and on EOI. Typical workloads only have
> a single bit set there. So we can avoid ISR scans by
> 1. counting bits as we set/clear them in ISR
> 2. if count is 1, caching the vector number
> 3. if count != 1, invalidating the cache
> 
> The real purpose of this is enabling PV EOI
> which needs to quickly validate the vector.
> But non PV guests also benefit.
> 
> Signed-off-by: Michael S. Tsirkin <mst@...hat.com>
> ---
>  arch/x86/kvm/lapic.c |   51 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  arch/x86/kvm/lapic.h |    2 +
>  2 files changed, 51 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 93c1574..232950a 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -107,6 +107,16 @@ static inline void apic_clear_vector(int vec, void *bitmap)
>  	clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
>  }
>  
> +static inline int __apic_test_and_set_vector(int vec, void *bitmap)
> +{
> +	return __test_and_set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> +}
> +
> +static inline int __apic_test_and_clear_vector(int vec, void *bitmap)
> +{
> +	return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> +}
> +
>  static inline int apic_hw_enabled(struct kvm_lapic *apic)
>  {
>  	return (apic)->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
> @@ -210,6 +220,16 @@ static int find_highest_vector(void *bitmap)
>  		return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
>  }
>  
> +static u8 count_vectors(void *bitmap)
> +{
> +	u32 *word = bitmap;
> +	int word_offset;
> +	u8 count = 0;
> +	for (word_offset = 0; word_offset < MAX_APIC_VECTOR >> 5; ++word_offset)
> +		count += hweight32(word[word_offset << 2]);
> +	return count;
> +}

Why do you need to reimplement bitmap_weight()?

Because your bitmap is not a bitmap, but a set of 32bit registers
which have an offset of 4 words each. I really had to twist my brain
around this function and the test_and_clr/set ones above just because
the name bitmap is so horribly misleading. Add an extra bonus for
using void pointers.

Yes, I know, the existing code is full of this, but that's not an
excuse to add more of it.

This emulation is just silly. Nothing in an emulation where the access
to the emulated hardware is implemented with vmexits is requiring a
1:1 memory layout. It's completely irrelevant whether the hardware has
an ISR regs offset of 0x10 or not. Nothing prevents the emulation to
use a consecutive bitmap for the vector bits instead of reimplementing
a boatload of bitmap operations.

The only justification for having the same layout as the actual
hardware is when you are going to map the memory into the guest space,
which is not the case here.

And if you look deeper, then you'll notice that _ALL_ APIC registers
are on a 16 byte boundary (thanks Peter for pointing that out).

So it's even more silly to have a 1:1 representation instead of
implementing the default emulated apic_read/write functions to access
(offset >> 2).

And of course, that design decision causes lookups to be slow. It's
way faster to scan a consecutive bitmap than iterating through eight
32bit words with an offset of 0x10, especially on a 64 bit host.

>  static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic)
>  {
>  	apic->irr_pending = true;
> @@ -242,6 +262,25 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
>  		apic->irr_pending = true;
>  }
>  
> +static inline void apic_set_isr(int vec, struct kvm_lapic *apic)
> +{
> +	if (!__apic_test_and_set_vector(vec, apic->regs + APIC_ISR))
> +		++apic->isr_count;
> +	ASSERT(apic->isr_count > MAX_APIC_VECTOR);

I'm really curious what you observed when defining DEBUG in that file.

Clearly you never did.

> +	if (likely(apic->isr_count == 1))
> +		apic->isr_cache = vec;
> +	else
> +		apic->isr_cache = -1;
> +}
> +
> +static inline void apic_clear_isr(int vec, struct kvm_lapic *apic)
> +{
> +	if (__apic_test_and_clear_vector(vec, apic->regs + APIC_ISR))
> +		--apic->isr_count;
> +	ASSERT(apic->isr_count < 0);

Ditto.

>  	result = find_highest_vector(apic->regs + APIC_ISR);
>  	ASSERT(result == -1 || result >= 16);

And obviously none of the folks who added this gem bothered to define
DEBUG either.

So please instead of working around horrid design decisions and adding
more mess to the existing one, can we please cleanup the stuff first
and then decide whether it's worth to add the extra magic?

Thanks,

	tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ