linux-kernel - Re: [PATCH] kvm: optimize ISR lookups

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120521230623.GO17031@redhat.com>
Date:	Tue, 22 May 2012 02:06:24 +0300
From:	"Michael S. Tsirkin" <mst@...hat.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	kvm@...r.kernel.org, Avi Kivity <avi@...hat.com>,
	Marcelo Tosatti <mtosatti@...hat.com>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] kvm: optimize ISR lookups

On Mon, May 21, 2012 at 11:04:25PM +0200, Thomas Gleixner wrote:
> On Mon, 21 May 2012, Michael S. Tsirkin wrote:
> 
> > We perform ISR lookups twice: during interrupt
> > injection and on EOI. Typical workloads only have
> > a single bit set there. So we can avoid ISR scans by
> > 1. counting bits as we set/clear them in ISR
> > 2. if count is 1, caching the vector number
> > 3. if count != 1, invalidating the cache
> > 
> > The real purpose of this is enabling PV EOI
> > which needs to quickly validate the vector.
> > But non PV guests also benefit.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@...hat.com>
> > ---
> >  arch/x86/kvm/lapic.c |   51 ++++++++++++++++++++++++++++++++++++++++++++++++-
> >  arch/x86/kvm/lapic.h |    2 +
> >  2 files changed, 51 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 93c1574..232950a 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -107,6 +107,16 @@ static inline void apic_clear_vector(int vec, void *bitmap)
> >  	clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> >  }
> >  
> > +static inline int __apic_test_and_set_vector(int vec, void *bitmap)
> > +{
> > +	return __test_and_set_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> > +}
> > +
> > +static inline int __apic_test_and_clear_vector(int vec, void *bitmap)
> > +{
> > +	return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> > +}
> > +
> >  static inline int apic_hw_enabled(struct kvm_lapic *apic)
> >  {
> >  	return (apic)->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
> > @@ -210,6 +220,16 @@ static int find_highest_vector(void *bitmap)
> >  		return fls(word[word_offset << 2]) - 1 + (word_offset << 5);
> >  }
> >  
> > +static u8 count_vectors(void *bitmap)
> > +{
> > +	u32 *word = bitmap;
> > +	int word_offset;
> > +	u8 count = 0;
> > +	for (word_offset = 0; word_offset < MAX_APIC_VECTOR >> 5; ++word_offset)
> > +		count += hweight32(word[word_offset << 2]);
> > +	return count;
> > +}
> 
> Why do you need to reimplement bitmap_weight()?
> 
> Because your bitmap is not a bitmap,

It's an existing "bitmap". It's not my bitmap :)

> but a set of 32bit registers
> which have an offset of 4 words each. I really had to twist my brain
> around this function and the test_and_clr/set ones above just because
> the name bitmap is so horribly misleading. Add an extra bonus for
> using void pointers.
> 
> Yes, I know, the existing code is full of this, but that's not an
> excuse to add more of it.

There's no other way to use existing code. If current code
is reworked to use bitmap.h then my patch can use it too.


> This emulation is just silly. Nothing in an emulation where the access
> to the emulated hardware is implemented with vmexits is requiring a
> 1:1 memory layout. It's completely irrelevant whether the hardware has
> an ISR regs offset of 0x10 or not. Nothing prevents the emulation to
> use a consecutive bitmap for the vector bits instead of reimplementing
> a boatload of bitmap operations.
> 
> The only justification for having the same layout as the actual
> hardware is when you are going to map the memory into the guest space,
> which is not the case here.

I think the reason is __apic_read which now simply copies the registers
out to guest, this code will become less straight-forward if it's not
1:1.

> And if you look deeper, then you'll notice that _ALL_ APIC registers
> are on a 16 byte boundary (thanks Peter for pointing that out).
> 
> So it's even more silly to have a 1:1 representation instead of
> implementing the default emulated apic_read/write functions to access
> (offset >> 2).
> 
> And of course, that design decision causes lookups to be slow.

Yes, it might be one of the reasons why my patch helps so
much: it adds a cache in front of this data structure.

> It's
> way faster to scan a consecutive bitmap than iterating through eight
> 32bit words with an offset of 0x10, especially on a 64 bit host.
>
> >  static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic)
> >  {
> >  	apic->irr_pending = true;
> > @@ -242,6 +262,25 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
> >  		apic->irr_pending = true;
> >  }
> >  
> > +static inline void apic_set_isr(int vec, struct kvm_lapic *apic)
> > +{
> > +	if (!__apic_test_and_set_vector(vec, apic->regs + APIC_ISR))
> > +		++apic->isr_count;
> > +	ASSERT(apic->isr_count > MAX_APIC_VECTOR);
> 
> I'm really curious what you observed when defining DEBUG in that file.
> 
> Clearly you never did.
> 
> > +	if (likely(apic->isr_count == 1))
> > +		apic->isr_cache = vec;
> > +	else
> > +		apic->isr_cache = -1;
> > +}
> > +
> > +static inline void apic_clear_isr(int vec, struct kvm_lapic *apic)
> > +{
> > +	if (__apic_test_and_clear_vector(vec, apic->regs + APIC_ISR))
> > +		--apic->isr_count;
> > +	ASSERT(apic->isr_count < 0);
> 
> Ditto.
> 
> >  	result = find_highest_vector(apic->regs + APIC_ISR);
> >  	ASSERT(result == -1 || result >= 16);
> 
> And obviously none of the folks who added this gem bothered to define
> DEBUG either.
> 
> So please instead of working around horrid design decisions and adding
> more mess to the existing one, can we please cleanup the stuff first
> and then decide whether it's worth to add the extra magic?
> 
> Thanks,
> 
> 	tglx


So what you propose is in fact to rework apic registers at least for
ISR,IRR,TMR to use a bitmap.
I am fine with this suggestion but would like some feedback from kvm
maintainers on where they want to go before I spend time on that.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/