linux-kernel - Re: [PATCH] kvm: optimize ISR lookups

Message-ID: <alpine.LFD.2.02.1205232210440.3231@ionos>
Date:	Thu, 24 May 2012 00:00:30 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	"H. Peter Anvin" <hpa@...or.com>
cc:	Avi Kivity <avi@...hat.com>, "Michael S. Tsirkin" <mst@...hat.com>,
	kvm@...r.kernel.org, Marcelo Tosatti <mtosatti@...hat.com>,
	Ingo Molnar <mingo@...hat.com>, x86@...nel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] kvm: optimize ISR lookups

On Wed, 23 May 2012, H. Peter Anvin wrote:

> On 05/23/2012 11:37 AM, Thomas Gleixner wrote:
> >>
> >> That works, but replaces one problem with another: now we have two
> >> sources for the same data, and need to juggle between them depending on
> >> register number (either synchronizing in both directions, or special
> >> casing); so you're simplifying one thing at the expense of the other.
> >> If the microcode starts accessing more registers, then having two
> >> layouts becomes even uglier.
> > 
> > Fair enough :)
> 
> Yes, the µcode accessing this data structure directly probably falls
> under the category of a legitimate need to stick to the hardware format.

Thought more about that.

We have a clear distinction between HW-accessed data and
software-accessed data.

If I look at TPR, it is already special-cased and does:

   case APIC_TASKPRI:
                report_tpr_access(apic, false|true);
                /* fall thru */

And the fall-through uses the general accessor for all registers that
are not special-cased.

So all you have to do is 

   case APIC_TASKPRI:
                report_tpr_access(apic, false|true);
+		return access_mapped_reg(...);

instead of the fall-through.
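To make the suggested change concrete, here is a minimal userspace sketch of a read path where the TPR case returns straight from the hardware-format register page instead of falling through to the generic accessor. Everything in it is hypothetical: `struct toy_apic`, `generic_read()`, the body of `access_mapped_reg()` (the mail only writes `access_mapped_reg(...)`), and the stand-in `report_tpr_access()` are illustrations, not KVM's actual code. Only the TPR offset 0x80 comes from the xAPIC register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_PAGE_SIZE  4096u
#define APIC_REG_STRIDE 0x10u
#define APIC_NR_REGS    (APIC_PAGE_SIZE / APIC_REG_STRIDE)
#define APIC_TASKPRI    0x80u   /* TPR offset in the xAPIC register page */

static_assert(APIC_TASKPRI % APIC_REG_STRIDE == 0,
	      "TPR must sit on a register boundary");

/* Hypothetical toy model: hw_page uses the hardware (xAPIC page) layout,
 * sw_regs is a software-format copy with one slot per register. */
struct toy_apic {
	uint32_t hw_page[APIC_PAGE_SIZE / sizeof(uint32_t)];
	uint32_t sw_regs[APIC_NR_REGS];
	unsigned int tpr_reports;
};

/* Hypothetical: read a register directly from the hardware-format page. */
static uint32_t access_mapped_reg(const struct toy_apic *apic, uint32_t offset)
{
	return apic->hw_page[offset / sizeof(uint32_t)];
}

/* Hypothetical stand-in for KVM's report_tpr_access(); it only counts calls. */
static void report_tpr_access(struct toy_apic *apic, bool write)
{
	(void)write;
	apic->tpr_reports++;
}

/* Hypothetical generic accessor for registers kept in the software format. */
static uint32_t generic_read(const struct toy_apic *apic, uint32_t offset)
{
	return apic->sw_regs[offset / APIC_REG_STRIDE];
}

static uint32_t toy_apic_read(struct toy_apic *apic, uint32_t offset)
{
	switch (offset) {
	case APIC_TASKPRI:
		report_tpr_access(apic, false);
		/* Return here instead of falling through: TPR is served from
		 * the hardware-format page, so there is no copy to keep in sync. */
		return access_mapped_reg(apic, offset);
	default:
		return generic_read(apic, offset);
	}
}
```

Because the TPR case already exists for the reporting hook, moving it to the hardware-format page costs nothing extra on the access path.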

So there is no back-and-forth synchronization problem, simply because
you already have a special case for that register.

I know you'll argue that the TPR reporting is a special hack for
Windows guests; at least that's what the changelog says.

But even if a few more registers are accessed by hardware and do not
otherwise require special-casing, I really doubt that the overhead of
special-casing those few registers will be worse than not having the
obvious optimization in place.

And looking deeper, it's a total non-issue. The APIC mapping is 4 KiB
and the register stride is strictly 0x10, so there are at most
4096 / 16 = 256 possible registers.
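The arithmetic is easy to check in code: a 4 KiB page with a fixed 0x10-byte stride holds 256 registers, so every register offset maps to a dense index in [0, 255] by a simple divide (a shift by 4). A small sketch; the helper name `apic_reg_index()` is hypothetical, and for simplicity it rejects offsets that are not on a register boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_PAGE_SIZE  4096u   /* size of the APIC register mapping */
#define APIC_REG_STRIDE 0x10u   /* registers are 16-byte aligned */
#define APIC_NR_REGS    (APIC_PAGE_SIZE / APIC_REG_STRIDE)

static_assert(APIC_NR_REGS == 256, "4 KiB / 0x10 stride = 256 registers");

/* Hypothetical helper: map a byte offset into the APIC page to a dense
 * register index. Returns false for offsets outside the page or not on
 * a register boundary. */
static bool apic_reg_index(uint32_t offset, unsigned int *index)
{
	if (offset >= APIC_PAGE_SIZE || offset % APIC_REG_STRIDE != 0)
		return false;
	*index = offset / APIC_REG_STRIDE;   /* same as offset >> 4 */
	return true;
}
```

Any per-register lookup structure, whether a bitmap or a table, can then be indexed with this value.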

So now you have two possibilities:

1) Create a 256-bit (32-byte) bitfield to select one representation
   or the other.

   The overhead of checking the bit is not going to be observable.

2) Create a 256-entry function pointer array: 2 KiB on 64-bit, 1 KiB
   on 32-bit.

   That's not a lot of memory, even if you have to maintain two
   separate variants for read and write, but it lets you get rid of
   the already horribly compiled switch/case in apic_read/write. You
   also get the optional stuff like report_tpr_access() for free,
   without extra conditionals.

   An extra goodie is that you can catch any access to a nonexistent
   register, which you currently just ignore silently. That allows you
   to react to future hardware oddities without adding a single
   runtime conditional.

   This is strictly x86, and x86 is far better at dealing with indirect
   calls than with the mess gcc creates when compiling those switch/case
   constructs.

   So I'd go for that, and rather spend the memory and the time on
   setting up the function pointers at init/ioctl time than deal with
   the inconsistency of HW/SW representation via magic hacks.
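Option 1, the bitfield, might look roughly like this: 256 bits (32 bytes), one per register, where a set bit means "this register lives in the hardware-format page". The names `struct hw_reg_map`, `mark_hw_mapped()`, and `reg_is_hw_mapped()` are hypothetical, and callers are assumed to pass offsets below 4096.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_REG_STRIDE 0x10u
#define APIC_NR_REGS    256u    /* 4 KiB page / 0x10 stride */

/* One bit per register: set = use the hardware-format page,
 * clear = use the software representation. */
struct hw_reg_map {
	uint64_t bits[APIC_NR_REGS / 64];
};

static_assert(sizeof(struct hw_reg_map) == 32, "256 bits is 32 bytes");

/* Mark the register at byte offset 'offset' (< 4096) as hardware-mapped. */
static void mark_hw_mapped(struct hw_reg_map *map, uint32_t offset)
{
	unsigned int reg = offset / APIC_REG_STRIDE;

	map->bits[reg / 64] |= UINT64_C(1) << (reg % 64);
}

/* The per-access check: one load, one shift, one mask. */
static bool reg_is_hw_mapped(const struct hw_reg_map *map, uint32_t offset)
{
	unsigned int reg = offset / APIC_REG_STRIDE;

	return (map->bits[reg / 64] >> (reg % 64)) & 1;
}
```

The map would be filled once at setup, so the access path only pays for the bit test.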
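Option 2, including the catch-all for nonexistent registers, could be sketched as one table of read handlers indexed by offset / 0x10, filled at setup time, with a NULL slot meaning "no such register". All names here are hypothetical, the write-side table is omitted for brevity, and `read_tpr()` just counts calls in place of the real report_tpr_access(). APIC_ID (0x20) and TPR (0x80) are the xAPIC offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define APIC_PAGE_SIZE  4096u
#define APIC_REG_STRIDE 0x10u
#define APIC_NR_REGS    (APIC_PAGE_SIZE / APIC_REG_STRIDE)
#define APIC_ID         0x20u
#define APIC_TASKPRI    0x80u

struct toy_apic;
typedef uint32_t (*apic_read_fn)(struct toy_apic *apic, uint32_t offset);

/* Hypothetical toy APIC state with a per-instance dispatch table. */
struct toy_apic {
	uint32_t hw_page[APIC_PAGE_SIZE / sizeof(uint32_t)];
	unsigned int tpr_reports;
	apic_read_fn read_ops[APIC_NR_REGS];  /* NULL = nonexistent register */
};

static uint32_t read_plain(struct toy_apic *apic, uint32_t offset)
{
	return apic->hw_page[offset / sizeof(uint32_t)];
}

/* The TPR-reporting hook needs no conditional: it is just another handler. */
static uint32_t read_tpr(struct toy_apic *apic, uint32_t offset)
{
	apic->tpr_reports++;   /* stand-in for report_tpr_access() */
	return read_plain(apic, offset);
}

/* Filled once (e.g. at init or ioctl time), never on the access path. */
static void toy_apic_setup(struct toy_apic *apic)
{
	for (size_t i = 0; i < APIC_NR_REGS; i++)
		apic->read_ops[i] = NULL;
	apic->read_ops[APIC_ID / APIC_REG_STRIDE] = read_plain;
	apic->read_ops[APIC_TASKPRI / APIC_REG_STRIDE] = read_tpr;
}

/* One indirect call instead of a switch; unknown registers are reported
 * to the caller rather than silently ignored. */
static bool toy_apic_read(struct toy_apic *apic, uint32_t offset, uint32_t *val)
{
	apic_read_fn fn;

	if (offset >= APIC_PAGE_SIZE || offset % APIC_REG_STRIDE != 0)
		return false;
	fn = apic->read_ops[offset / APIC_REG_STRIDE];
	if (!fn)
		return false;
	*val = fn(apic, offset);
	return true;
}
```

Per-register quirks then live in the handler chosen at setup time, so adding one does not add a branch to every access.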

Thanks,

	tglx