Message-ID: <CAM3pwhF9YSe9Vtkb=BL29iEExZf1DUep6K0rytpBVG6648UZ6g@mail.gmail.com>
Date: Tue, 12 Sep 2017 13:06:39 -0700
From: Peter Feiner <pfeiner@...gle.com>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: Jim Mattson <jmattson@...gle.com>,
LKML <linux-kernel@...r.kernel.org>,
kvm list <kvm@...r.kernel.org>,
David Hildenbrand <david@...hat.com>
Subject: Re: [PATCH] KVM: MMU: speedup update_permission_bitmask

On Tue, Sep 12, 2017 at 12:55 PM, Paolo Bonzini <pbonzini@...hat.com> wrote:
> On 12/09/2017 18:48, Peter Feiner wrote:
>>>>
>>>> Because update_permission_bitmask is actually the top item in the profile
>>>> for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand
>>>> clock cycles, or up to 30%:
>>
>> This is a great improvement! Why not take it a step further and
>> compute the whole table once at module init time and be done with it?
>> There are only 5 extra input bits (nx, ept, smep, smap, wp),
>
> 4 actually, nx could be ignored (because unlike WP, the bit is reserved
> when nx is disabled). It is only handled for clarity.
>
>> so the
>> whole table would only take up (1 << 5) * 16 = 512 bytes. Moreover, if
>> you had 32 VMs on the host, you'd actually save memory!
>
> Indeed; my thought was to write a script or something to generate the
> tables at compile time, but doing it at module init time would be clever
> and easier.
>
> That said, the generated code for the function, right now, is pretty
> good. If it saved 1000 clock cycles per nested vmexit it would be very
> convincing, but if it were 50 or even 100, a bit less so.

ACK. I'm good with either approach :-) Please consider this one
Reviewed-By: Peter Feiner <pfeiner@...gle.com>
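
For concreteness, here's a rough (untested) sketch of the init-time
table I have in mind. Everything below is made up for illustration --
fill_one() is just a stand-in for the per-configuration work that
update_permission_bitmask() does today:

/*
 * Illustrative only: precompute the permission bytes for every
 * combination of the configuration bits once at init time, instead
 * of recomputing them on each MMU reset.
 */
#include <stdint.h>
#include <stdio.h>

/* 5 configuration bits: nx, ept, smep, smap, wp (nx could be dropped). */
#define PERM_CFG_BITS	5
#define PERM_CFG_COUNT	(1 << PERM_CFG_BITS)	/* 32 configurations   */
#define PERM_PFEC_SLOTS	16			/* 16 bytes per config */

/* (1 << 5) * 16 = 512 bytes total. */
static uint8_t perm_table[PERM_CFG_COUNT][PERM_PFEC_SLOTS];

static unsigned int perm_cfg_index(int nx, int ept, int smep, int smap, int wp)
{
	return (nx << 4) | (ept << 3) | (smep << 2) | (smap << 1) | wp;
}

/* Stand-in for the real per-slot computation. */
static uint8_t fill_one(unsigned int cfg, unsigned int slot)
{
	return (uint8_t)(cfg ^ slot);	/* dummy value */
}

/* Would run once from module init. */
static void perm_table_init(void)
{
	unsigned int cfg, slot;

	for (cfg = 0; cfg < PERM_CFG_COUNT; cfg++)
		for (slot = 0; slot < PERM_PFEC_SLOTS; slot++)
			perm_table[cfg][slot] = fill_one(cfg, slot);
}

int main(void)
{
	perm_table_init();
	printf("table size: %zu bytes\n", sizeof(perm_table));	/* 512 */
	printf("example index: %u\n", perm_cfg_index(1, 0, 1, 0, 1));
	return 0;
}

The MMU would then just point at (or copy from) the right 16-byte row
for its current configuration instead of recomputing it.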