Message-ID: <CAM3pwhF9YSe9Vtkb=BL29iEExZf1DUep6K0rytpBVG6648UZ6g@mail.gmail.com>
Date: Tue, 12 Sep 2017 13:06:39 -0700
From: Peter Feiner <pfeiner@...gle.com>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: Jim Mattson <jmattson@...gle.com>,
LKML <linux-kernel@...r.kernel.org>,
kvm list <kvm@...r.kernel.org>,
David Hildenbrand <david@...hat.com>
Subject: Re: [PATCH] KVM: MMU: speedup update_permission_bitmask

On Tue, Sep 12, 2017 at 12:55 PM, Paolo Bonzini <pbonzini@...hat.com> wrote:
> On 12/09/2017 18:48, Peter Feiner wrote:
>>>>
>>>> Because update_permission_bitmask is actually the top item in the profile
>>>> for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand
>>>> clock cycles, or up to 30%:
>>
>> This is a great improvement! Why not take it a step further and
>> compute the whole table once at module init time and be done with it?
>> There are only 5 extra input bits (nx, ept, smep, smap, wp),
>
> 4 actually, nx could be ignored (because unlike WP, the bit is reserved
> when nx is disabled). It is only handled for clarity.
>
>> so the
>> whole table would only take up (1 << 5) * 16 = 512 bytes. Moreover, if
>> you had 32 VMs on the host, you'd actually save memory!
>
> Indeed; my thought was to write a script or something to generate the
> tables at compile time, but doing it at module init time would be clever
> and easier.
>
> That said, the generated code for the function, right now, is pretty
> good. If it saved 1000 clock cycles per nested vmexit it would be very
> convincing, but if it were 50 or even 100, a bit less so.

ACK. I'm good with either approach :-) Please consider this one
Reviewed-By: Peter Feiner <pfeiner@...gle.com>
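
For concreteness, here's a rough (untested) sketch of the init-time
table I have in mind. Everything below is made up for illustration --
fill_one() is just a stand-in for the per-configuration work that
update_permission_bitmask() does today:

/*
 * Illustrative only: precompute the permission bytes for every
 * combination of the configuration bits once at init time, instead
 * of recomputing them on each MMU reset.
 */
#include <stdint.h>
#include <stdio.h>

/* 5 configuration bits: nx, ept, smep, smap, wp (nx could be dropped). */
#define PERM_CFG_BITS	5
#define PERM_CFG_COUNT	(1 << PERM_CFG_BITS)	/* 32 configurations   */
#define PERM_PFEC_SLOTS	16			/* 16 bytes per config */

/* (1 << 5) * 16 = 512 bytes total. */
static uint8_t perm_table[PERM_CFG_COUNT][PERM_PFEC_SLOTS];

static unsigned int perm_cfg_index(int nx, int ept, int smep, int smap, int wp)
{
	return (nx << 4) | (ept << 3) | (smep << 2) | (smap << 1) | wp;
}

/* Stand-in for the real per-slot computation. */
static uint8_t fill_one(unsigned int cfg, unsigned int slot)
{
	return (uint8_t)(cfg ^ slot);	/* dummy value */
}

/* Would run once from module init. */
static void perm_table_init(void)
{
	unsigned int cfg, slot;

	for (cfg = 0; cfg < PERM_CFG_COUNT; cfg++)
		for (slot = 0; slot < PERM_PFEC_SLOTS; slot++)
			perm_table[cfg][slot] = fill_one(cfg, slot);
}

int main(void)
{
	perm_table_init();
	printf("table size: %zu bytes\n", sizeof(perm_table));	/* 512 */
	printf("example index: %u\n", perm_cfg_index(1, 0, 1, 0, 1));
	return 0;
}

The MMU would then just point at (or copy from) the right 16-byte row
for its current configuration instead of recomputing it.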