Date:   Tue, 19 Dec 2017 22:13:14 +0100
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Jim Mattson <jmattson@...gle.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        kvm list <kvm@...r.kernel.org>,
        David Hildenbrand <david@...hat.com>,
        Bandan Das <bsd@...hat.com>
Subject: Re: [PATCH] KVM: vmx: speed up MSR bitmap merge

On 19/12/2017 20:58, Jim Mattson wrote:
>> +               /* Disable read intercept for all MSRs between 0x800 and 0x8ff.  */
> Aren't we actually adopting the read intercepts from VMCS12 and
> *enabling* the *write* intercepts?

Yeah, the comment isn't the best.  What I mean is that L0 doesn't care
about intercepting the reads, so the default all-set msr_bitmap_l0 is
changed as if the read intercepts were disabled; the merged bitmap then
ends up with exactly the read intercepts that L1 asked for in
msr_bitmap_l1.
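
To make the indexing in the hunk below less obtuse: in the VMX MSR
bitmap, the read bit for a low MSR m sits at byte m/8 of the page and
the write bit for the same MSR sits at byte 0x800 + m/8, which is where
the "word + 0x800 / sizeof(long)" comes from.  Here is a minimal
standalone sketch of just that indexing -- plain userspace C, nothing
KVM-specific, the function name is invented for the example:

/*
 * Toy model of the merge above: take L1's read intercepts for MSRs
 * 0x800-0x8ff unchanged, keep all write intercepts set.  The write
 * half of the low-MSR bitmap starts 0x800 bytes into the page.
 */
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define BITS_PER_LONG	(sizeof(long) * CHAR_BIT)

static void merge_x2apic_reads(unsigned long *l0, const unsigned long *l1)
{
	unsigned int msr;

	for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) {
		unsigned int word = msr / BITS_PER_LONG;

		l0[word] = l1[word];			/* reads: whatever L1 wants */
		l0[word + 0x800 / sizeof(long)] = ~0UL;	/* writes: still intercepted */
	}
}

int main(void)
{
	unsigned long l0[4096 / sizeof(long)], l1[4096 / sizeof(long)];

	memset(l0, 0xff, sizeof(l0));	/* default: intercept everything */
	memset(l1, 0x00, sizeof(l1));	/* pretend L1 passes all x2APIC reads through */

	merge_x2apic_reads(l0, l1);
	printf("read word for MSR 0x808:  %#lx\n", l0[0x808 / BITS_PER_LONG]);
	printf("write word for MSR 0x808: %#lx\n",
	       l0[0x808 / BITS_PER_LONG + 0x800 / sizeof(long)]);
	return 0;
}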

>> +               for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) {
>> +                       unsigned word = msr / BITS_PER_LONG;
>> +                       msr_bitmap_l0[word] = msr_bitmap_l1[word];
>> +                       msr_bitmap_l0[word + (0x800 / sizeof(long))] = ~0;
>
> The indexing above seems a bit obtuse, but maybe it will be clear
> enough after the above comment is fixed up.
> 
>> +               }
>> +       } else {
>> +               for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) {
>> +                       unsigned word = msr / BITS_PER_LONG;
>> +                       msr_bitmap_l0[word] = ~0;
>> +                       msr_bitmap_l0[word + (0x800 / sizeof(long))] = ~0;
>> +               }
>> +       }
>>
>> -       if (nested_cpu_has_virt_x2apic_mode(vmcs12)) {
>> -               if (nested_cpu_has_apic_reg_virt(vmcs12))
>> -                       for (msr = 0x800; msr <= 0x8ff; msr++)
>> -                               nested_vmx_disable_intercept_for_msr(
>> -                                       msr_bitmap_l1, msr_bitmap_l0,
>> -                                       msr, MSR_TYPE_R);
>> +       nested_vmx_disable_intercept_for_msr(
>> +               msr_bitmap_l1, msr_bitmap_l0,
>> +               APIC_BASE_MSR + (APIC_TASKPRI >> 4),
> Perhaps you could #define X2APIC_MSR(reg) (APIC_BASE_MSR + ((reg) >>
> 4)) somewhere appropriate (e.g. arch/x86/include/asm/apicdef.h) and
> use that here (and below) for brevity?

Good idea.
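
Something like this, then (untested; the constants below are just the
existing apicdef.h values, copied here so the snippet compiles on its
own as a quick sanity check of the arithmetic):

#include <stdio.h>

#define APIC_BASE_MSR	0x800
#define APIC_TASKPRI	0x80
#define APIC_EOI	0xB0
#define X2APIC_MSR(reg)	(APIC_BASE_MSR + ((reg) >> 4))

int main(void)
{
	/* Expect 0x808 (x2APIC TPR) and 0x80b (x2APIC EOI). */
	printf("X2APIC_MSR(APIC_TASKPRI) = %#x\n", X2APIC_MSR(APIC_TASKPRI));
	printf("X2APIC_MSR(APIC_EOI)     = %#x\n", X2APIC_MSR(APIC_EOI));
	return 0;
}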

> Should we also think about letting L1 control pass-through of some of
> the more mundane MSRs, like FS_BASE, GS_BASE, and KERNEL_GS_BASE?

Rhetorical question? :)  Yes, it probably gives a small performance
improvement on real-world multi-process (or even just multi-threaded)
workloads.  It's a separate thing to do, though.

The next obvious step, to me, is to split prepare_vmcs02 into two parts.
Most of the work can be skipped in the common case where we didn't get
any vmread/vmwrite exit and all guest accesses between VMRESUMEs go
through the shadow VMCS.  That gives both an easy place to draw the line
between the two parts and an easy way to detect when we need to go down
the slow path.  Any takers? :)
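
Roughly the shape I have in mind, as a toy illustration only (nothing
below is real KVM code; the names and the dirty flag are invented for
the example):

#include <stdbool.h>
#include <stdio.h>

struct toy_vmcs12 {
	int guest_rip;		/* stand-in for the many real vmcs12 fields */
};

struct toy_nested {
	struct toy_vmcs12 vmcs12;
	bool dirty;		/* set whenever L1 changed vmcs12 via vmwrite */
};

/* Expensive part: only needed when vmcs12 changed since the last VMRESUME. */
static void prepare_vmcs02_slow(struct toy_nested *n)
{
	printf("full vmcs02 rebuild, rip=%d\n", n->vmcs12.guest_rip);
}

/* Cheap part: run on every nested VMRESUME. */
static void prepare_vmcs02_fast(struct toy_nested *n)
{
	if (n->dirty) {
		prepare_vmcs02_slow(n);
		n->dirty = false;
	}
	/* ...plus the handful of fields that must be refreshed every time... */
}

static void toy_handle_vmwrite(struct toy_nested *n, int val)
{
	n->vmcs12.guest_rip = val;
	n->dirty = true;
}

int main(void)
{
	struct toy_nested n = { .dirty = true };

	prepare_vmcs02_fast(&n);	/* first entry: slow path */
	prepare_vmcs02_fast(&n);	/* no vmwrite exit in between: fast path */
	toy_handle_vmwrite(&n, 42);
	prepare_vmcs02_fast(&n);	/* vmcs12 changed: slow path again */
	return 0;
}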

Paolo
