Message-ID: <a47d4ba6-dada-72de-e6be-fb0e50324aaf@redhat.com>
Date:   Wed, 25 Aug 2021 14:40:46 +0200
From:   Emanuele Giuseppe Esposito <eesposit@...hat.com>
To:     Sean Christopherson <seanjc@...gle.com>,
        Maxim Levitsky <mlevitsk@...hat.com>
Cc:     kvm@...r.kernel.org, Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] KVM: nSVM: temporarly save vmcb12's efer, cr0 and cr4
 to avoid TOC/TOU races

Hi Sean,

(Spoiler alert: I am new to all of this, so I would like some 
clarification about your suggestion. Thank you in advance.)

On 12/08/2021 01:25, Sean Christopherson wrote:
> On Wed, Aug 11, 2021, Maxim Levitsky wrote:
>> On Mon, 2021-08-09 at 16:53 +0200, Emanuele Giuseppe Esposito wrote:
>>> @@ -1336,7 +1335,8 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
>>>   	if (!(save->cr0 & X86_CR0_PG) ||
>>>   	    !(save->cr0 & X86_CR0_PE) ||
>>>   	    (save->rflags & X86_EFLAGS_VM) ||
>>> -	    !nested_vmcb_valid_sregs(vcpu, save))
>>> +	    !nested_vmcb_valid_sregs(vcpu, save, save->efer, save->cr0,
>>> +				     save->cr4))
>>>   		goto out_free;
>>>   
>>>   	/*
>> The disadvantage of my approach is that fields are copied twice, once from
>> vmcb12 to its local copy, and then from the local copy to vmcb02, however
>> this approach is generic in such a way that TOC/TOI races become impossible.
>>
>> The disadvantage of your approach is that only some fields are copied and
>> there is still a chance of TOC/TOI race in the future.
> 
> The partial copy makes me nervous too.  I also don't like pulling out select
> registers and passing them by value; IMO the resulting code is harder to follow
> and will be more difficult to maintain, e.g. it won't scale if the list of regs
> to check grows.
> 
> But I don't think we need to copy _everything_.   There's also an opportunity to
> clean up svm_set_nested_state(), though the ABI ramifications may be problematic.
> 
> Instead of passing vmcb_control_area and vmcb_save_area to nested_vmcb_valid_sregs()
> and nested_vmcb_valid_sregs(), pass svm_nested_state and force the helpers to extract
> the save/control fields from the nested state.  If a new check is added to KVM, it
> will be obvious (and hopefully fail) if the state being check is not copied from vmcb12.

I think I understand what you mean here: you propose adding 
svm->nested.save, with helpers similar to copy_vmcb_save_area, but for 
now copying only the fields that we want to protect, i.e. only efer, 
cr0, cr4 and maybe also cr3 (to be consistent with the VMCB_CR clean 
bit). Then we pass svm->nested.save instead of vmcb12->save to 
nested_vmcb_valid_sregs(), and also use it in 
nested_vmcb02_prepare_save(), to avoid TOC/TOU issues.
At least that's how I understood it.
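
A minimal userspace sketch of this reading, to make the question
concrete. Everything named demo_* is hypothetical and only stands in
for svm->nested.save, vmcb12->save and the KVM helpers; the checks are a
simplified subset, not the real nested_vmcb_valid_sregs(). The bit
definitions are the architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions (x86 / AMD64). */
#define X86_CR0_PE  (1ULL << 0)
#define X86_CR0_PG  (1ULL << 31)
#define X86_CR4_PAE (1ULL << 5)
#define EFER_LME    (1ULL << 8)
#define EFER_SVME   (1ULL << 12)

/* Hypothetical stand-in for a VMCB save area (guest-writable for vmcb12). */
struct demo_vmcb_save {
	uint64_t efer, cr0, cr3, cr4;
};

/* Hypothetical stand-in for the proposed svm->nested.save cache. */
struct demo_save_cache {
	uint64_t efer, cr0, cr3, cr4;
};

/* Read each guest-visible field exactly once into kernel-owned memory. */
static void demo_copy_save_to_cache(struct demo_save_cache *dst,
				    const volatile struct demo_vmcb_save *src)
{
	assert(dst && src);
	dst->efer = src->efer;
	dst->cr0  = src->cr0;
	dst->cr3  = src->cr3;
	dst->cr4  = src->cr4;
}

/* Simplified checks; they only ever look at the cached copy. */
static bool demo_cached_sregs_valid(const struct demo_save_cache *c)
{
	if (!(c->efer & EFER_SVME))
		return false;
	/* Long mode with paging enabled needs PAE and protected mode. */
	if ((c->efer & EFER_LME) && (c->cr0 & X86_CR0_PG)) {
		if (!(c->cr4 & X86_CR4_PAE) || !(c->cr0 & X86_CR0_PE))
			return false;
	}
	return true;
}

/* vmcb02 is filled from the cache, never from vmcb12 again. */
static void demo_prepare_vmcb02_save(struct demo_vmcb_save *vmcb02,
				     const struct demo_save_cache *c)
{
	vmcb02->efer = c->efer;
	vmcb02->cr0  = c->cr0;
	vmcb02->cr3  = c->cr3;
	vmcb02->cr4  = c->cr4;
}
```

With this shape, a guest that rewrites vmcb12 after the copy cannot
change what was checked or what ends up in vmcb02, because both steps
consume the same cached values.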

> 
> Regarding svm_set_nested_state(), if we can clobber svm->nested.ctl and svm->nested.save
> (doesn't exist currently) on a failed ioctl(), then the temporary allocations for those
> can be replaced with using svm->nested as the buffer.

I am not sure what you mean by a failed ioctl(), but I guess the idea 
here is to replace the kzalloc'ed ctl and save variables with these two 
states (nested.save and nested.ctl).

> 
> And to mitigate the cost of copying to a kernel-controlled cache, we should use
> the VMCB Clean bits as they're intended.
> 
>    Each set bit in the VMCB Clean field allows the processor to load one guest
>    register or group of registers from the hardware cache;
> 
> E.g. copy from vmcb12 iff the clean bit is clear.  Then we could further optimize
> nested VMRUN to skip checks based on clean bits.
> 
I looked up the VMCB Clean field. My understanding is that if we set 
EFER/CR0/CR4 in nested_vmcb02_prepare_save() from nested.save, we don't 
need to check the clean bits because

"The guest's execution can cause cached state to be updated, but the 
hypervisor is not responsible for setting VMCB Clean bits corresponding 
to any state changes caused by guest execution."

and setting VMCB_CR after copying the vmcb12 save fields into the 
nested state. But I don't think this is what you mean here, especially 
when you say
> copy from vmcb12 iff the clean bit is clear
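
For comparison, a minimal userspace sketch of the other possible
reading, where the cache is refreshed from vmcb12 only when the VMCB_CR
clean bit is clear. All demo_* names are hypothetical; the bit index 5
for VMCB_CR is an assumption based on my reading of the clean-bits
enum, and the sketch also assumes the cache belongs to the same vmcb12
that was run before.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed index of the clean bit covering CR0, CR3, CR4 and EFER. */
#define DEMO_VMCB_CR 5

/* Hypothetical, heavily reduced view of vmcb12 (guest-writable). */
struct demo_vmcb12 {
	uint32_t clean;
	uint64_t efer, cr0, cr3, cr4;
};

/* Hypothetical kernel-owned cache of the CR group. */
struct demo_nested_cache {
	bool valid;
	uint64_t efer, cr0, cr3, cr4;
};

/*
 * Refresh the cached CR group from vmcb12 unless the guest hypervisor
 * marked it clean and a previous copy exists. Returns true when the
 * fields were re-read, i.e. when the checks would have to run again.
 */
static bool demo_refresh_cr_cache(struct demo_nested_cache *cache,
				  const volatile struct demo_vmcb12 *vmcb12)
{
	uint32_t clean;

	assert(cache && vmcb12);
	clean = vmcb12->clean;	/* read the clean field once */

	if (cache->valid && (clean & (1u << DEMO_VMCB_CR)))
		return false;	/* clean: keep using the cached values */

	cache->efer = vmcb12->efer;
	cache->cr0  = vmcb12->cr0;
	cache->cr3  = vmcb12->cr3;
	cache->cr4  = vmcb12->cr4;
	cache->valid = true;
	return true;
}
```

In this variant a false return value is what would allow a nested VMRUN
to skip both the copy and the consistency checks for that group.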

Thank you,
Emanuele
