Message-ID: <036fd396-9b5f-a565-7aed-eec6ac5d5133@redhat.com>
Date:   Tue, 24 Jul 2018 20:05:49 +0200
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Liang C <liangchen.linux@...il.com>, rkrcmar@...hat.com
Cc:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: VM boot failure on nodes not having DMA32 zone

On 24/07/2018 09:53, Liang C wrote:
> Hi,
> 
> We have a situation where our qemu processes need to be launched under
> cgroup cpuset.mems control. This introduces an issue similar to one
> discussed a few years ago. The difference here is that, in our case,
> not being able to allocate from the DMA32 zone is the result of a
> cgroup restriction, not mempolicy enforcement. Here are the steps to
> reproduce the failure:
> 
> mkdir /sys/fs/cgroup/cpuset/nodeX (where X is a node not having DMA32 zone)
> echo X > /sys/fs/cgroup/cpuset/nodeX/cpuset.mems
> echo X > /sys/fs/cgroup/cpuset/nodeX/cpuset.cpus
> echo 1 > /sys/fs/cgroup/cpuset/nodeX/cpuset.mem_hardwall
> echo $$ > /sys/fs/cgroup/cpuset/nodeX/tasks
> 
> #launch a virtual machine
> kvm_init_vcpu failed: Cannot allocate memory
> 
> There are workarounds, like always putting qemu processes onto the
> node that has a DMA32 zone, or not restricting a qemu process's memory
> allocation until that DMA32 allocation finishes (difficult to time
> precisely). But we would like to find a way to address the root cause.
> 
> Considering that the pae_root shadow should not be needed when EPT is
> in use, which is indeed our case - EPT is always available for us
> (and presumably for most other users as well) - we made a patch
> roughly like this:
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index d594690..1d1b61e 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -5052,7 +5052,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
>         vcpu->arch.mmu.translate_gpa = translate_gpa;
>         vcpu->arch.nested_mmu.translate_gpa = translate_nested_gpa;
> 
> -       return alloc_mmu_pages(vcpu);
> +       return tdp_enabled ? 0 : alloc_mmu_pages(vcpu);
>  }
> 
>  void kvm_mmu_setup(struct kvm_vcpu *vcpu)
> 
> 
> It works in our test cases, but we would really like to have your
> insight on this patch before applying it in our production environment
> and contributing it back to the community. Thanks in advance for any
> help you can provide!

Yes, this looks good.  However, I'd place the "if" in alloc_mmu_pages
itself.

Thanks,

Paolo
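
For reference, a rough sketch of the placement suggested above - the check
moved into alloc_mmu_pages() itself - assuming a 4.18-era mmu.c where that
function's only job is allocating the DMA32-backed pae_root page. This is
illustrative only; the exact upstream function body may differ.

static int alloc_mmu_pages(struct kvm_vcpu *vcpu)
{
	struct page *page;
	int i;

	/*
	 * With hardware-assisted paging (EPT/NPT) the PAE root is never
	 * consulted, so the DMA32-constrained allocation below can be
	 * skipped, avoiding the cpuset failure described in this thread.
	 */
	if (tdp_enabled)
		return 0;

	/*
	 * The PAE root must sit below 4GB, hence __GFP_DMA32; this is the
	 * allocation that fails when cpuset.mems excludes the DMA32 node.
	 */
	page = alloc_page(GFP_KERNEL | __GFP_DMA32);
	if (!page)
		return -ENOMEM;

	vcpu->arch.mmu.pae_root = page_address(page);
	for (i = 0; i < 4; ++i)
		vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;

	return 0;
}

With that change, kvm_mmu_create() would keep calling alloc_mmu_pages()
unconditionally, so the EPT/no-EPT decision stays in one place.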
