Message-ID: <CAKhg4tLjqwR3=jdaKz1r3fEiOAu3L4yih-eNdUEF2R-A0MBxSQ@mail.gmail.com>
Date:   Wed, 25 Jul 2018 16:37:10 +0800
From:   Liang C <liangchen.linux@...il.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     rkrcmar@...hat.com, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org
Subject: Re: VM boot failure on nodes not having DMA32 zone

Thank you very much for the quick reply and confirmation!  I just made
and submitted a patch according to your advice.

Thanks,
Liang

On Wed, Jul 25, 2018 at 2:05 AM, Paolo Bonzini <pbonzini@...hat.com> wrote:
> On 24/07/2018 09:53, Liang C wrote:
>> Hi,
>>
>> We have a situation where our qemu processes need to be launched under
>> cgroup cpuset.mems control. This introduces a similar issue to one that
>> was discussed a few years ago. The difference here is that in our case,
>> not being able to allocate from the DMA32 zone is the result of a
>> cgroup restriction, not of mempolicy enforcement. Here are the steps to
>> reproduce the failure,
>>
>> mkdir /sys/fs/cgroup/cpuset/nodeX (where X is a node not having DMA32 zone)
>> echo X > /sys/fs/cgroup/cpuset/nodeX/cpuset.mems
>> echo X > /sys/fs/cgroup/cpuset/nodeX/cpuset.cpus
>> echo 1 > /sys/fs/cgroup/cpuset/nodeX/cpuset.mem_hardwall
>> echo $$ > /sys/fs/cgroup/cpuset/nodeX/tasks
>>
>> #launch a virtual machine
>> kvm_init_vcpu failed: Cannot allocate memory
>>
>> There are workarounds, like always putting qemu processes onto a
>> node that has a DMA32 zone, or not restricting a qemu process's memory
>> allocation until its DMA32 allocation has finished (difficult to time
>> precisely; a rough sketch follows below).
>> But we would like to find a way to address the root cause.
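A minimal sketch of that timing workaround, for illustration only: the wrapper script, the sleep-based guess, and the nodeX path below are assumptions, not from the original mail.

#!/bin/sh
# Assumed wrapper: start qemu outside the restricted cpuset so that
# vcpu creation (and with it the DMA32 allocation) can succeed first.
qemu-system-x86_64 -enable-kvm "$@" &
QEMU_PID=$!
# Crude stand-in for "until its DMA32 allocation has finished"; there
# is no clean signal for it, hence "difficult to time precisely".
sleep 5
# cgroup.procs moves the whole thread group, vcpu threads included,
# unlike the per-thread tasks file used in the reproduction steps.
echo "$QEMU_PID" > /sys/fs/cgroup/cpuset/nodeX/cgroup.procs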
>>
>> Considering that the pae_root shadow page should not be needed when
>> EPT is in use, which is indeed our case (EPT is always available for
>> us, and presumably for most other users as well), we made a patch
>> roughly like this,
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index d594690..1d1b61e 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -5052,7 +5052,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
>>         vcpu->arch.mmu.translate_gpa = translate_gpa;
>>         vcpu->arch.nested_mmu.translate_gpa = translate_nested_gpa;
>>
>> -       return alloc_mmu_pages(vcpu);
>> +       return tdp_enabled ? 0 : alloc_mmu_pages(vcpu);
>>  }
>>
>>  void kvm_mmu_setup(struct kvm_vcpu *vcpu)
>>
>>
>> It passes our test cases, but we would really like your insight on
>> this patch before applying it in a production environment and
>> contributing it back to the community. Thanks in advance for any
>> help you may provide!
>
> Yes, this looks good.  However, I'd place the "if" in alloc_mmu_pages
> itself.
>
> Thanks,
>
> Paolo
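For reference, folding the check into alloc_mmu_pages itself, as suggested above, would look roughly like this. It is a sketch against the mmu.c of that era, whose alloc_mmu_pages obtains the pae_root page with alloc_page(GFP_KERNEL | __GFP_DMA32), which is exactly the allocation that fails on a node without a DMA32 zone; the exact body may differ from the patch that was actually submitted.

static int alloc_mmu_pages(struct kvm_vcpu *vcpu)
{
        struct page *page;
        int i;

        /*
         * pae_root is only consumed by the shadow MMU, so with TDP
         * (EPT/NPT) enabled there is nothing to allocate, and the
         * DMA32 requirement disappears along with it.
         */
        if (tdp_enabled)
                return 0;

        /*
         * When emulating 32-bit mode, cr3 is only 32 bits even on
         * x86_64, so the PAE page-directory-pointer table must sit
         * below 4GB; hence the page comes from the DMA32 zone.
         */
        page = alloc_page(GFP_KERNEL | __GFP_DMA32);
        if (!page)
                return -ENOMEM;

        vcpu->arch.mmu.pae_root = page_address(page);
        for (i = 0; i < 4; ++i)
                vcpu->arch.mmu.pae_root[i] = INVALID_PAGE;

        return 0;
}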
