Message-ID: <0d1acded-93a4-c1fa-b8f8-cfca9e082cd1@amd.com>
Date:   Mon, 22 Jun 2020 11:33:38 -0500
From:   Tom Lendacky <thomas.lendacky@....com>
To:     Paolo Bonzini <pbonzini@...hat.com>,
        Mohammed Gamal <mgamal@...hat.com>, kvm@...r.kernel.org
Cc:     linux-kernel@...r.kernel.org, vkuznets@...hat.com,
        sean.j.christopherson@...el.com, wanpengli@...cent.com,
        jmattson@...gle.com, joro@...tes.org, babu.moger@....com
Subject: Re: [PATCH v2 00/11] KVM: Support guest MAXPHYADDR < host MAXPHYADDR

On 6/19/20 6:07 PM, Paolo Bonzini wrote:
> On 19/06/20 23:52, Tom Lendacky wrote:
>>> A more subtle issue is when the host MAXPHYADDR is larger than that
>>> of the guest. Page faults caused by reserved bits on the guest won't
>>> cause an EPT violation/NPF and hence we also check guest MAXPHYADDR
>>> and add PFERR_RSVD_MASK error code to the page fault if needed.
>>
>> I'm probably missing something here, but I'm confused by this
>> statement. Is this for a case where a page has been marked not
>> present and the guest has also set what it believes are reserved
>> bits? Then when the page is accessed, the guest sees a page fault
>> without the error code for reserved bits?
> 
> No, for a non-present page there is no issue because there are no
> reserved bits in that case.  If the page is present and no reserved bits
> are set according to the host, however, there are two cases to consider:
> 
> - if the page is not accessible to the guest according to the
> permissions in the page table, it will cause a #PF.  We need to trap it
> and change the error code into P|RSVD if the guest physical address has
> any guest-reserved bits.
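As a rough sketch of the check being described, detecting "guest-reserved bits" amounts to masking the GPA against the address bits at or above the guest's MAXPHYADDR. The helper names below are hypothetical (modeled loosely on KVM's rsvd_bits() style), not the actual patch code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper in the style of KVM's rsvd_bits(): build a mask
 * covering bits s..e inclusive. */
static inline uint64_t rsvd_bits(int s, int e)
{
	return ((2ULL << (e - s)) - 1) << s;
}

/* A GPA has guest-reserved bits set when any address bit at or above the
 * guest's MAXPHYADDR is 1; 51 is the architectural maximum physical
 * address bit on x86. */
static inline int gpa_has_guest_rsvd_bits(uint64_t gpa, int guest_maxphyaddr)
{
	return (gpa & rsvd_bits(guest_maxphyaddr, 51)) != 0;
}
```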

I'm not a big fan of trapping #PF for this. Can't this have a performance
impact on the guest? If I'm not mistaken, QEMU will default to the TCG
physical address size (40 bits) unless told otherwise, causing #PF to now
be trapped. Maybe libvirt defaults to matching the guest MAXPHYADDR to
the host's?

On bare metal, there's no guarantee a CPU will report all the faults in a
single #PF error code. And because of race conditions, software can never
rely on that behavior. Whenever the OS thinks it has cured an error, it
must always be able to handle another #PF for the same access when it
retries, because another processor could have modified the PTE in the
meantime. What's the purpose of reporting RSVD in the guest's error code
with regard to live migration?

> 
> - if the page is accessible to the guest according to the permissions in
> the page table, it will cause a #NPF.  Again, we need to trap it, check
> the guest physical address and inject a P|RSVD #PF if the guest physical
> address has any guest-reserved bits.
> 
> The AMD specific issue happens in the second case.  By the time the NPF
> vmexit occurs, the accessed and/or dirty bits have been set and this
> should not have happened before the RSVD page fault that we want to
> inject.  On Intel processors, instead, EPT violations trigger before
> accessed and dirty bits are set.  I cannot find an explicit mention of
> the intended behavior in either the Intel SDM or the AMD APM.

Section 15.25.6 of the AMD APM volume 2 talks about page faults (nested vs
guest) and fault ordering. It does talk about setting guest A/D bits
during the walk, before an #NPF is taken. I don't see any way around that
given a virtual MAXPHYADDR in the guest being less than the host MAXPHYADDR.
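For reference, the error-code adjustment described above (injecting P|RSVD into the guest when the faulting GPA has guest-reserved bits) could be sketched as below. The bit values match the architectural x86 #PF error code; the function itself is an illustrative stand-in, not the patch's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* x86 #PF error-code bits; names mirror KVM's PFERR_* definitions. */
#define PFERR_PRESENT_MASK (1ULL << 0)
#define PFERR_RSVD_MASK    (1ULL << 3)

/* Hypothetical sketch: given the original fault error code and whether
 * the faulting GPA has guest-reserved bits, produce the error code to
 * inject.  P|RSVD tells the guest the walk hit a reserved bit in a
 * present entry. */
static uint64_t adjust_error_code(uint64_t error_code, int gpa_has_rsvd)
{
	if (gpa_has_rsvd)
		error_code |= PFERR_PRESENT_MASK | PFERR_RSVD_MASK;
	return error_code;
}
```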

Thanks,
Tom

> 
> Paolo
> 
