[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1fb901ef-7c42-7753-fe78-0251ca4715d3@redhat.com>
Date: Thu, 11 Feb 2021 14:15:47 +0100
From: David Hildenbrand <david@...hat.com>
To: Claudio Imbrenda <imbrenda@...ux.ibm.com>
Cc: linux-kernel@...r.kernel.org, borntraeger@...ibm.com,
frankja@...ux.ibm.com, cohuck@...hat.com, kvm@...r.kernel.org,
linux-s390@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH v3 1/2] s390/kvm: extend kvm_s390_shadow_fault to return
entry pointer
On 11.02.21 13:57, Claudio Imbrenda wrote:
> On Thu, 11 Feb 2021 10:18:56 +0100
> David Hildenbrand <david@...hat.com> wrote:
>
>> On 09.02.21 16:43, Claudio Imbrenda wrote:
>>> Extend kvm_s390_shadow_fault to return the pointer to the valid leaf
>>> DAT table entry, or to the invalid entry.
>>>
>>> Also return some flags in the lower bits of the address:
>>> DAT_PROT: indicates that DAT protection applies because of the
>>> protection bit in the segment (or, if EDAT, region)
>>> tables NOT_PTE: indicates that the address of the DAT table entry
>>> returned does not refer to a PTE, but to a segment or region table.
>>>
>>
>> I've been thinking about one possible issue, but I think it's not
>> actually an issue. Just sharing so others can verify:
>>
>> In case our guest uses huge pages / gigantic pages / ASCE R, we
>> create fake page tables (GMAP_SHADOW_FAKE_TABLE).
>>
>> So, it could be that kvm_s390_shadow_fault()->gmap_shadow_pgt_lookup()
>> succeeds, however, we have a fake PTE in our hands. We lost the
>> actual guest STE/RTE address. (I think it could be recovered somehow
>> via page->index, thought)
>>
>> But I guess, if there is a fake PTE, then there is not acutally
>> something that could go wrong in gmap_shadow_page() anymore that could
>> lead us in responding something wrong to the guest. We can only really
>> fail with -EINVAL, -ENOMEM or -EFAULT.
>
> this was also my reasoning
>
>> So if the guest changed anything in the meantime (e.g., zap a
>> segment), we would have unshadowed the whole fake page table
>> hierarchy and would simply retry.
>>
>>> Signed-off-by: Claudio Imbrenda <imbrenda@...ux.ibm.com>
>>> Cc: stable@...r.kernel.org
>>> ---
>>> arch/s390/kvm/gaccess.c | 30 +++++++++++++++++++++++++-----
>>> arch/s390/kvm/gaccess.h | 5 ++++-
>>> arch/s390/kvm/vsie.c | 8 ++++----
>>> 3 files changed, 33 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
>>> index 6d6b57059493..e0ab83f051d2 100644
>>> --- a/arch/s390/kvm/gaccess.c
>>> +++ b/arch/s390/kvm/gaccess.c
>>> @@ -976,7 +976,9 @@ int kvm_s390_check_low_addr_prot_real(struct
>>> kvm_vcpu *vcpu, unsigned long gra)
>>> * kvm_s390_shadow_tables - walk the guest page table and create
>>> shadow tables
>>> * @sg: pointer to the shadow guest address space structure
>>> * @saddr: faulting address in the shadow gmap
>>> - * @pgt: pointer to the page table address result
>>> + * @pgt: pointer to the beginning of the page table for the given
>>> address if
>>> + * successful (return value 0), or to the first invalid DAT
>>> entry in
>>> + * case of exceptions (return value > 0)
>>> * @fake: pgt references contiguous guest memory block, not a
>>> pgtable */
>>> static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long
>>> saddr, @@ -1034,6 +1036,7 @@ static int
>>> kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
>>> rfte.val = ptr; goto shadow_r2t;
>>> }
>>> + *pgt = ptr + vaddr.rfx * 8;
>>> rc = gmap_read_table(parent, ptr + vaddr.rfx * 8,
>>> &rfte.val);
>>
>> Using
>>
>> gmap_read_table(parent, *pgt, &rfte.val);
>>
>> or similar with a local variable might then be even clearer. But no
>> strong opinion.
>
> that's also something I had thought about, in the end this minimizes
> the number of lines / variables while still being readable
>
>>> if (rc)
>>> return rc;
>>> @@ -1060,6 +1063,7 @@ static int kvm_s390_shadow_tables(struct gmap
>>> *sg, unsigned long saddr, rste.val = ptr;
>>> goto shadow_r3t;
>>> }
>>> + *pgt = ptr + vaddr.rsx * 8;
>>> rc = gmap_read_table(parent, ptr + vaddr.rsx * 8,
>>> &rste.val); if (rc)
>>> return rc;
>>> @@ -1087,6 +1091,7 @@ static int kvm_s390_shadow_tables(struct gmap
>>> *sg, unsigned long saddr, rtte.val = ptr;
>>> goto shadow_sgt;
>>> }
>>> + *pgt = ptr + vaddr.rtx * 8;
>>> rc = gmap_read_table(parent, ptr + vaddr.rtx * 8,
>>> &rtte.val); if (rc)
>>> return rc;
>>> @@ -1123,6 +1128,7 @@ static int kvm_s390_shadow_tables(struct gmap
>>> *sg, unsigned long saddr, ste.val = ptr;
>>> goto shadow_pgt;
>>> }
>>> + *pgt = ptr + vaddr.sx * 8;
>>> rc = gmap_read_table(parent, ptr + vaddr.sx * 8,
>>> &ste.val); if (rc)
>>> return rc;
>>> @@ -1157,6 +1163,8 @@ static int kvm_s390_shadow_tables(struct gmap
>>> *sg, unsigned long saddr,
>>> * @vcpu: virtual cpu
>>> * @sg: pointer to the shadow guest address space structure
>>> * @saddr: faulting address in the shadow gmap
>>> + * @datptr: will contain the address of the faulting DAT table
>>> entry, or of
>>> + * the valid leaf, plus some flags
>>> *
>>> * Returns: - 0 if the shadow fault was successfully resolved
>>> * - > 0 (pgm exception code) on exceptions while
>>> faulting @@ -1165,11 +1173,11 @@ static int
>>> kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
>>> * - -ENOMEM if out of memory
>>> */
>>> int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
>>> - unsigned long saddr)
>>> + unsigned long saddr, unsigned long
>>> *datptr) {
>>> union vaddress vaddr;
>>> union page_table_entry pte;
>>> - unsigned long pgt;
>>> + unsigned long pgt = 0;
>>> int dat_protection, fake;
>>> int rc;
>>>
>>> @@ -1191,8 +1199,20 @@ int kvm_s390_shadow_fault(struct kvm_vcpu
>>> *vcpu, struct gmap *sg, pte.val = pgt + vaddr.px * PAGE_SIZE;
>>> goto shadow_page;
>>> }
>>> - if (!rc)
>>> - rc = gmap_read_table(sg->parent, pgt + vaddr.px *
>>> 8, &pte.val); +
>>> + switch (rc) {
>>> + case PGM_SEGMENT_TRANSLATION:
>>> + case PGM_REGION_THIRD_TRANS:
>>> + case PGM_REGION_SECOND_TRANS:
>>> + case PGM_REGION_FIRST_TRANS:
>>> + pgt |= NOT_PTE;
>>> + break;
>>> + case 0:
>>> + pgt += vaddr.px * 8;
>>> + rc = gmap_read_table(sg->parent, pgt, &pte.val);
>>> + }
>>> + if (*datptr)
>>> + *datptr = pgt | dat_protection * DAT_PROT;
>>> if (!rc && pte.i)
>>> rc = PGM_PAGE_TRANSLATION;
>>> if (!rc && pte.z)
>>> diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
>>> index f4c51756c462..fec26bbb17ba 100644
>>> --- a/arch/s390/kvm/gaccess.h
>>> +++ b/arch/s390/kvm/gaccess.h
>>> @@ -359,7 +359,10 @@ void ipte_unlock(struct kvm_vcpu *vcpu);
>>> int ipte_lock_held(struct kvm_vcpu *vcpu);
>>> int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu,
>>> unsigned long gra);
>>> +#define DAT_PROT 2
>>> +#define NOT_PTE 4
>>
>> What if our guest is using ASCE.R ?
>
> then we don't care.
>
> if the guest is using ASCE.R, then shadowing will always succeed, and
> the VSIE MVPG handler will retry right away.
>
> if you are worried about the the lowest order bit, it can only be set
> if a specific feature is enabled in the host, and KVM doesn't use /
> support it, so the guest can't use it for its guest.
Got it, thanks! :)
Reviewed-by: David Hildenbrand <david@...hat.com>
--
Thanks,
David / dhildenb
Powered by blists - more mailing lists