linux-kernel - Re: [PATCH v2 20/28] KVM: x86/mmu: Use atomic ops to set SPTEs in TDP MMU map

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANgfPd85U_YwDdXc1Dkn-UzKpae5FRzYshLFABAU_xHTs0i3Hg@mail.gmail.com>
Date:   Thu, 1 Apr 2021 09:50:15 -0700
From:   Ben Gardon <bgardon@...gle.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, kvm <kvm@...r.kernel.org>,
        Peter Xu <peterx@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Peter Shier <pshier@...gle.com>,
        Peter Feiner <pfeiner@...gle.com>,
        Junaid Shahid <junaids@...gle.com>,
        Jim Mattson <jmattson@...gle.com>,
        Yulei Zhang <yulei.kernel@...il.com>,
        Wanpeng Li <kernellwp@...il.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Xiao Guangrong <xiaoguangrong.eric@...il.com>
Subject: Re: [PATCH v2 20/28] KVM: x86/mmu: Use atomic ops to set SPTEs in TDP
 MMU map

On Thu, Apr 1, 2021 at 3:32 AM Paolo Bonzini <pbonzini@...hat.com> wrote:
>
> On 02/02/21 19:57, Ben Gardon wrote:
> > @@ -720,7 +790,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
> >                */
> >               if (is_shadow_present_pte(iter.old_spte) &&
> >                   is_large_pte(iter.old_spte)) {
> > -                     tdp_mmu_set_spte(vcpu->kvm, &iter, 0);
> > +                     if (!tdp_mmu_set_spte_atomic(vcpu->kvm, &iter, 0))
> > +                             break;
> >
> >                       kvm_flush_remote_tlbs_with_address(vcpu->kvm, iter.gfn,
> >                                       KVM_PAGES_PER_HPAGE(iter.level));
> >
> >                       /*
> >                        * The iter must explicitly re-read the spte here
> >                        * because the new value informs the !present
> >                          * path below.
> >                          */
> >                         iter.old_spte = READ_ONCE(*rcu_dereference(iter.sptep));
> >                 }
> >
> >                 if (!is_shadow_present_pte(iter.old_spte)) {
>
> Would it be easier to reason about this code by making it retry, like:
>
> retry:
>                  if (is_shadow_present_pte(iter.old_spte)) {
>                         if (is_large_pte(iter.old_spte)) {
>                                 if (!tdp_mmu_zap_spte_atomic(vcpu->kvm, &iter))
>                                         break;
>
>                                 /*
>                                  * The iter must explicitly re-read the SPTE because
>                                  * the atomic cmpxchg failed.
>                                  */
>                                 iter.old_spte = READ_ONCE(*rcu_dereference(iter.sptep));
>                                 goto retry;
>                         }
>                  } else {
>                         ...
>                 }
>
> ?

To be honest, that feels less readable to me. For me retry implies
that we failed to make progress and need to repeat an operation, but
the reality is that we did make progress and there are just multiple
steps to replace the large SPTE with a child PT.
Another option which could improve readability and performance would
be to use the retry to repeat failed cmpxchgs instead of breaking out
of the loop. Then we could avoid retrying the page fault each time a
cmpxchg failed, which may happen a lot as vCPUs allocate intermediate
page tables on boot. (Probably less common for leaf entries, but
possibly useful there too.)
Another-nother option would be to remove this two part process by
eagerly splitting large page mappings in a single step. This would
substantially reduce the number of page faults incurred for NX
splitting / dirty logging splitting. It's been on our list of features
to send upstream for a while and I hope we'll be able to get it into
shape and send it out reasonably soon.

>
> Paolo
>