linux-kernel - Re: [PATCH 1/2] KVM: x86/mmu: Split NX hugepage recovery flow into TDP and non-TDP flow

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALzav=cFPduBR4pmgnVrgY6q+wufTn_nS-4QDF4yw8uGQkV41Q@mail.gmail.com>
Date: Mon, 19 Aug 2024 10:28:34 -0700
From: David Matlack <dmatlack@...gle.com>
To: Vipin Sharma <vipinsh@...gle.com>
Cc: Sean Christopherson <seanjc@...gle.com>, pbonzini@...hat.com, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] KVM: x86/mmu: Split NX hugepage recovery flow into
 TDP and non-TDP flow

On Mon, Aug 19, 2024 at 10:20 AM Vipin Sharma <vipinsh@...gle.com> wrote:
>
> On 2024-08-16 16:29:11, Sean Christopherson wrote:
> > On Mon, Aug 12, 2024, Vipin Sharma wrote:
> > > +   list_for_each_entry(sp, &kvm->arch.possible_nx_huge_pages, possible_nx_huge_page_link) {
> > > +           if (i++ >= max)
> > > +                   break;
> > > +           if (is_tdp_mmu_page(sp) == tdp_mmu)
> > > +                   return sp;
> > > +   }
> >
> > This is silly and wasteful.  E.g. in the (unlikely) case there's one TDP MMU
> > page amongst hundreds/thousands of shadow MMU pages, this will walk the list
> > until @max, and then move on to the shadow MMU.
> >
> > Why not just use separate lists?
>
> Before this patch, NX huge page recovery calculates "to_zap" and then it
> zaps first "to_zap" pages from the common list. This series is trying to
> maintain that invarient.
>
> If we use two separate lists then we have to decide how many pages
> should be zapped from TDP MMU and shadow MMU list. Few options I can
> think of:
>
> 1. Zap "to_zap" pages from both TDP MMU and shadow MMU list separately.
>    Effectively, this might double the work for recovery thread.
> 2. Try zapping "to_zap" page from one list and if there are not enough
>    pages to zap then zap from the other list. This can cause starvation.
> 3. Do half of "to_zap" from one list and another half from the other
>    list. This can lead to situations where only half work is being done
>    by the recovery worker thread.
>
> Option (1) above seems more reasonable to me.

I vote each should zap 1/nx_huge_pages_recovery_ratio of their
respective list. i.e. Calculate to_zap separately for each list.