Message-ID: <20240819172023.GA2210585.vipinsh@google.com>
Date: Mon, 19 Aug 2024 10:20:23 -0700
From: Vipin Sharma <vipinsh@...gle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: pbonzini@...hat.com, dmatlack@...gle.com, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] KVM: x86/mmu: Split NX hugepage recovery flow into
TDP and non-TDP flow
On 2024-08-16 16:29:11, Sean Christopherson wrote:
> On Mon, Aug 12, 2024, Vipin Sharma wrote:
> > + list_for_each_entry(sp, &kvm->arch.possible_nx_huge_pages, possible_nx_huge_page_link) {
> > + if (i++ >= max)
> > + break;
> > + if (is_tdp_mmu_page(sp) == tdp_mmu)
> > + return sp;
> > + }
>
> This is silly and wasteful. E.g. in the (unlikely) case there's one TDP MMU
> page amongst hundreds/thousands of shadow MMU pages, this will walk the list
> until @max, and then move on to the shadow MMU.
>
> Why not just use separate lists?
Before this patch, NX huge page recovery calculates "to_zap" and then
zaps the first "to_zap" pages from the common list. This series is trying
to maintain that invariant.
If we use two separate lists then we have to decide how many pages
should be zapped from the TDP MMU list and the shadow MMU list. A few
options I can think of:
1. Zap "to_zap" pages from both TDP MMU and shadow MMU list separately.
Effectively, this might double the work for recovery thread.
2. Try zapping "to_zap" pages from one list and if there are not enough
   pages to zap then zap from the other list. This can cause starvation.
3. Do half of "to_zap" from one list and the other half from the other
   list. This can lead to situations where only half the work is done
   by the recovery worker thread.
Option (1) above seems more reasonable to me.
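To make the trade-off concrete, here is a minimal user-space sketch of
option (1): the same "to_zap" budget is applied to each list
independently, so neither MMU type can starve the other, at the cost of
up to double the work. The struct and function names below are
simplified stand-ins for illustration, not the actual KVM code:

```c
/* User-space model of option (1): zap up to @to_zap pages from the
 * TDP MMU list and the shadow MMU list independently.  struct nx_page,
 * zap_list(), and recover_nx_huge_pages() are hypothetical stand-ins
 * for the real KVM structures and recovery worker. */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct nx_page {
	struct nx_page *next;
	bool is_tdp_mmu;
};

/* Pop and "zap" at most @to_zap pages from a single list; returns the
 * number actually zapped. */
static unsigned long zap_list(struct nx_page **head, unsigned long to_zap)
{
	unsigned long zapped = 0;

	while (*head && zapped < to_zap) {
		struct nx_page *sp = *head;

		*head = sp->next;
		free(sp);	/* stand-in for the actual zap */
		zapped++;
	}
	return zapped;
}

/* Option (1): run the same @to_zap budget against each list separately,
 * so a glut of pages on one list never starves the other. */
static unsigned long recover_nx_huge_pages(struct nx_page **tdp_list,
					   struct nx_page **shadow_list,
					   unsigned long to_zap)
{
	return zap_list(tdp_list, to_zap) + zap_list(shadow_list, to_zap);
}
```

In the worst case this zaps 2 * to_zap pages per recovery pass, which
is the "double the work" cost noted above.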