linux-kernel - Re: [PATCH v3 0/4] Allow AVIC's IPI virtualization to be optional

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Zxb4D_JCC-L7OQDT@google.com>
Date: Mon, 21 Oct 2024 17:55:43 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Maxim Levitsky <mlevitsk@...hat.com>
Cc: kvm@...r.kernel.org, Will Deacon <will@...nel.org>, linux-kernel@...r.kernel.org, 
	Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, 
	Ingo Molnar <mingo@...hat.com>, "H. Peter Anvin" <hpa@...or.com>, Thomas Gleixner <tglx@...utronix.de>, 
	Joerg Roedel <joro@...tes.org>, Suravee Suthikulpanit <suravee.suthikulpanit@....com>, 
	Robin Murphy <robin.murphy@....com>, iommu@...ts.linux.dev, 
	Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [PATCH v3 0/4] Allow AVIC's IPI virtualization to be optional

Finally got back to this...

On Wed, Oct 04, 2023, Maxim Levitsky wrote:
> У пн, 2023-10-02 у 12:21 -0700, Sean Christopherson пише:
> > On Mon, Oct 02, 2023, Maxim Levitsky wrote:
> > > Hi!
> > > 
> > > This patch allows AVIC's ICR emulation to be optional and thus allows
> > > to workaround AVIC's errata #1235 by disabling this portion of the feature.
> > > 
> > > This is v3 of my patch series 'AVIC bugfixes and workarounds' including
> > > review feedback.
> > 
> > Please respond to my idea[*] instead of sending more patches. 
> 
> Hi,
> 
> For the v2 of the patch I was already on the fence if to do it this way or to
> refactor the code, and back when I posted it, I decided still to avoid the
> refactoring.
> 
> However, your idea of rewriting this patch, while it does change less lines
> of code, is even less obvious and consequently required you to write even
> longer comment to justify it which is not a good sign.

Agreed.  And FWIW, if we keep the local "entry" variables, I actually like your
version much better.

> In particular I don't want someone to find out later, and in the hard way
> that sometimes real physid table is accessed, and sometimes a fake copy of it
> is.

Note, this quirk is present in your v2 as well.  I bring that up only because I
prefer your v2, and rebased it (with massaging) on top of the next version of the
max vCPUs.  This is what I have currently:

-       WRITE_ONCE(kvm_svm->avic_physical_id_table[vcpu->vcpu_id], entry);
+       svm->avic_physical_id_entry = entry;
+
+       /*
+        * If IPI virtualization is disable, don't update the actual Physical
+        * ID table, so that the CPU never sees IsRunning=1.
+        */
+       if (enable_ipiv)
+               WRITE_ONCE(kvm_svm->avic_physical_id_table[vcpu->vcpu_id], entry);
+

I have no objection to writing the "real" table, but with IsRunning=0.  That's
easy enough to do, e.g. tweak the above to:

	/*
	 * If IPI virtualization is disabled, clear IsRunning when updating the
	 * actual Physical ID table, so that the CPU never sees IsRunning=1.
	 */
	if (!enable_ipiv)
		entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK;

	WRITE_ONCE(kvm_svm->avic_physical_id_table[vcpu->vcpu_id], entry);

> So I decided to fix the root cause by not reading the physid table back,
> which made the code cleaner, and even with the workaround the code IMHO is
> still simpler than it was before.
> 
> About the added 'vcpu->loaded' variable, I added it also because it is
> something that is long overdue to be added, I remember that in IPIv code
> there was also a need for this, and probalby more places in KVM can be
> refactored to take advantage of it, instead of various hacks.

I don't view using the information from the Physical ID table as a hack.  It very
explicitly uses the ir_list_lock to ensure that the pCPU that's programmed into
the IRTE is the pCPU on which the vCPU is loaded, and provides rather strict
ordering between task migration and device assignment.  It's not a super hot path,
so I don't think lockless programming is justified.

I also think we should keep IsRunning=1 when the vCPU is unloaded.  That approach
won't run afoul of your concern with signaling the wrong pCPU, because KVM can
still keep the ID up-to-date, e.g. if the task is migrated when a pCPU is being
offlined.

The motiviation for keeping IsRunning=1 is to avoid unnecessary VM-Exits and GA
log IRQs.  E.g. if a vCPU exits to userspace, there's zero reason to force IPI
senders to exit, because KVM can't/won't notify userspace, and the pending virtual
interrupt will be processed on the next VMRUN.

> I did adopt your idea of using 'enable_ipiv', although I am still not 100%
> sure that this is more readable than 'avic_zen2_workaround'.

The problem with "avic_zen2_workaround" is that it becomes nonsensical if a user
wants to disable IPI virtualization on a CPU that isn't affected by the erratum.
And I also think KVM should disallow enabling IPI virtualization if the CPU is
affected by the erratum; even with HLT-exiting disabled on every VM, kvm_vcpu_block()
is still reachable, so I don't think it's at all reasonable to expect a user to
be able to know when it's safe to ignore the erratum.