linux-kernel - Re: [PATCH 1/3] KVM: x86: VMX: __kvm_apic_update

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <f867061cdc25ee887e43ebf6164733b309ccd631.camel@redhat.com>
Date:   Tue, 18 Jul 2023 16:35:30 +0300
From:   Maxim Levitsky <mlevitsk@...hat.com>
To:     Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org
Cc:     "H. Peter Anvin" <hpa@...or.com>,
        Sean Christopherson <seanjc@...gle.com>,
        linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH 1/3] KVM: x86: VMX: __kvm_apic_update_irr must update
 the IRR atomically

У вт, 2023-07-18 у 13:41 +0200, Paolo Bonzini пише:
> On 7/18/23 11:13, Maxim Levitsky wrote:
> > +		irr_val = READ_ONCE(*((u32 *)(regs + APIC_IRR + i * 0x10)));
> 
> Let's separate out the complicated arithmetic, as it recurs below too:
> 
> 	u32 *p_irr = (u32 *)(regs + APIC_IRR + i * 0x10);

No objections at all for this change, I wanted to have a minimal patch.

> 
> > +			while (!try_cmpxchg(((u32 *)(regs + APIC_IRR + i * 0x10)),
> > +			       &irr_val, irr_val | pir_val));
> > +
> >   			prev_irr_val = irr_val;
> > -			irr_val |= xchg(&pir[i], 0);
> > -			*((u32 *)(regs + APIC_IRR + i * 0x10)) = irr_val;
> > -			if (prev_irr_val != irr_val) {
> > -				max_updated_irr =
> > -					__fls(irr_val ^ prev_irr_val) + vec;
> > -			}
> > +			irr_val |= pir_val;
> > +
> > +			if (prev_irr_val != irr_val)
> > +				max_updated_irr = __fls(irr_val ^ prev_irr_val) + vec;
> 
> We can write this a bit more cleanly too, and avoid unnecessary

To be honest as far as I see, no matter what to do with this function, it is still
a bit complicated IMHO:

The root cause of the complexity in this function is that it does two things at the same time -
copies both the new bits to IRR and also counts the max_irr.

It would be so much cleaner to first copy new bits from PIR to irr (and that can be done
with 'lock or' or even by setting each bit with atomic bit set (in this way the setting of the bits
will be pretty much the same as what other users of IRR do (set bit atomically + set irr_pending).

And then let the common code count the max_irr.

I doubt this will affect performance in any way, but I don't have a good way to measure it,
so I won't be arguing about it.

On the other hand, I am thinking now that maybe I should make the cmpxchg conditional on
apicv beeing inhibited, as otherwise it works for nothing and actually might affect performance.

This though might in theory cause a race if a sender incorrectly thinks that this's vCPU APICv is
inhibited or not.

It probalby doesn't matter as the only reason for APICv to be inhibited is that AutoEOI thing which
should happen just once when the guest boots.

I also have another idea - I can make the IPI senders still set bits in the PIR even if APICv is inhibited, 
then there is no race to worry about although then the bits will always have to be copied from PIR to IRR
(but then again APICv inhibition is rare).

> try_cmpxchg too:
> 
> prev_irr_val = irr_val;
> do
> 	irr_val = prev_irr_val | pir_val;
> while (prev_irr_val != irr_val &&
>         !try_cmpxchg(p_irr, &prev_irr_val, irr_val));
> 
> if (prev_irr_val != irr_val)
> 	max_updated_irr = __fls(irr_val ^ prev_irr_val) + vec;
> 
> If this looks okay to you, I'll queue the patches for -rc3 and also Cc 
> them for inclusion in stable kernels.

No objections for this change as well.

Best regards,
	Maxim Levitsky

> 
> Paolo
>