linux-kernel - Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level interrupt resampling

lists.openwall.net
Open Source and information security mailing list archives
Message-ID: <1520822024.2985.12.camel@hxt-semitech.com>
Date:   Mon, 12 Mar 2018 02:33:44 +0000
From:   "Yang, Shunyong" <shunyong.yang@...-semitech.com>
To:     "marc.zyngier@....com" <marc.zyngier@....com>,
        "cdall@...nel.org" <cdall@...nel.org>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "ard.biesheuvel@...aro.org" <ard.biesheuvel@...aro.org>,
        "kvmarm@...ts.cs.columbia.edu" <kvmarm@...ts.cs.columbia.edu>,
        "Zheng, Joey" <yu.zheng@...-semitech.com>,
        "will.deacon@....com" <will.deacon@....com>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "david.daney@...ium.com" <david.daney@...ium.com>,
        "eric.auger@...hat.com" <eric.auger@...hat.com>
Subject: Re: [RFC PATCH] KVM: arm/arm64: vgic: change condition for level
 interrupt resampling

Hi, Marc,

On Sun, 2018-03-11 at 12:17 +0000, Marc Zyngier wrote:
> On Sun, 11 Mar 2018 01:55:08 +0000
> Christoffer Dall <cdall@...nel.org> wrote:
> 
> > 
> > On Sat, Mar 10, 2018 at 12:20 PM, Marc Zyngier <marc.zyngier@....com> wrote:
> > > 
> > > On Fri, 09 Mar 2018 21:36:12 +0000,
> > > Christoffer Dall wrote:  
> > > > 
> > > > 
> > > > On Thu, Mar 08, 2018 at 05:28:44PM +0000, Marc Zyngier wrote:  
> > > > > 
> > > > > I'd be more confident if we did forbid P+A for such interrupts
> > > > > altogether, as they really feel like another kind of HW interrupt.
> > > > How about a slightly bigger hammer: Can we avoid doing P+A for level
> > > > interrupts completely? I don't think that really makes much sense,
> > > > and I think we simplify everything if we just come back out and
> > > > resample the line. For an edge, something like a network card,
> > > > there's a potential performance win to appending a new pending state,
> > > > but I doubt that this is the case for level interrupts.
> > > I started implementing the same thing yesterday. Somehow, it feels
> > > slightly better to have the same flow for all level interrupts,
> > > including the timer, and we only use the MI on EOI as a way to trigger
> > > the next state of injection. Still testing, but looking good so far.
> > > 
> > > I'm still puzzled that we have this level-but-not-quite behaviour for
> > > VFIO interrupts. At some point, it is going to bite us badly.
> > >  
> > Where is the departure from level-triggered behavior with VFIO? As
> > far as I can tell, the GIC flow of the interrupts will be just a level
> > interrupt,
> The GIC is fine, I believe. What is not exactly fine is the signalling
> from the device, which will never be dropped until the EOI has been
> detected.
> 
> > 
> > but we just need to make sure the resamplefd mechanism is
> > supported for both types of interrupts. Whether or not that's a
> > decent mechanism seems orthogonal to me, but that's a discussion for
> > another day I think.
> Given that VFIO is built around this mechanism, I don't think we have a
> choice but to support it. Anyway, I came up with the following patch,
> which I tested on Seattle with mtty. It also survived my usual
> hammering of cyclictest, hackbench and bulk VM installs.
> 
> Shunyong, could you please give it a go?
> 
> Thanks,
> 
> 	M.
> 

I have tested the patch. It works on the QDF2400 platform,
and kvm_notify_acked_irq() is called when the state is idle.

BTW, I have the following questions from when I was debugging the issue.
Could you please give me some help?
1) What does "mi" mean in the GIC code, e.g. in lr_signals_eoi_mi()?
2) In some __hyp_text code, such as __kvm_vcpu_run(), calling printk()
causes a "HYP panic:". How can I output debug information there?

Thanks.
Shunyong.


> From 9ca96b9fb535cc6ab578bda85c4ecbc4a8c63cd7 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <marc.zyngier@....com>
> Date: Fri, 9 Mar 2018 14:59:40 +0000
> Subject: [PATCH] KVM: arm/arm64: vgic: Disallow Active+Pending for level
>  interrupts
> 
> It was recently reported that VFIO mediated devices, and anything
> that VFIO exposes as level interrupts, do not strictly follow the
> expected logic of such interrupts as it only lowers the input
> line when the guest has EOId the interrupt at the GIC level, rather
> than when it Acked the interrupt at the device level.
> 
> The GIC's Active+Pending state is fundamentally incompatible with
> this behaviour, as it prevents KVM from observing the EOI, and in
> turn results in VFIO never dropping the line. This results in an
> interrupt storm in the guest, which it really never expected.
> 
> As we cannot really change VFIO to follow the strict rules of level
> signalling, let's forbid the A+P state altogether, as it is in the
> end only an optimization. It ensures that we will transition via
> an invalid state, which we can use to notify VFIO of the EOI.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@....com>
> ---
>  virt/kvm/arm/vgic/vgic-v2.c | 47 +++++++++++++++++++++++++++------------------
>  virt/kvm/arm/vgic/vgic-v3.c | 47 +++++++++++++++++++++++++++------------------
>  2 files changed, 56 insertions(+), 38 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> index 29556f71b691..9356d749da1d 100644
> --- a/virt/kvm/arm/vgic/vgic-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-v2.c
> @@ -153,8 +153,35 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>  void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>  {
>  	u32 val = irq->intid;
> +	bool allow_pending = true;
>  
> -	if (irq_is_pending(irq)) {
> +	if (irq->active)
> +		val |= GICH_LR_ACTIVE_BIT;
> +
> +	if (irq->hw) {
> +		val |= GICH_LR_HW;
> +		val |= irq->hwintid << GICH_LR_PHYSID_CPUID_SHIFT;
> +		/*
> +		 * Never set pending+active on a HW interrupt, as the
> +		 * pending state is kept at the physical distributor
> +		 * level.
> +		 */
> +		if (irq->active)
> +			allow_pending = false;
> +	} else {
> +		if (irq->config == VGIC_CONFIG_LEVEL) {
> +			val |= GICH_LR_EOI;
> +
> +			/*
> +			 * Software resampling doesn't work very well
> +			 * if we allow P+A, so let's not do that.
> +			 */
> +			if (irq->active)
> +				allow_pending = false;
> +		}
> +	}
> +
> +	if (allow_pending && irq_is_pending(irq)) {
>  		val |= GICH_LR_PENDING_BIT;
>  
>  		if (irq->config == VGIC_CONFIG_EDGE)
> @@ -171,24 +198,6 @@ void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>  		}
>  	}
>  
> -	if (irq->active)
> -		val |= GICH_LR_ACTIVE_BIT;
> -
> -	if (irq->hw) {
> -		val |= GICH_LR_HW;
> -		val |= irq->hwintid << GICH_LR_PHYSID_CPUID_SHIFT;
> -		/*
> -		 * Never set pending+active on a HW interrupt, as the
> -		 * pending state is kept at the physical distributor
> -		 * level.
> -		 */
> -		if (irq->active && irq_is_pending(irq))
> -			val &= ~GICH_LR_PENDING_BIT;
> -	} else {
> -		if (irq->config == VGIC_CONFIG_LEVEL)
> -			val |= GICH_LR_EOI;
> -	}
> -
>  	/*
>  	 * Level-triggered mapped IRQs are special because we only observe
>  	 * rising edges as input to the VGIC.  We therefore lower the line
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index 0ff2006f3781..6b484575cafb 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -135,8 +135,35 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>  {
>  	u32 model = vcpu->kvm->arch.vgic.vgic_model;
>  	u64 val = irq->intid;
> +	bool allow_pending = true;
>  
> -	if (irq_is_pending(irq)) {
> +	if (irq->active)
> +		val |= ICH_LR_ACTIVE_BIT;
> +
> +	if (irq->hw) {
> +		val |= ICH_LR_HW;
> +		val |= ((u64)irq->hwintid) << ICH_LR_PHYS_ID_SHIFT;
> +		/*
> +		 * Never set pending+active on a HW interrupt, as the
> +		 * pending state is kept at the physical distributor
> +		 * level.
> +		 */
> +		if (irq->active)
> +			allow_pending = false;
> +	} else {
> +		if (irq->config == VGIC_CONFIG_LEVEL) {
> +			val |= ICH_LR_EOI;
> +
> +			/*
> +			 * Software resampling doesn't work very well
> +			 * if we allow P+A, so let's not do that.
> +			 */
> +			if (irq->active)
> +				allow_pending = false;
> +		}
> +	}
> +
> +	if (allow_pending && irq_is_pending(irq)) {
>  		val |= ICH_LR_PENDING_BIT;
>  
>  		if (irq->config == VGIC_CONFIG_EDGE)
> @@ -154,24 +181,6 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>  		}
>  	}
>  
> -	if (irq->active)
> -		val |= ICH_LR_ACTIVE_BIT;
> -
> -	if (irq->hw) {
> -		val |= ICH_LR_HW;
> -		val |= ((u64)irq->hwintid) << ICH_LR_PHYS_ID_SHIFT;
> -		/*
> -		 * Never set pending+active on a HW interrupt, as the
> -		 * pending state is kept at the physical distributor
> -		 * level.
> -		 */
> -		if (irq->active && irq_is_pending(irq))
> -			val &= ~ICH_LR_PENDING_BIT;
> -	} else {
> -		if (irq->config == VGIC_CONFIG_LEVEL)
> -			val |= ICH_LR_EOI;
> -	}
> -
>  	/*
>  	 * Level-triggered mapped IRQs are special because we only observe
>  	 * rising edges as input to the VGIC.  We therefore lower the line
> -- 
> 2.14.2
> 
>