linux-kernel - Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d posted-interrupts

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5678F0BE.2020409@gmail.com>
Date:	Tue, 22 Dec 2015 14:42:06 +0800
From:	Yang Zhang <yang.zhang.wz@...il.com>
To:	"Wu, Feng" <feng.wu@...el.com>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>,
	"rkrcmar@...hat.com" <rkrcmar@...hat.com>
Cc:	"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d
 posted-interrupts

On 2015/12/22 12:36, Wu, Feng wrote:
>
>
>> -----Original Message-----
>> From: Yang Zhang [mailto:yang.zhang.wz@...il.com]
>> Sent: Monday, December 21, 2015 10:01 AM
>> To: Wu, Feng <feng.wu@...el.com>; pbonzini@...hat.com;
>> rkrcmar@...hat.com
>> Cc: kvm@...r.kernel.org; linux-kernel@...r.kernel.org
>> Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d
>> posted-interrupts
>>
>> On 2015/12/21 9:55, Wu, Feng wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: linux-kernel-owner@...r.kernel.org [mailto:linux-kernel-
>>>> owner@...r.kernel.org] On Behalf Of Yang Zhang
>>>> Sent: Monday, December 21, 2015 9:50 AM
>>>> To: Wu, Feng <feng.wu@...el.com>; pbonzini@...hat.com;
>>>> rkrcmar@...hat.com
>>>> Cc: kvm@...r.kernel.org; linux-kernel@...r.kernel.org
>>>> Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d
>>>> posted-interrupts
>>>>
>>>> On 2015/12/16 9:37, Feng Wu wrote:
>>>>> Use vector-hashing to deliver lowest-priority interrupts for
>>>>> VT-d posted-interrupts.
>>>>>
>>>>> Signed-off-by: Feng Wu <feng.wu@...el.com>
>>>>> ---
>>>>>     arch/x86/kvm/lapic.c | 67
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>     arch/x86/kvm/lapic.h |  2 ++
>>>>>     arch/x86/kvm/vmx.c   | 12 ++++++++--
>>>>>     3 files changed, 79 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>>>> index e29001f..d4f2c8f 100644
>>>>> --- a/arch/x86/kvm/lapic.c
>>>>> +++ b/arch/x86/kvm/lapic.c
>>>>> @@ -854,6 +854,73 @@ out:
>>>>>     }
>>>>>
>>>>>     /*
>>>>> + * This routine handles lowest-priority interrupts using vector-hashing
>>>>> + * mechanism. As an example, modern Intel CPUs use this method to
>> handle
>>>>> + * lowest-priority interrupts.
>>>>> + *
>>>>> + * Here is the details about the vector-hashing mechanism:
>>>>> + * 1. For lowest-priority interrupts, store all the possible destination
>>>>> + *    vCPUs in an array.
>>>>> + * 2. Use "guest vector % max number of destination vCPUs" to find the
>> right
>>>>> + *    destination vCPU in the array for the lowest-priority interrupt.
>>>>> + */
>>>>> +struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
>>>>> +					      struct kvm_lapic_irq *irq)
>>>>> +{
>>>>> +	struct kvm_apic_map *map;
>>>>> +	struct kvm_vcpu *vcpu = NULL;
>>>>> +
>>>>> +	if (irq->shorthand)
>>>>> +		return NULL;
>>>>> +
>>>>> +	rcu_read_lock();
>>>>> +	map = rcu_dereference(kvm->arch.apic_map);
>>>>> +
>>>>> +	if (!map)
>>>>> +		goto out;
>>>>> +
>>>>> +	if ((irq->dest_mode != APIC_DEST_PHYSICAL) &&
>>>>> +			kvm_lowest_prio_delivery(irq)) {
>>>>> +		u16 cid;
>>>>> +		int i, idx = 0;
>>>>> +		unsigned long bitmap = 1;
>>>>> +		unsigned int dest_vcpus = 0;
>>>>> +		struct kvm_lapic **dst = NULL;
>>>>> +
>>>>> +
>>>>> +		if (!kvm_apic_logical_map_valid(map))
>>>>> +			goto out;
>>>>> +
>>>>> +		apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
>>>>> +
>>>>> +		if (cid >= ARRAY_SIZE(map->logical_map))
>>>>> +			goto out;
>>>>> +
>>>>> +		dst = map->logical_map[cid];
>>>>> +
>>>>> +		for_each_set_bit(i, &bitmap, 16) {
>>>>> +			if (!dst[i] && !kvm_lapic_enabled(dst[i]->vcpu)) {
>>>>> +				clear_bit(i, &bitmap);
>>>>> +				continue;
>>>>> +			}
>>>>> +		}
>>>>> +
>>>>> +		dest_vcpus = hweight16(bitmap);
>>>>> +
>>>>> +		if (dest_vcpus != 0) {
>>>>> +			idx = kvm_vector_2_index(irq->vector, dest_vcpus,
>>>>> +						 &bitmap, 16);
>>>>> +			vcpu = dst[idx-1]->vcpu;
>>>>> +		}
>>>>> +	}
>>>>> +
>>>>> +out:
>>>>> +	rcu_read_unlock();
>>>>> +	return vcpu;
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(kvm_intr_vector_hashing_dest);
>>>>> +
>>>>> +/*
>>>>>      * Add a pending IRQ into lapic.
>>>>>      * Return 1 if successfully added and 0 if discarded.
>>>>>      */
>>>>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
>>>>> index 6890ef0..52bffce 100644
>>>>> --- a/arch/x86/kvm/lapic.h
>>>>> +++ b/arch/x86/kvm/lapic.h
>>>>> @@ -172,4 +172,6 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm,
>>>> struct kvm_lapic_irq *irq,
>>>>>     			struct kvm_vcpu **dest_vcpu);
>>>>>     int kvm_vector_2_index(u32 vector, u32 dest_vcpus,
>>>>>     		       const unsigned long *bitmap, u32 bitmap_size);
>>>>> +struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
>>>>> +					      struct kvm_lapic_irq *irq);
>>>>>     #endif
>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>> index 5eb56ed..3f89189 100644
>>>>> --- a/arch/x86/kvm/vmx.c
>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>> @@ -10702,8 +10702,16 @@ static int vmx_update_pi_irte(struct kvm
>> *kvm,
>>>> unsigned int host_irq,
>>>>>     		 */
>>>>>
>>>>>     		kvm_set_msi_irq(e, &irq);
>>>>> -		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
>>>>> -			continue;
>>>>> +
>>>>> +		if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
>>>>> +			if (!kvm_vector_hashing_enabled() ||
>>>>> +					irq.delivery_mode !=
>>>> APIC_DM_LOWEST)
>>>>> +				continue;
>>>>> +
>>>>> +			vcpu = kvm_intr_vector_hashing_dest(kvm, &irq);
>>>>> +			if (!vcpu)
>>>>> +				continue;
>>>>> +		}
>>>>
>>>> I am a little confused with the 'continue'. If the destination is not
>>>> single vcpu, shouldn't we rollback to use non-PI mode?
>>>
>>> Here is the logic:
>>> - If it is single destination, we will use PI no matter it is fixed or lowest-priority.
>>> - If it is not single destination:
>>> 	a) It is fixed, we will use non-PI
>>> 	b) It is lowest-priority and vector-hashing is enabled, we will use PI
>>> 	c) otherwise, use non-PI
>>
>> If it is single destination previously, then change to no-single mode.
>> Can current code cover this case?
>
> In my test, before setting irq affinity (change single vcpu to non-single vcpu
> in this case), the guest will mask the interrupt first, so before getting here, IRTE
> has been changed back to remapped mode already(when guest masks the MSIx,
> we will change back to remapped mode), hence nothing needed here.
>
> Digging into the linux code (guest) a bit more, I found that if interrupt remapping
> is not enabled in the guest (IR is not supported for guest anyway), it will always
> mask the MSI/MSIx before setting the irq affinity. So the code should work
> well currently.

We should not rely on guest's behavior. From code level, it need be fixed.

>
> However, for robustness, I think explicitly changing IRTE back to remapped
> mode for the 'continue' case should be a good idea.

This is what i am looking for.

>
> Radim, Paolo, what are your guys' options about this? Any comments are
> appreciated! Thanks a lot!
>
> Thanks,
> Feng
>


-- 
best regards
yang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/