[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5678F0BE.2020409@gmail.com>
Date: Tue, 22 Dec 2015 14:42:06 +0800
From: Yang Zhang <yang.zhang.wz@...il.com>
To: "Wu, Feng" <feng.wu@...el.com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"rkrcmar@...hat.com" <rkrcmar@...hat.com>
Cc: "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d
posted-interrupts
On 2015/12/22 12:36, Wu, Feng wrote:
>
>
>> -----Original Message-----
>> From: Yang Zhang [mailto:yang.zhang.wz@...il.com]
>> Sent: Monday, December 21, 2015 10:01 AM
>> To: Wu, Feng <feng.wu@...el.com>; pbonzini@...hat.com;
>> rkrcmar@...hat.com
>> Cc: kvm@...r.kernel.org; linux-kernel@...r.kernel.org
>> Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d
>> posted-interrupts
>>
>> On 2015/12/21 9:55, Wu, Feng wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: linux-kernel-owner@...r.kernel.org [mailto:linux-kernel-
>>>> owner@...r.kernel.org] On Behalf Of Yang Zhang
>>>> Sent: Monday, December 21, 2015 9:50 AM
>>>> To: Wu, Feng <feng.wu@...el.com>; pbonzini@...hat.com;
>>>> rkrcmar@...hat.com
>>>> Cc: kvm@...r.kernel.org; linux-kernel@...r.kernel.org
>>>> Subject: Re: [PATCH v2 2/2] KVM: x86: Add lowest-priority support for vt-d
>>>> posted-interrupts
>>>>
>>>> On 2015/12/16 9:37, Feng Wu wrote:
>>>>> Use vector-hashing to deliver lowest-priority interrupts for
>>>>> VT-d posted-interrupts.
>>>>>
>>>>> Signed-off-by: Feng Wu <feng.wu@...el.com>
>>>>> ---
>>>>> arch/x86/kvm/lapic.c | 67
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>> arch/x86/kvm/lapic.h | 2 ++
>>>>> arch/x86/kvm/vmx.c | 12 ++++++++--
>>>>> 3 files changed, 79 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>>>> index e29001f..d4f2c8f 100644
>>>>> --- a/arch/x86/kvm/lapic.c
>>>>> +++ b/arch/x86/kvm/lapic.c
>>>>> @@ -854,6 +854,73 @@ out:
>>>>> }
>>>>>
>>>>> /*
>>>>> + * This routine handles lowest-priority interrupts using vector-hashing
>>>>> + * mechanism. As an example, modern Intel CPUs use this method to
>> handle
>>>>> + * lowest-priority interrupts.
>>>>> + *
>>>>> + * Here is the details about the vector-hashing mechanism:
>>>>> + * 1. For lowest-priority interrupts, store all the possible destination
>>>>> + * vCPUs in an array.
>>>>> + * 2. Use "guest vector % max number of destination vCPUs" to find the
>> right
>>>>> + * destination vCPU in the array for the lowest-priority interrupt.
>>>>> + */
>>>>> +struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
>>>>> + struct kvm_lapic_irq *irq)
>>>>> +{
>>>>> + struct kvm_apic_map *map;
>>>>> + struct kvm_vcpu *vcpu = NULL;
>>>>> +
>>>>> + if (irq->shorthand)
>>>>> + return NULL;
>>>>> +
>>>>> + rcu_read_lock();
>>>>> + map = rcu_dereference(kvm->arch.apic_map);
>>>>> +
>>>>> + if (!map)
>>>>> + goto out;
>>>>> +
>>>>> + if ((irq->dest_mode != APIC_DEST_PHYSICAL) &&
>>>>> + kvm_lowest_prio_delivery(irq)) {
>>>>> + u16 cid;
>>>>> + int i, idx = 0;
>>>>> + unsigned long bitmap = 1;
>>>>> + unsigned int dest_vcpus = 0;
>>>>> + struct kvm_lapic **dst = NULL;
>>>>> +
>>>>> +
>>>>> + if (!kvm_apic_logical_map_valid(map))
>>>>> + goto out;
>>>>> +
>>>>> + apic_logical_id(map, irq->dest_id, &cid, (u16 *)&bitmap);
>>>>> +
>>>>> + if (cid >= ARRAY_SIZE(map->logical_map))
>>>>> + goto out;
>>>>> +
>>>>> + dst = map->logical_map[cid];
>>>>> +
>>>>> + for_each_set_bit(i, &bitmap, 16) {
>>>>> + if (!dst[i] && !kvm_lapic_enabled(dst[i]->vcpu)) {
>>>>> + clear_bit(i, &bitmap);
>>>>> + continue;
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + dest_vcpus = hweight16(bitmap);
>>>>> +
>>>>> + if (dest_vcpus != 0) {
>>>>> + idx = kvm_vector_2_index(irq->vector, dest_vcpus,
>>>>> + &bitmap, 16);
>>>>> + vcpu = dst[idx-1]->vcpu;
>>>>> + }
>>>>> + }
>>>>> +
>>>>> +out:
>>>>> + rcu_read_unlock();
>>>>> + return vcpu;
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(kvm_intr_vector_hashing_dest);
>>>>> +
>>>>> +/*
>>>>> * Add a pending IRQ into lapic.
>>>>> * Return 1 if successfully added and 0 if discarded.
>>>>> */
>>>>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
>>>>> index 6890ef0..52bffce 100644
>>>>> --- a/arch/x86/kvm/lapic.h
>>>>> +++ b/arch/x86/kvm/lapic.h
>>>>> @@ -172,4 +172,6 @@ bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm,
>>>> struct kvm_lapic_irq *irq,
>>>>> struct kvm_vcpu **dest_vcpu);
>>>>> int kvm_vector_2_index(u32 vector, u32 dest_vcpus,
>>>>> const unsigned long *bitmap, u32 bitmap_size);
>>>>> +struct kvm_vcpu *kvm_intr_vector_hashing_dest(struct kvm *kvm,
>>>>> + struct kvm_lapic_irq *irq);
>>>>> #endif
>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>> index 5eb56ed..3f89189 100644
>>>>> --- a/arch/x86/kvm/vmx.c
>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>> @@ -10702,8 +10702,16 @@ static int vmx_update_pi_irte(struct kvm
>> *kvm,
>>>> unsigned int host_irq,
>>>>> */
>>>>>
>>>>> kvm_set_msi_irq(e, &irq);
>>>>> - if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
>>>>> - continue;
>>>>> +
>>>>> + if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) {
>>>>> + if (!kvm_vector_hashing_enabled() ||
>>>>> + irq.delivery_mode !=
>>>> APIC_DM_LOWEST)
>>>>> + continue;
>>>>> +
>>>>> + vcpu = kvm_intr_vector_hashing_dest(kvm, &irq);
>>>>> + if (!vcpu)
>>>>> + continue;
>>>>> + }
>>>>
>>>> I am a little confused with the 'continue'. If the destination is not
>>>> single vcpu, shouldn't we rollback to use non-PI mode?
>>>
>>> Here is the logic:
>>> - If it is single destination, we will use PI no matter it is fixed or lowest-priority.
>>> - If it is not single destination:
>>> a) It is fixed, we will use non-PI
>>> b) It is lowest-priority and vector-hashing is enabled, we will use PI
>>> c) otherwise, use non-PI
>>
>> If it is single destination previously, then change to no-single mode.
>> Can current code cover this case?
>
> In my test, before setting irq affinity (change single vcpu to non-single vcpu
> in this case), the guest will mask the interrupt first, so before getting here, IRTE
> has been changed back to remapped mode already(when guest masks the MSIx,
> we will change back to remapped mode), hence nothing needed here.
>
> Digging into the linux code (guest) a bit more, I found that if interrupt remapping
> is not enabled in the guest (IR is not supported for guest anyway), it will always
> mask the MSI/MSIx before setting the irq affinity. So the code should work
> well currently.
We should not rely on guest's behavior. From code level, it need be fixed.
>
> However, for robustness, I think explicitly changing IRTE back to remapped
> mode for the 'continue' case should be a good idea.
This is what i am looking for.
>
> Radim, Paolo, what are your guys' options about this? Any comments are
> appreciated! Thanks a lot!
>
> Thanks,
> Feng
>
--
best regards
yang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists