linux-kernel - Re: [PATCH] kvm: x86: make lapic hrtimer pinned

Message-ID: <CANRm+CxYAMch4TtP6iUrYVZiQ2yZm25La9g7z+ukXquviQL8_g@mail.gmail.com>
Date:	Fri, 22 Apr 2016 07:12:51 +0800
From:	Wanpeng Li <kernellwp@...il.com>
To:	Luiz Capitulino <lcapitulino@...hat.com>
Cc:	Yang Zhang <yang.zhang.wz@...il.com>,
	Rik van Riel <riel@...hat.com>, kvm <kvm@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Paolo Bonzini <pbonzini@...hat.com>,
	Radim Krcmar <rkrcmar@...hat.com>,
	Marcelo Tosatti <mtosatti@...hat.com>,
	Bandan Das <bsd@...hat.com>
Subject: Re: [PATCH] kvm: x86: make lapic hrtimer pinned

2016-04-05 20:40 GMT+08:00 Luiz Capitulino <lcapitulino@...hat.com>:
> On Tue, 5 Apr 2016 14:18:01 +0800
> Yang Zhang <yang.zhang.wz@...il.com> wrote:
>
>> On 2016/4/5 5:00, Rik van Riel wrote:
>> > On Mon, 2016-04-04 at 16:46 -0400, Luiz Capitulino wrote:
>> >> When a vCPU runs on a nohz_full core, the hrtimer used by
>> >> the lapic emulation code can be migrated to another core.
>> >> When this happens, it's possible to observe millisecond
>> >> latency when delivering timer IRQs to KVM guests.
>> >>
>> >> The huge latency is mainly due to the fact that
>> >> apic_timer_fn() expects to run during a kvm exit. It
>> >> sets KVM_REQ_PENDING_TIMER and lets it be handled on kvm
>> >> entry. However, if the timer fires on a different core,
>> >> we have to wait until the next kvm exit for the guest
>> >> to see KVM_REQ_PENDING_TIMER set.
>> >>
>> >> This problem became visible after commit 9642d18ee. This
>> >> commit changed the timer migration code to always attempt
>> >> to migrate timers away from nohz_full cores. While it's
>> >> debatable whether this is correct/desirable (I don't think
>> >> it is), it's clear that the lapic emulation code has
>> >> a requirement on firing the hrtimer on the same core
>> >> where it was started. This is achieved by making the
>> >> hrtimer pinned.
>> >
>> > Given that delivering a timer to a guest seems to
>> > involve trapping from the guest to the host, anyway,
>> > I don't see a downside to your patch.
>> >
>> > If that is ever changed (eg. allowing delivery of
>> > a timer interrupt to a VCPU without trapping to the
>> > host), we may want to revisit this.
>>
>>
>> Posted interrupts help in this case. Currently, KVM doesn't use PI for
>> the lapic timer because the lapic timer and the VCPU have the same
>> affinity. Now we can change to using PI for the lapic timer. The only
>> concern is how frequent timer migration is in upstream Linux. If it is
>> frequent, will it bring additional cost?
>
> I can't answer this question.
>
>> BTW, in what case will the migration of timers during VCPU scheduling fail?
>
> For hrtimers (which is the lapic emulation case), it only succeeds if
> the destination core has a hrtimer expiring before the hrtimer being
> migrated.

Interesting, did you figure out why this happens? Actually, the clock
event device will be reprogrammed if the expiry time of the newly
enqueued hrtimer is earlier than that of the leftmost (earliest-expiring)
hrtimer in the hrtimer rb-tree.
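
To make the reprogramming rule above concrete, here is a minimal
userspace sketch. It is only a toy model, not kernel code: struct
toy_base, toy_enqueue() and toy_start_timer() are hypothetical names,
and a small sorted array stands in for the rb-tree. It assumes the
device is re-armed only when the newly enqueued timer becomes the
leftmost (earliest-expiring) one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_TIMERS 16

/* Hypothetical stand-in for a per-CPU hrtimer base (not the kernel struct). */
struct toy_base {
    uint64_t expires[MAX_TIMERS]; /* kept sorted, earliest expiry first */
    int count;
    uint64_t programmed;          /* expiry the toy "clock event device" is armed for */
    int reprograms;               /* number of times the device was re-armed */
};

static void toy_base_init(struct toy_base *b)
{
    b->count = 0;
    b->programmed = UINT64_MAX;   /* nothing armed yet */
    b->reprograms = 0;
}

/* Insert in sorted order; return true if the new timer became the leftmost. */
static bool toy_enqueue(struct toy_base *b, uint64_t expires)
{
    int i = b->count;

    assert(b->count < MAX_TIMERS);
    /* Shift later-expiring timers one slot to the right. */
    while (i > 0 && b->expires[i - 1] > expires) {
        b->expires[i] = b->expires[i - 1];
        i--;
    }
    b->expires[i] = expires;
    b->count++;
    return i == 0;
}

/* Re-arm the device only if the new timer expires before everything queued. */
static void toy_start_timer(struct toy_base *b, uint64_t expires)
{
    if (toy_enqueue(b, expires) && expires < b->programmed) {
        b->programmed = expires;
        b->reprograms++;
    }
}
```

Starting timers at 300, 500 and then 100 (arbitrary units) on an empty
base re-arms the toy device twice, once for 300 and once for 100; the
500 timer queues behind 300 without touching the device.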

Regards,
Wanpeng Li

>
> Also, if the hrtimer callback function is already running (that is,
> the timer fired already) it's not migrated either. But I _guess_ this
> case doesn't affect KVM (and there's not much to do about it anyway).