linux-kernel - Re: [PATCH v2 01/14] KVM: x86: change PIT discard tick policy

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160225173615.GD18319@potion.redhat.com>
Date:	Thu, 25 Feb 2016 18:36:16 +0100
From:	Radim Krčmář <rkrcmar@...hat.com>
To:	Paolo Bonzini <pbonzini@...hat.com>
Cc:	linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
	Yuki Shibuya <shibuya.yk@...s.nec.co.jp>,
	Rik van Riel <riel@...hat.com>,
	Peter Krempa <pkrempa@...hat.com>
Subject: Re: [PATCH v2 01/14] KVM: x86: change PIT discard tick policy

I think I understand your definitions of tick policies and I remain
convinced that libvirt defines them differently.

Would you prefer that I got a confirmation on the libvirt list before
continuing the discussion here?

Two relevant parts of my reply that don't depend on libvirt's reply are
marked with ** .

2016-02-25 14:38+0100, Paolo Bonzini:
> On 19/02/2016 15:44, Radim Krčmář wrote:
>>> The resulting injections are:
>>>> - for catchup, which QEMU calls slew: 0, 42, 51, 60, 80.
> 
> I think we agree that 0, 42, 43, 60, 80 is also a "catchup"/"slew"
> policy.

We do.  (Libvirt "catchup" is also QEMU "delay".)

>          So we can change QEMU's kvm-i8254 to accept "slew" and warn if
> "delay" is given.
**
QEMU 4e4fa398db69 ("qdev: Introduce lost tick policy property") defines:

  delay   - replay all lost ticks in a row once the guest accepts them
            again
  slew    - lost ticks are gradually replayed at a higher frequency than
            the original tick

"delay" is exactly how kvm-i8254 behaves (in its "reinject" mode), so I
think we shouldn't change it.

> In fact "slew" means "a large number or quantity of something" and
> indeed that's a good word to characterize kvm-i8254's reinjection behavior.

(Isn't it a verb, with a similar meaning as "drift"? ;])

>>>> - for merge: 0, 20 (in IRR, delivered at 42), 60, 80.
>>>>
>>>> For delay I *think* it would be 0, 42, 62, 82, 102.
>> 
>> I could call this "delay".
>> 
>>   Continue to deliver ticks at the normal rate. The guest time will be
>>   delayed due to the late tick
>> 
>> At 82 time units, the guest thinks it's 60, so the guest will do
>> everything late.  (Leading us to call it delayed?!)
> 
> Yup, this was my reasoning.
> 
>> Few examples of "delay" that I find easier to accept:
>>  0, 60, 80.
> 
> This is "discard".

At 80, the guest thinks that the time is 40, so every action it does
will still be delayed.  I don't see why it isn't libvirt "delay":
 - ticks are delivered at the normal rate
 - guest time is delayed

(The original RFC that Peter shared called this policy "none".)

I don't think it is libvirt "discard":
 - missed ticks were thrown away
 - future injection continues normally

which is fine, but
 - the guest time is delayed, because there isn't a way to handle lost
   ticks

and this is incompatible with libvirt's definition of "discard"

  The guest time may be delayed, unless the OS has explicit handling of
  lost ticks.

"may" doesn't fit.  You can only say
 - the guest time is delayed.

which is best described by "delay".

>>  0, 42, 60, 80.  Because we haven't missed the tick at 20, it just took
>>                  a while to be delivered.  (Semantics ...)
> 
> This is not discard.  The ideal implementation of the tick policies is
> that the timer devices enjoy information from the interrupt controller,
> that lets them know when a tick cannot be delivered.  In that case they
> do stuff like:

Yes.  The case where the timer doesn't know about missed ticks also
deserves a tick policy and I think that libvirt calls it "delay", QEMU
"discard".

> - save it for later (catchup)
> 
> - drop it for good (which is discard, and not the same as stashing it in
> IRR)
> 
> - pause the timer (delay)
> 
> - coalesce the ticks into one late tick (merge)
> 
>> Terminlogy does suck. (Maybe it stems from the fact that QEMU talks
>> about lost ticks, but libvirt about ticks?)
>> Nevertheless, I don't think that libvirt "merge" covers what PIT does in
>> KVM or real hardware.
>> 
>>   Merge the missed tick(s) into one tick and inject. The guest time may
>>   be delayed, depending on how the OS reacts to the merging of ticks
>> 
>> No merging is happening in KVM or real hardware:  every tick is exactly
>> one tick, so the guest cannot tell that we missed some ticks and the
>> time is delayed.  If a tick made it into clear IRR, it's not missed.
> 
> The libvirt documentation is written from the point of view of the
> guest, ignoring whether the late tick is recorded in some guest-visible
> register (IRR) or in the processor state (as is the case for NMI) or in
> the timer device state or who knows where.

Yes.

> Therefore, it _also_ happens that thanks to IRR and NMI latching you can
> implement "merge" without having that kind of relationship between the
> timer device and the interrupt controller.

I disagree.  IRR can catch at most one interrupt, so it is insufficient
to implement libvirt's merge.  (libvirt's merge also has the conditional
"The guest time may be delayed".)

>                                             But libvirt doesn't care,
> all the <timer> element wants is the guest-visible behavior in terms of
> when the ticks are delivered.

Yes, and the guest can't behave like the libvirt definition says.

Libvirt mainly cares if guest's time was irreversibly delayed or not and
for how long can the time be delayed.

(3 of 4 policies in libvirt don't need to delay the time.
 Only 1 policy in QEMU lets the guest recover time.  2 if you say that
 libvirt "merge" = QEMU "merge" and keep pit as QEMU "discard".)

> This was my reasoning when proposing to change QEMU regarding "discard"
> as well:
> 
> - RTC would warn for "discard" and accept "merge"
> 
> - kvm-i8254 would warn for "discard", and accept "merge" if the
> capability says that your patch is in.  The idea is that "discard" is
> such a bad idea, that "merge" should fail if the hypervisor would cause
> the watchdog to hang.
**
I think it's better to say that the tick in IRR wasn't lost and policy
is "discard".
  discard - ignore lost ticks (e.g. if the guest compensates for them
            already)

(We don't need to say that a tick is lost just because it couldn't be
 delivered within some interval.)

"merge" also fits the behavior of kvm-i8254 if you treat IRR as a part
of the timer policy.
  merge   - if multiple ticks are lost, all of them are merged into one
            which is replayed once the guest accepts it again

(It means that every tick was "lost" and then replayed, usually right
 after the tick was "lost" -- this is because you can be only be merging
 lost ticks, so if there could be a non-lost tick in IRR while no tick
 was being handled, e.g. disabled interrupts, and you wanted to deliver
 a next one, then you'd need to have another bit that remembers lost
 ticks.)

>> In the example:
>> 
>>>> - for merge: 0, 20 (in IRR, delivered at 42), 60, 80.
>> 
>> at 80, the guest thinks it's 60.
>> 
>> I think that merge might do:  0, 42, 60, 80.
>> But the tick at 42 is counted as two ticks (20, 40) in the guest.
> 
> Yes, merge is a good policy if the guest can somehow realize that 42
> stood for (20, 40).  Discard would be a good policy too if the guest can
> somehow realize that 60 stood for (20, 40, 60) but it has the problem
> that NMIs don't do EOIs.

Yes, recognizing lost ticks is a prerequisite for catchup/delay/discard.
Standard i8254 & friends aren't able to implement any policy other than
libvirt "delay", because there is no notion of lost ticks.
(KVM's changes to EOI notifiers were very shortsighted.)

>> The main problem of this interpretation is that discard is a subset of
>> merge:
>> 
>>>> - for discard: 0, 60, 80.
>> 
>> The tick at 60 has to be counted as three ticks (20, 40, 60).
> 
> Why is it a problem?

Defining a subset without using the superset is a major flaw in a spec.
It got clarified in the link that Peter gave, where merge delivers
"asap" while discard just waits for the next tick. :)