linux-kernel - Re: [PATCH] Revert "KVM: x86: Unconditionally enable irqs in guest context"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <fb41e24f-a5e3-8319-d25b-e0fe6b902a2b@redhat.com>
Date:   Tue, 12 Jan 2021 16:43:01 -0500
From:   Nitesh Narayan Lal <nitesh@...hat.com>
To:     Vitaly Kuznetsov <vkuznets@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>
Cc:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        w90p710@...il.com, pbonzini@...hat.com,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] Revert "KVM: x86: Unconditionally enable irqs in guest
 context"


On 1/7/21 4:33 AM, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@...gle.com> writes:
>
>> On Wed, Jan 06, 2021, Vitaly Kuznetsov wrote:
>>> Looking back, I don't quite understand why we wanted to account ticks
>>> between vmexit and exiting guest context as 'guest' in the first place;
>>> to my understanging 'guest time' is time spent within VMX non-root
>>> operation, the rest is KVM overhead (system).
>> With tick-based accounting, if the tick IRQ is received after PF_VCPU is cleared
>> then that tick will be accounted to the host/system.  The motivation for opening
>> an IRQ window after VM-Exit is to handle the case where the guest is constantly
>> exiting for a different reason _just_ before the tick arrives, e.g. if the guest
>> has its tick configured such that the guest and host ticks get synchronized
>> in a bad way.
>>
>> This is a non-issue when using CONFIG_VIRT_CPU_ACCOUNTING_GEN=y, at least with a
>> stable TSC, as the accounting happens during guest_exit_irqoff() itself.
>> Accounting might be less-than-stellar if TSC is unstable, but I don't think it
>> would be as binary of a failure as tick-based accounting.
>>
> Oh, yea, I vaguely remember we had to deal with a very similar problem
> but for userspace/kernel accounting. It was possible to observe e.g. a
> userspace task going 100% kernel while in reality it was just perfectly
> synchronized with the tick and doing a syscall just before it arrives
> (or something like that, I may be misremembering the details).
>
> So depending on the frequency, it is probably possible to e.g observe
> '100% host' with tick based accounting, the guest just has to
> synchronize exiting to KVM in a way that the tick will always arrive
> past guest_exit_irqoff().
>
> It seems to me this is a fundamental problem in case the frequency of
> guest exits can match the frequency of the time accounting tick.
>

Just to make sure that I am understanding things correctly.
There are two issues:
1. The first issue is with the tick IRQs that arrive after PF_VCPU is
   cleared as they are then accounted into the system context atleast on
   the setup where CONFIG_VIRT_CPU_ACCOUNTING_GEN is not enabled. With the
   patch "KVM: x86: Unconditionally enable irqs in guest context", we are
   atleast taking care of the scenario where the guest context is exiting
   constantly just before the arrival of the tick.
 
2. The second issue that Sean mentioned was introduced because of moving
   guest_exit_irqoff() closer to VM-exit. Due to this change, any ticks that
   happen after IRQs are disabled are incorrectly accounted into the system
   context. This is because we exit the guest context early without
   ensuring if the required guest states to handle IRQs are restored.
 
So, the increase in the system time (reported by cpuacct.stats) that I was
observing is not entirely correct after all.
Am I missing anything here?

--
Thanks
Nitesh