linux-kernel - Re: [PATCH] KVM: s390: Implement CHECK_STOP support and fix GET_MP


Message-ID: <fd5ad2be-f15f-425f-b8ef-087dc639024d@linux.ibm.com>
Date: Tue, 25 Nov 2025 19:10:43 +0100
From: Janosch Frank <frankja@...ux.ibm.com>
To: Josephine Pfeiffer <hi@...ie.lol>, borntraeger@...ux.ibm.com
Cc: imbrenda@...ux.ibm.com, david@...nel.org, hca@...ux.ibm.com,
        gor@...ux.ibm.com, agordeev@...ux.ibm.com, svens@...ux.ibm.com,
        kvm@...r.kernel.org, linux-s390@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: s390: Implement CHECK_STOP support and fix
 GET_MP_STATE

On 11/20/25 19:28, Josephine Pfeiffer wrote:
> On Mon, 17 Nov 2025 19:14:57 +0100, Christian Borntraeger wrote:
>> Am 17.11.25 um 16:18 schrieb Josephine Pfeiffer:
>>> Add support for KVM_MP_STATE_CHECK_STOP to enable proper VM migration
>>> and error handling for s390 guests. The CHECK_STOP state represents a
>>> CPU that encountered a severe machine check and is halted in an error
>>> state.
>>
>> I think the patch description is misleading. We do have proper VM
>> migration and we also have error handling in the kvm module. The host
>> machine check handler will forward guest machine checks to the guest.
>> This logic is certainly not perfect but kind of good enough for most
>> cases.
> 
> First of all, thank you for taking the time to look at my patch, and sorry
> for taking so long to write up the reply.
> 
> You're right, QEMU migrates cpu_state via vmstate [1] and only uses
> KVM_SET_MP_STATE to restore the state after migration [2], never calling
> KVM_GET_MP_STATE. So I misunderstood something there.
> 
> What prompted me to look into this was that the KVM API has advertised
> CHECK_STOP support without implementing it.
> Looking at commit 6352e4d2dd9a [3] from 2014: "KVM: s390: implement
> KVM_(S|G)ET_MP_STATE for user space state control"
> 
> This commit added KVM_MP_STATE_CHECK_STOP to include/uapi/linux/kvm.h [4] and
> documented it in Documentation/virtual/kvm/api.txt with:
> 
>    "KVM_MP_STATE_CHECK_STOP: the vcpu is in a special error state [s390]"
> 
> But the implementation was explicitly deferred with a fallthrough comment [3]:
> 
>    case KVM_MP_STATE_LOAD:
>    case KVM_MP_STATE_CHECK_STOP:
>        /* fall through - CHECK_STOP and LOAD are not supported yet */
>    default:
>        rc = -ENXIO;
> 
> This created a bit of an API asymmetry where:
> - Documentation/virt/kvm/api.rst:1546 [5] advertises CHECK_STOP as valid
> - KVM_SET_MP_STATE rejects it with -ENXIO
> - KVM_GET_MP_STATE never returns it (always returns STOPPED or OPERATING) [6]
> 
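
The asymmetry listed above can be illustrated with a small userspace model.
This is only a sketch, not the kernel code: struct model_vcpu and the
model_* functions are hypothetical names, and the enum values are assumed to
mirror the KVM_MP_STATE_* constants in include/uapi/linux/kvm.h, copied
locally so that the example builds without kernel headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror KVM_MP_STATE_* from include/uapi/linux/kvm.h. */
enum mp_state {
    MP_STATE_STOPPED    = 5,
    MP_STATE_CHECK_STOP = 6,
    MP_STATE_OPERATING  = 7,
    MP_STATE_LOAD       = 8,
};

/* Hypothetical stand-in for a vCPU; only tracks stopped vs. operating. */
struct model_vcpu {
    bool stopped;
};

/* Models the SET path: only STOPPED and OPERATING are accepted. */
int model_set_mp_state(struct model_vcpu *vcpu, int state)
{
    assert(vcpu != NULL);
    switch (state) {
    case MP_STATE_STOPPED:
        vcpu->stopped = true;
        return 0;
    case MP_STATE_OPERATING:
        vcpu->stopped = false;
        return 0;
    case MP_STATE_LOAD:
    case MP_STATE_CHECK_STOP:
        /* documented in the API, but rejected like any unknown state */
    default:
        return -ENXIO;
    }
}

/* Models the GET path: it can only ever report STOPPED or OPERATING. */
int model_get_mp_state(const struct model_vcpu *vcpu)
{
    assert(vcpu != NULL);
    return vcpu->stopped ? MP_STATE_STOPPED : MP_STATE_OPERATING;
}
```

In this model, a caller that passes MP_STATE_CHECK_STOP gets -ENXIO back and
the vCPU state is left unchanged, so a later model_get_mp_state() still
reports whatever was set last.
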
>> Now: The architecture defines that state and the interface is certainly
>> there. So implementing it will allow userspace to put a CPU into checkstop
>> state if you ever need that. We also have a checkstop state that you
>> can put a secure CPU in.
>>
>> The use case is dubious though. The only one of the options from POP
>> chapter 11 that makes sense to me in a virtualized environment is an exigent
>> machine check that cannot actually be delivered (for multiple possible
>> reasons, such as the OS having machine checks disabled in the PSW, or a
>> broken prefix register).
>>
>> So I am curious, do you have any specific use case in mind?
>> I assume you have a related QEMU patch somewhere?
> 
> The use cases I see are:
> 
> 1. API completeness: The state was added to the UAPI 11 years ago but never
>     implemented. Userspace cannot use a documented API feature.

I'd rather have stubs that fence properly than code that is never tested
because we don't use it.

Since this never worked, it might make sense to remove it: future users
will need to check for this "feature" anyway before using it.

> 
> 2. Fault injection testing: Administrators testing failover/monitoring for
>     hardware failures could programmatically put a CPU into CHECK_STOP to
>     verify their procedures work.

How would that work?
What can we gain from putting a CPU into checkstop?
How would QEMU use this?


Checkstop is not an error communication medium; that's what the machine
check interrupt is for. If you want to inject faults, then use the machine
check interface.

If you want to crash the guest, then panic it or just stop the CPUs.