linux-kernel - Re: [Qemu-devel] [PATCH v1 RFC 6/6] KVM: s390: add cpu model support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <537CA89B.6050803@suse.de>
Date:	Wed, 21 May 2014 15:22:35 +0200
From:	Alexander Graf <agraf@...e.de>
To:	Michael Mueller <mimu@...ux.vnet.ibm.com>
CC:	linux-s390@...r.kernel.org, kvm@...r.kernel.org,
	Gleb Natapov <gleb@...nel.org>, qemu-devel@...gnu.org,
	linux-kernel@...r.kernel.org,
	Christian Borntraeger <borntraeger@...ibm.com>,
	"Jason J. Herne" <jjherne@...ux.vnet.ibm.com>,
	Cornelia Huck <cornelia.huck@...ibm.com>,
	Paolo Bonzini <pbonzini@...hat.com>,
	Andreas Faerber <afaerber@...e.de>,
	Richard Henderson <rth@...ddle.net>
Subject: Re: [Qemu-devel] [PATCH v1 RFC 6/6] KVM: s390: add cpu model support


On 21.05.14 14:56, Michael Mueller wrote:
> On Tue, 20 May 2014 12:10:23 +0200
> Alexander Graf <agraf@...e.de> wrote:
>
>> On 20.05.14 12:02, Michael Mueller wrote:
>>> On Mon, 19 May 2014 22:14:00 +0200
>>> Alexander Graf <agraf@...e.de> wrote:
>>>
>>>> On 19.05.14 19:03, Michael Mueller wrote:
>>>>> On Mon, 19 May 2014 16:49:28 +0200
>>>>> Alexander Graf <agraf@...e.de> wrote:
>>>>>
>> [...]
>>
>>>>>>> What user and thus also user space wants depends on other factors:
>>>>>>>
>>>>>>> 1. reliability
>>>>>>> 2. performance
>>>>>>> 3. availability
>>>>>>>
>>>>>>> It's not features, that's what programmers want.
>>>>>>>
>>>>>>> That's why I have designed the model and migration capability around the hardware
>>>>>>> and not around the software features and don't allow them to be enabled currently
>>>>>>> together.
>>>>>>>
>>>>>>> A software feature is a nice add on that is helpful for evaluation or development
>>>>>>> purpose. There is few space for it on productions systems.
>>>>>>>
>>>>>>> One option that I currently see to make software implemented facility migration
>>>>>>> capable is to calculate some kind of hash value derived from the full set of
>>>>>>> active software facilities. That value can be compared with pre-calculated
>>>>>>> values also stored in the supported model table of qemu. This value could be
>>>>>>> seen like a virtual model extension that has to match like the model name.
>>>>>>>
>>>>>>> But I have said it elsewhere already, a soft facility should be an exception and
>>>>>>> not the rule.
>>>>>>>
>>>>>>>>>> So all we need is a list of "features the guest sees available" which is
>>>>>>>>>> the same as "features user space wants the guest to see" which then gets
>>>>>>>>>> masked through "features the host can do in hardware".
>>>>>>>>>>
>>>>>>>>>> For emulation we can just check on the global feature availability on
>>>>>>>>>> whether we should emulate them or not.
>>>>>>>>>>
>>>>>>>>>>>> Also, if user space wants to make sure that its feature list is actually
>>>>>>>>>>>> workable on the host kernel, it needs to set and get the features again
>>>>>>>>>>>> and then compare that with the ones it set? That's different from x86's
>>>>>>>>>>>> cpuid implementation but probably workable.
>>>>>>>>>>> User space will probe what facilities are available and match them with the predefined
>>>>>>>>>>> cpu model set. Only those models which use a partial or full subset of the hard/host
>>>>>>>>>>> facility list are selectable.
>>>>>>>>>> Why?
>>>>>>>>> If a host does not offer the features required for a model it is not able to
>>>>>>>>> run efficiently.
>>>>>>>>>
>>>>>>>>>> Please take a look at how x86 does cpuid masking :).
>>>>>>>>>>
>>>>>>>>>> In fact, I'm not 100% convinced that it's a good idea to link cpuid /
>>>>>>>>>> feature list exposure to the guest and actual feature implementation
>>>>>>>>>> inside the guest together. On POWER there is a patch set pending that
>>>>>>>>>> implements these two things separately - admittedly mostly because
>>>>>>>>>> hardware sucks and we can't change the PVR.
>>>>>>>>> That is maybe the big difference with s390. The cpuid in the S390 case is not
>>>>>>>>> directly comparable with the processor version register of POWER.
>>>>>>>>>
>>>>>>>>> In the S390 world we have a well defined CPU model room spanned by the machine
>>>>>>>>> type and its GA count. Thus we can define a bijective mapping between
>>>>>>>>> (type, ga) <-> (cpuid, ibc, facility set). From type and ga we form the model
>>>>>>>>> name which BTW is meaningful also for a human user.
>>>>>>>> Same thing as POWER.
>>>>>>>>
>>>>>>>>> By means of this name, a management interface (libvirt) will draw decisions if
>>>>>>>>> migration to a remote hypervisor is a good idea or not. For that it just needs
>>>>>>>>> to compare if the current model of the guest on the source hypervisor
>>>>>>>>> ("query-cpu-model"), is contained in the supported model list of the target
>>>>>>>>> hypervisor ("query-cpu-definitions").
>>>>>>>> I don't think this works, since QEMU should always return all the cpu
>>>>>>>> definitions it's aware of on query-cpu-definitions, not just the ones
>>>>>>>> that it thinks may be compatible with the host at a random point in time.
>>>>>>> It does not return model names that it thinks they are compatible at some point
>>>>>>> in time. In s390 mode, it returns all definitions (CPU models) that a given host
>>>>>>> system is capable to run. Together with the CPU model run by the guest, some upper
>>>>>>> management interface knows if the hypervisor supports the required CPU model and
>>>>>>> uses a guest definition with the same CPU model on the target hypervisor.
>>>>>>>
>>>>>>> The information for that is taken from the model table which QEMU builds up during
>>>>>>> startup time. This list limits the command line selectable CPU models as well.
>>>>>> This makes s390 derive from the way x86 handles things. NAK.
>>>>> One second, that goes a little fast here :-). x86 returns a list they support which happens
>>>>> to be the full list they define and s390 does logically the same because we know that
>>>>> certain models are not supported due to probing. BTW that happens only if you run Qemu on
>>>>> back level hardware and that is perfectly correct.
>>>> It's not what other architectures do and I'd hate to see s390 deviate
>>>> just because.
>>> Only these four architectures implement the query and they all differ a little...
>>>
>>> target-arm/helper.c:CpuDefinitionInfoList *arch_query_cpu_definitions(Error **errp)
>>> target-i386/cpu.c:CpuDefinitionInfoList *arch_query_cpu_definitions(Error **errp)
>>> target-ppc/translate_init.c:CpuDefinitionInfoList *arch_query_cpu_definitions(Error **errp)
>>> target-s390x/cpu.c:CpuDefinitionInfoList *arch_query_cpu_definitions(Error **errp)
>>>
>>> arm walks through a list of all ARM CPU types
>>> list = object_class_get_list(TYPE_ARM_CPU, false);
>>> and returns the CpuDefinitionInfoList derived from that one to one
>>>
>>> i386 loops over the static builtin_x86_defs[] array to retrieve the model names,
>>> they don't even use the CPU class model as source
>>>
>>> ppc walks through a list of all POWER CPU types
>>> list = object_class_get_list(TYPE_POWERPC_CPU, false);
>>> and then extends the produced list by all defined aliases
>>>
>>> and s390x finally also walks through the defined S390 CPU types
>>> list = object_class_get_list(TYPE_S390_CPU, false);
>>> but drops those which are not usable (!is_active)
>>> Just consider them as not defined. I actually would undefine
>>> them if I knew how.
>>>
>>> Also the commands comment says "list of supported virtual CPU definitions" and the s390
>>> list contains all supported models, that's no contradiction.
>> So IMHO we can either
>>
>>     a) change the definition of query_cpu_definitions to only return CPUs
>> that are executable with KVM on a given machine (probably a bad idea) or
> no that is context dependent, yes in KVM case, no in TCG case. Actually for the TCG case one
> would also report only those which can be emulated and not all.

Yup, we're already in a messy position there today with -cpu host which 
we only expose if KVM is available.

>
>>     b) return not only the CPU type, but also a hint whether it's
>> available with KVM or
> That is also changing the semantics
>
>>     c) add a parameter to query_cpu_definitions to say "only return KVM
>> runnable CPUs" or
> That optional parameter is not used by libvirt

Well, we're trying to fix a generic bug here, so maybe it should be used ;).

>
>>     d) introduce a new query_kvm_cpu_definitions qmp command
> The criterion is not KVM here, it is "supported by the current Qemu in conjuction with the given
> host and kernel".

Only for KVM. For TCG it would be "supported by the current QEMU TCG 
engine on a given host". Maybe we'll implement transactional memory 
support in TCG one day but require host transactional memory support for it?

> If I compare the command with it's use on the Qemu Page
> (http://wiki.qemu.org/Features/CPUModels) it says: "Requirement: libvirt needs to know which CPU
> models are available to be used with the "-cpu" option. Current solution: libvirt uses QMP
> query-cpu-definitions command."
>
>>> ##
>>> # @query-cpu-definitions:
>>> #
>>> # Return a list of supported virtual CPU definitions
>>> #
>>> # Returns: a list of CpuDefInfo
>>>
>>>>> The migration compatibility test is pretty much ARCH dependent. I looked into the
>>>>> libvirt implementation and as one can see every architecture has its own implementation
>>>>> there (libvirt/src/cpu/cpu_<arch>.c).
>>>> So here's my question again. How does x86 evaluate whether a target
>>>> machine is compatible with a source machine?
>>> Will again look into that during the afternoon...
>> Yes, please. Someone else must have solved this before :).
> Well, in my eyes it was newer solved for x86! The issue is that libvirt has its one x86 model and
> feature code, independent from Qemu's model and feature implementation. You might know
> libvirt's cpu_map.xml file where in the meantime also POWER started to add a hand-full of model
> names. That's their source, they even use asm instructions to identify the local CPU and to
> derive a model list they (libvirt) think to support. x86 requires alone more the 2500 lines of
> code to implement its from qemu deviating model space. There is no test like the one that I
> suggest that guarantees a priori a target hypervisor is capable to run a specific CPU model until
> it fails.

Ok, maybe it's about time to start moving that code over into QEMU then. 
Maybe not. But this is not something you and me can just decide for s390 
alone - I want this solved for everyone.

>
> x86 basically calculates a cpu model to be used from it's domain xml cpu statements, the cpuid
> that it retrieves by itself and the feature definitions in the cpu_map.xml. During migartion
> the domain xml is just copied without any change to the model. And eventually qemu startup might
> fail.
>
> That cpu_map.xml configuration is obsolete for other architectures like s390 and also for arm.
>
> The usable cpu model name space is retrieved by "virsh cpu-models s390x" which in our case is
> taken from libvirt's qemu capabilities cache which gets filled by "query-cpu-definitions" during
> libvirtd's startup time.
>
> In addition we add a test at some point in libvirt's qemuDomainMigrateBegin() phase where we
> verify the remote hypervisor is suitable to run the current domain's CPU model and to prepare
> the destination domain XML where the CPU model might differ from the local.
>
> Assume the local domain XML specifies host as current model, then the test will retrieve the
> normalized model name by means of "query-cpu-model" and and use it to verify it with the
> remote Qemu cpu model capabilities (sitting in the remote Qemu capabilities cache) and choose
> the remote domain XML CPU model accordingly.

All of that sounds very reasonable. So now we only have to work on 
making that fit into the whole picture, not into a tiny s390 corner case.


Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/