linux-kernel - Re: [Qemu-devel] [PATCH v1 RFC 6/6] KVM: s390: add cpu model support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140522102316.653f34ba@bee>
Date:	Thu, 22 May 2014 10:23:16 +0200
From:	Michael Mueller <mimu@...ux.vnet.ibm.com>
To:	Alexander Graf <agraf@...e.de>
Cc:	linux-s390@...r.kernel.org, kvm@...r.kernel.org,
	Gleb Natapov <gleb@...nel.org>, qemu-devel@...gnu.org,
	linux-kernel@...r.kernel.org,
	Christian Borntraeger <borntraeger@...ibm.com>,
	"Jason J. Herne" <jjherne@...ux.vnet.ibm.com>,
	Cornelia Huck <cornelia.huck@...ibm.com>,
	Paolo Bonzini <pbonzini@...hat.com>,
	Andreas Faerber <afaerber@...e.de>,
	Richard Henderson <rth@...ddle.net>,
	Eduardo Habkost <ehabkost@...hat.com>
Subject: Re: [Qemu-devel] [PATCH v1 RFC 6/6] KVM: s390: add cpu model
 support

On Wed, 21 May 2014 15:22:35 +0200
Alexander Graf <agraf@...e.de> wrote:

I have seen the slides from Eduardo which he presented during this years
DevConf in Brno and made my comments according the s390x implementation
on that. Is you will see, this is mostly overlapping except for the model
definition authority that I clearly see on qemu's side.

See pdf attachment.

> 
> On 21.05.14 14:56, Michael Mueller wrote:
> > On Tue, 20 May 2014 12:10:23 +0200
> > Alexander Graf <agraf@...e.de> wrote:
> >
> >> On 20.05.14 12:02, Michael Mueller wrote:
> >>> On Mon, 19 May 2014 22:14:00 +0200
> >>> Alexander Graf <agraf@...e.de> wrote:
> >>>
> >>>> On 19.05.14 19:03, Michael Mueller wrote:
> >>>>> On Mon, 19 May 2014 16:49:28 +0200
> >>>>> Alexander Graf <agraf@...e.de> wrote:
> >>>>>
> >> [...]
> >>
> >>>>>>> What user and thus also user space wants depends on other factors:
> >>>>>>>
> >>>>>>> 1. reliability
> >>>>>>> 2. performance
> >>>>>>> 3. availability
> >>>>>>>
> >>>>>>> It's not features, that's what programmers want.
> >>>>>>>
> >>>>>>> That's why I have designed the model and migration capability around the hardware
> >>>>>>> and not around the software features and don't allow them to be enabled currently
> >>>>>>> together.
> >>>>>>>
> >>>>>>> A software feature is a nice add on that is helpful for evaluation or development
> >>>>>>> purpose. There is few space for it on productions systems.
> >>>>>>>
> >>>>>>> One option that I currently see to make software implemented facility migration
> >>>>>>> capable is to calculate some kind of hash value derived from the full set of
> >>>>>>> active software facilities. That value can be compared with pre-calculated
> >>>>>>> values also stored in the supported model table of qemu. This value could be
> >>>>>>> seen like a virtual model extension that has to match like the model name.
> >>>>>>>
> >>>>>>> But I have said it elsewhere already, a soft facility should be an exception and
> >>>>>>> not the rule.
> >>>>>>>
> >>>>>>>>>> So all we need is a list of "features the guest sees available" which is
> >>>>>>>>>> the same as "features user space wants the guest to see" which then gets
> >>>>>>>>>> masked through "features the host can do in hardware".
> >>>>>>>>>>
> >>>>>>>>>> For emulation we can just check on the global feature availability on
> >>>>>>>>>> whether we should emulate them or not.
> >>>>>>>>>>
> >>>>>>>>>>>> Also, if user space wants to make sure that its feature list is actually
> >>>>>>>>>>>> workable on the host kernel, it needs to set and get the features again
> >>>>>>>>>>>> and then compare that with the ones it set? That's different from x86's
> >>>>>>>>>>>> cpuid implementation but probably workable.
> >>>>>>>>>>> User space will probe what facilities are available and match them with the
> >>>>>>>>>>> predefined cpu model set. Only those models which use a partial or full subset of
> >>>>>>>>>>> the hard/host facility list are selectable.
> >>>>>>>>>> Why?
> >>>>>>>>> If a host does not offer the features required for a model it is not able to
> >>>>>>>>> run efficiently.
> >>>>>>>>>
> >>>>>>>>>> Please take a look at how x86 does cpuid masking :).
> >>>>>>>>>>
> >>>>>>>>>> In fact, I'm not 100% convinced that it's a good idea to link cpuid /
> >>>>>>>>>> feature list exposure to the guest and actual feature implementation
> >>>>>>>>>> inside the guest together. On POWER there is a patch set pending that
> >>>>>>>>>> implements these two things separately - admittedly mostly because
> >>>>>>>>>> hardware sucks and we can't change the PVR.
> >>>>>>>>> That is maybe the big difference with s390. The cpuid in the S390 case is not
> >>>>>>>>> directly comparable with the processor version register of POWER.
> >>>>>>>>>
> >>>>>>>>> In the S390 world we have a well defined CPU model room spanned by the machine
> >>>>>>>>> type and its GA count. Thus we can define a bijective mapping between
> >>>>>>>>> (type, ga) <-> (cpuid, ibc, facility set). From type and ga we form the model
> >>>>>>>>> name which BTW is meaningful also for a human user.
> >>>>>>>> Same thing as POWER.
> >>>>>>>>
> >>>>>>>>> By means of this name, a management interface (libvirt) will draw decisions if
> >>>>>>>>> migration to a remote hypervisor is a good idea or not. For that it just needs
> >>>>>>>>> to compare if the current model of the guest on the source hypervisor
> >>>>>>>>> ("query-cpu-model"), is contained in the supported model list of the target
> >>>>>>>>> hypervisor ("query-cpu-definitions").
> >>>>>>>> I don't think this works, since QEMU should always return all the cpu
> >>>>>>>> definitions it's aware of on query-cpu-definitions, not just the ones
> >>>>>>>> that it thinks may be compatible with the host at a random point in time.
> >>>>>>> It does not return model names that it thinks they are compatible at some point
> >>>>>>> in time. In s390 mode, it returns all definitions (CPU models) that a given host
> >>>>>>> system is capable to run. Together with the CPU model run by the guest, some upper
> >>>>>>> management interface knows if the hypervisor supports the required CPU model and
> >>>>>>> uses a guest definition with the same CPU model on the target hypervisor.
> >>>>>>>
> >>>>>>> The information for that is taken from the model table which QEMU builds up during
> >>>>>>> startup time. This list limits the command line selectable CPU models as well.
> >>>>>> This makes s390 derive from the way x86 handles things. NAK.
> >>>>> One second, that goes a little fast here :-). x86 returns a list they support which
> >>>>> happens to be the full list they define and s390 does logically the same because we know
> >>>>> that certain models are not supported due to probing. BTW that happens only if you run
> >>>>> Qemu on back level hardware and that is perfectly correct.
> >>>> It's not what other architectures do and I'd hate to see s390 deviate
> >>>> just because.
> >>> Only these four architectures implement the query and they all differ a little...
> >>>
> >>> target-arm/helper.c:CpuDefinitionInfoList *arch_query_cpu_definitions(Error **errp)
> >>> target-i386/cpu.c:CpuDefinitionInfoList *arch_query_cpu_definitions(Error **errp)
> >>> target-ppc/translate_init.c:CpuDefinitionInfoList *arch_query_cpu_definitions(Error **errp)
> >>> target-s390x/cpu.c:CpuDefinitionInfoList *arch_query_cpu_definitions(Error **errp)
> >>>
> >>> arm walks through a list of all ARM CPU types
> >>> list = object_class_get_list(TYPE_ARM_CPU, false);
> >>> and returns the CpuDefinitionInfoList derived from that one to one
> >>>
> >>> i386 loops over the static builtin_x86_defs[] array to retrieve the model names,
> >>> they don't even use the CPU class model as source
> >>>
> >>> ppc walks through a list of all POWER CPU types
> >>> list = object_class_get_list(TYPE_POWERPC_CPU, false);
> >>> and then extends the produced list by all defined aliases
> >>>
> >>> and s390x finally also walks through the defined S390 CPU types
> >>> list = object_class_get_list(TYPE_S390_CPU, false);
> >>> but drops those which are not usable (!is_active)
> >>> Just consider them as not defined. I actually would undefine
> >>> them if I knew how.
> >>>
> >>> Also the commands comment says "list of supported virtual CPU definitions" and the s390
> >>> list contains all supported models, that's no contradiction.
> >> So IMHO we can either
> >>
> >>     a) change the definition of query_cpu_definitions to only return CPUs
> >> that are executable with KVM on a given machine (probably a bad idea) or
> > no that is context dependent, yes in KVM case, no in TCG case. Actually for the TCG case one
> > would also report only those which can be emulated and not all.
> 
> Yup, we're already in a messy position there today with -cpu host which 
> we only expose if KVM is available.
> 
> >
> >>     b) return not only the CPU type, but also a hint whether it's
> >> available with KVM or
> > That is also changing the semantics
> >
> >>     c) add a parameter to query_cpu_definitions to say "only return KVM
> >> runnable CPUs" or
> > That optional parameter is not used by libvirt
> 
> Well, we're trying to fix a generic bug here, so maybe it should be used ;).

I think Eduardo has a sample implementation here for already, let's see how its done there.
> 
> >
> >>     d) introduce a new query_kvm_cpu_definitions qmp command
> > The criterion is not KVM here, it is "supported by the current Qemu in conjuction with the
> > given host and kernel".
> 
> Only for KVM. For TCG it would be "supported by the current QEMU TCG 
> engine on a given host". Maybe we'll implement transactional memory 
> support in TCG one day but require host transactional memory support for it?
> 
> > If I compare the command with it's use on the Qemu Page
> > (http://wiki.qemu.org/Features/CPUModels) it says: "Requirement: libvirt needs to know which
> > CPU models are available to be used with the "-cpu" option. Current solution: libvirt uses QMP
> > query-cpu-definitions command."
> >
> >>> ##
> >>> # @query-cpu-definitions:
> >>> #
> >>> # Return a list of supported virtual CPU definitions
> >>> #
> >>> # Returns: a list of CpuDefInfo
> >>>
> >>>>> The migration compatibility test is pretty much ARCH dependent. I looked into the
> >>>>> libvirt implementation and as one can see every architecture has its own implementation
> >>>>> there (libvirt/src/cpu/cpu_<arch>.c).
> >>>> So here's my question again. How does x86 evaluate whether a target
> >>>> machine is compatible with a source machine?
> >>> Will again look into that during the afternoon...
> >> Yes, please. Someone else must have solved this before :).
> > Well, in my eyes it was newer solved for x86! The issue is that libvirt has its one x86 model
> > and feature code, independent from Qemu's model and feature implementation. You might know
> > libvirt's cpu_map.xml file where in the meantime also POWER started to add a hand-full of
> > model names. That's their source, they even use asm instructions to identify the local CPU
> > and to derive a model list they (libvirt) think to support. x86 requires alone more the 2500
> > lines of code to implement its from qemu deviating model space. There is no test like the one
> > that I suggest that guarantees a priori a target hypervisor is capable to run a specific CPU
> > model until it fails.
> 
> Ok, maybe it's about time to start moving that code over into QEMU then. 
> Maybe not. But this is not something you and me can just decide for s390 
> alone - I want this solved for everyone.

I will talk with Christian and Connie and see what they suggest how to proceed.

> 
> >
> > x86 basically calculates a cpu model to be used from it's domain xml cpu statements, the cpuid
> > that it retrieves by itself and the feature definitions in the cpu_map.xml. During migartion
> > the domain xml is just copied without any change to the model. And eventually qemu startup
> > might fail.
> >
> > That cpu_map.xml configuration is obsolete for other architectures like s390 and also for arm.
> >
> > The usable cpu model name space is retrieved by "virsh cpu-models s390x" which in our case is
> > taken from libvirt's qemu capabilities cache which gets filled by "query-cpu-definitions"
> > during libvirtd's startup time.
> >
> > In addition we add a test at some point in libvirt's qemuDomainMigrateBegin() phase where we
> > verify the remote hypervisor is suitable to run the current domain's CPU model and to prepare
> > the destination domain XML where the CPU model might differ from the local.
> >
> > Assume the local domain XML specifies host as current model, then the test will retrieve the
> > normalized model name by means of "query-cpu-model" and and use it to verify it with the
> > remote Qemu cpu model capabilities (sitting in the remote Qemu capabilities cache) and choose
> > the remote domain XML CPU model accordingly.
> 
> All of that sounds very reasonable. So now we only have to work on 
> making that fit into the whole picture, not into a tiny s390 corner case.
> 

That is definitely my goal but looking at x86 solely may lead into a special case.

Thanks
Michael 

> 
> Alex
> 


Download attachment "cpu-models-commented.pdf" of type "application/pdf" (190990 bytes)