linux-kernel - Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <56B25F0C.2050808@oracle.com>
Date:	Wed, 3 Feb 2016 15:11:56 -0500
From:	Boris Ostrovsky <boris.ostrovsky@...cle.com>
To:	"Luis R. Rodriguez" <mcgrof@...nel.org>
Cc:	David Vrabel <david.vrabel@...rix.com>, konrad.wilk@...cle.com,
	xen-devel@...ts.xenproject.org, mcgrof@...e.com,
	linux-kernel@...r.kernel.org, roger.pau@...rix.com, x86@...nel.org,
	GLin@...e.coma, bblanco@...mgrid.com, pmonclus@...mgrid.com,
	bp@...e.de, hpa@...or.com
Subject: Re: [PATCH v2 02/11] xen/hvmlite: Bootstrap HVMlite guest

On 02/03/2016 01:55 PM, Luis R. Rodriguez wrote:
> I saw no considerations for the recommendations I had made last on your v1:
>
> https://lkml.kernel.org/r/CAB=NE6XPA0YzbnM8=rspkKai6d3GkXXO00Gr0VZUYoyzNy6thw@mail.gmail.com
>
> Of importance:
>
> 1) Using pv_info.paravirt_enabled = 1 is wrong unless you mean to say this
>     is for legacy x86:
>
> Your patch #3 keeps on setting pv_info.paravirt_enabled = 1 and as discussed
> this is wrong. It will be renamed to x86_legacy_free() to align with what folks
> are pushing for a BIOS flag to annotate if a system requires legacy x86 stuff.
> This also means re-thinking all use cases and ensuring subarch is used then
> instead when the goal was to avoid Xen from entering that code. Today Xen does
> not use this but with my work it does and it helps clean and brush up a lot of
> these checks with future prospects to even help unify entry points.

As I said earlier, I am not sure I understand what subarch buys us for 
HVMlite guests.

As for using paravirt_enabled -- this is really only used to 
differentiate HVM from HVMlite and I think (although I'd need to check) 
is only needed by Xen-specific code in a couple of places. So if/when it 
is removed we will switch to something else. Since your work is WIP I 
decided to keep using it until it's clear what other options may be 
available.

>
> 2) We should avoid more hypervisor type hacks, and just consider a new
>     hypervisor type to close the gap:
>
> Using x86_legacy_free() and friends in a unified way for all systems means it
> should only be used after init_hypervisor_platform() which is called during
> setup_arch().  This means we have a semantic gap for checks on "are we on
> hypervisor type and which one?".

In this particular case we don't need any information about hypervisor 
until init_hypervisor_platform().

> There are drivers now using these sorts of
> checks as well, for instance snd_intel8x0_inside_vm(). We should avoid having
> these hacks but that also means cleaning up a well define grammar here for what
> we want.  I'm doing work to help with this by streamlining use of the subarch
> type, that should help with PV code, but your use case seems different but yet
> related, what I had suggested last was to consider we add a new hypervisor type
> to the x86 boot protocol which would be available early on. This would have a
> few purposes, one of which deserves its own section below on dead code:
>
>      a) clean up hacks as with snd_intel8x0_inside_vm()
>      b) enable a generic way and clean way to distinguish what hypervisor
>         type you're on
>      c) since it would be set early and if we can ensure its accessible
>         early on boot it would mean avoiding having to add yet-another
>         asm entry point for Linux, you could just use startup_32() and
>         the hypervisor type could easily just have an early branch call
>         and post branch call very similar to how we deal with the subarch
>         currently on 32-bit. Your calls then just become early stubs and
>         we'd have a solution for other PV types that want a similar solution
>         later

Which calls?

If you are referring to xen_prepare_hvmlite/hvmlite_bootparams then 
these are needed to prepare boot_params. And we should not enter 
startup_32() without them ready.


>
> 3) Dead code concerns and unifying entry points:
>
> Addressing the semantics for the gray areas I am highlighting are critical for
> ensuring one does not run code or even exposes code as a available for the type
> of run time system booted, some folks call this "dead code". This is critical
> for Linux distributions which need to rely on the flexibility of having one
> kernel work for different use cases. The resolution to this problem was pvops
> but pvops has shortcoming for dead code, it didn't address the problem likely as
> it was not considered serious. It also didn't address the issue of different
> hypervisors wanting different entry points and that this fact alone also contributes
> to more dead code concerns, case in point the regressions introduced by cr4 shadow
> and the latest one is Kasan which to this day breaks Xen! Dead code topics are
> not easy to grasp, its why I've started on my own crusade to talk to people and
> write about it [0], and as of late propose some changes to avoid these in a
> clean way without extending pvops. Adding yet another entry point will not help
> here *specially* if we do not take semantics seriously over the different hypervisors
> and hypervisor types.
>
> [0] http://www.do-not-panic.com/2015/12/avoiding-dead-code-pvops-not-silver-bullet.html
> [1] http://www.do-not-panic.com/2015/12/xen-and-x86-linux-zero-page.html

I don't understand what this has to do with HVMlite guests.

>
> My recommendation then:
>
> We add new hypervisor type to close the semantic gap for hypervisor types, and
> much like subarch enable also a subarch_data to let you pass and use your
> hvmlite_start_info. This would not only help with the semantics but also help
> avoid yet-another-entry point and force us to provide a well define structure
> for considering code that should not run by pegging it as required or supported
> for different early x86 code stubs.

As I said before, I don't see how we can avoid having another entry 
point without making Xen change its load procedure. Which is highly 
unlikely to happen.

>
> I had hinted perhaps we might be able to piggy back on top of the ELF loader
> protocol as well, and since that's standard do wonder if that could instead
> be extended to help unify a mechanism for different OSes instead of making
> this just a solution for Linux.
>
> Code review below.
>
> On Mon, Feb 01, 2016 at 10:38:48AM -0500, Boris Ostrovsky wrote:
>> Start HVMlite guest at XEN_ELFNOTE_PHYS32_ENTRY address. Setup hypercall
>> page, initialize boot_params, enable early page tables.
>>
>> Since this stub is executed before kernel entry point we cannot use
>> variables in .bss which is cleared by kernel. We explicitly place
>> variables that are initialized here into .data.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@...cle.com>
>> ---
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index 5774800..5f05fa2 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -171,6 +172,17 @@ struct tls_descs {
>>    */
>>   static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
>>   
>> +#ifdef CONFIG_XEN_PVHVM
>> +/*
>> + * HVMlite variables. These need to live in data segment since they are
>> + * initialized before startup_{32|64}, which clear .bss, are invoked.
>> + */
>> +int xen_hvmlite __attribute__((section(".data"))) = 0;
>> +struct hvm_start_info hvmlite_start_info __attribute__((section(".data")));
>> +uint hvmlite_start_info_sz = sizeof(hvmlite_start_info);
>> +struct boot_params xen_hvmlite_boot_params __attribute__((section(".data")));
>> +#endif
>> +
> The section annotations seems very special use case but likely worth documenting
> and defining a new macro for in include/linux/compiler.h. This would make it
> easier to change should we want to change the section used here later and
> enable others to easily look for the reason for these annotations in a
> single place.

I wonder whether __initdata would be a good attribute. We only need this 
early in the boot.

And xen_hvmlite is gone now so we don't need to worry about it.

-boris