linux-kernel - Re: [RFC KVM 19/27] kvm/isolation: initialize the KVM page table with core mappings

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <463b86c8-e9a0-fc13-efa4-31df3aea8e54@oracle.com>
Date:   Mon, 13 May 2019 18:47:09 +0200
From:   Alexandre Chartre <alexandre.chartre@...cle.com>
To:     Dave Hansen <dave.hansen@...el.com>, pbonzini@...hat.com,
        rkrcmar@...hat.com, tglx@...utronix.de, mingo@...hat.com,
        bp@...en8.de, hpa@...or.com, dave.hansen@...ux.intel.com,
        luto@...nel.org, peterz@...radead.org, kvm@...r.kernel.org,
        x86@...nel.org, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Cc:     konrad.wilk@...cle.com, jan.setjeeilers@...cle.com,
        liran.alon@...cle.com, jwadams@...gle.com
Subject: Re: [RFC KVM 19/27] kvm/isolation: initialize the KVM page table with
 core mappings



On 5/13/19 5:50 PM, Dave Hansen wrote:
>> +	/*
>> +	 * Copy the mapping for all the kernel text. We copy at the PMD
>> +	 * level since the PUD is shared with the module mapping space.
>> +	 */
>> +	rv = kvm_copy_mapping((void *)__START_KERNEL_map, KERNEL_IMAGE_SIZE,
>> +	     PGT_LEVEL_PMD);
>> +	if (rv)
>> +		goto out_uninit_page_table;
> 
> Could you double-check this?  We (I) have had some repeated confusion
> with the PTI code and kernel text vs. kernel data vs. __init.
> KERNEL_IMAGE_SIZE looks to be 512MB which is quite a bit bigger than
> kernel text.

I probably have the same confusion :-) but I will try to check again.


>> +	/*
>> +	 * Copy the mapping for cpu_entry_area and %esp fixup stacks
>> +	 * (this is based on the PTI userland address space, but probably
>> +	 * not needed because the KVM address space is not directly
>> +	 * enterered from userspace). They can both be copied at the P4D
>> +	 * level since they each have a dedicated P4D entry.
>> +	 */
>> +	rv = kvm_copy_mapping((void *)CPU_ENTRY_AREA_PER_CPU, P4D_SIZE,
>> +	     PGT_LEVEL_P4D);
>> +	if (rv)
>> +		goto out_uninit_page_table;
> 
> cpu_entry_area is used for more than just entry from userspace.  The gdt
> mapping, for instance, is needed everywhere.  You might want to go look
> at 'struct cpu_entry_area' in some more detail.

Ok. Thanks.

>> +#ifdef CONFIG_X86_ESPFIX64
>> +	rv = kvm_copy_mapping((void *)ESPFIX_BASE_ADDR, P4D_SIZE,
>> +	     PGT_LEVEL_P4D);
>> +	if (rv)
>> +		goto out_uninit_page_table;
>> +#endif
> 
> Why are these mappings *needed*?  I thought we only actually used these
> fixup stacks for some crazy iret-to-userspace handling.  We're certainly
> not doing that from KVM context.

Right. I initially looked what was used for PTI, and I probably copied unneeded
mapping.

> Am I forgetting something?
> 
>> +#ifdef CONFIG_VMAP_STACK
>> +	/*
>> +	 * Interrupt stacks are vmap'ed with guard pages, so we need to
>> +	 * copy mappings.
>> +	 */
>> +	for_each_possible_cpu(cpu) {
>> +		stack = per_cpu(hardirq_stack_ptr, cpu);
>> +		pr_debug("IRQ Stack %px\n", stack);
>> +		if (!stack)
>> +			continue;
>> +		rv = kvm_copy_ptes(stack - IRQ_STACK_SIZE, IRQ_STACK_SIZE);
>> +		if (rv)
>> +			goto out_uninit_page_table;
>> +	}
>> +
>> +#endif
> 
> I seem to remember that the KVM VMENTRY/VMEXIT context is very special.
>   Interrupts (and even NMIs?) are disabled.  Would it be feasible to do
> the switching in there so that we never even *get* interrupts in the KVM
> context?

Ideally we would like to run with the KVM address space when handling a VM-exit
(so between a VMEXIT and the next VMENTER) where interrupts are not disabled.

> I also share Peter's concerns about letting modules do this.  If we ever
> go down this road, we're going to have to think very carefully how we
> let KVM do this without giving all the not-so-nice out-of-tree modules
> the keys to the castle.

Right, we probably need some more generic framework for creating limited
kernel context space which kvm (or other module?) can deal with. I think
kvm is a good place to start for having this kind of limited context, hence
this RFC and my request for advice how best to do it.

> A high-level comment: it looks like this is "working", but has probably
> erred on the side of mapping too much.  The hard part is paring this
> back to a truly minimal set of mappings.
> 

Agree.

Thanks,

alex.