Message-ID: <20200616102350.GA29684@lst.de>
Date: Tue, 16 Jun 2020 12:23:50 +0200
From: Christoph Hellwig <hch@....de>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Christoph Hellwig <hch@....de>, Dexuan Cui <decui@...rosoft.com>,
vkuznets <vkuznets@...hat.com>,
Stephen Hemminger <stephen@...workplumber.org>,
Andy Lutomirski <luto@...nel.org>,
Andy Lutomirski <luto@...capital.net>,
Michael Kelley <mikelley@...rosoft.com>,
Ju-Hyoung Lee <juhlee@...rosoft.com>,
"x86@...nel.org" <x86@...nel.org>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
KY Srinivasan <kys@...rosoft.com>,
Tom Lendacky <thomas.lendacky@....com>
Subject: Re: hv_hypercall_pg page permissios
On Tue, Jun 16, 2020 at 12:18:07PM +0200, Peter Zijlstra wrote:
> > It does. But it also means every other user of PAGE_KERNEL_EXEC
> > should trigger this, of which there are a few (kexec, tboot, hibernate,
> > early xen pv mapping, early SEV identity mapping)
>
> There are only 3 users in the entire tree afaict:
>
> arch/arm64/kernel/probes/kprobes.c: page = vmalloc_exec(PAGE_SIZE);
> arch/x86/hyperv/hv_init.c: hv_hypercall_pg = vmalloc_exec(PAGE_SIZE);
> kernel/module.c: return vmalloc_exec(size);
>
> And that last one is a weak function that any arch that has STRICT_RWX
> ought to override.
>
> > We really shouldn't create mappings like this by default. Either we
> > need to flip PAGE_KERNEL_EXEC itself based on the needs of the above
> > users, or add another define to overload vmalloc_exec as there is no
> > other user of that for x86.
>
> We really should get rid of the two !module users of this though; both
> x86 and arm64 have STRICT_RWX and sufficient primitives to DTRT.
>
> What is HV even trying to do with that page? AFAICT it never actually
> writes to it, it seems to give the physical address to an MSR (which I
> suspect then writes crud into the page for us from host context).
>
> Suggesting the page really only needs to be RX.
>
> On top of that, vmalloc_exec() gets us a page from the entire vmalloc
> range, which can be outside of the 2G executable range, which seems to
> suggest vmalloc_exec() is wrong too and all this works by accident.
>
> How about something like this:
>
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index a54c6a401581..82a3a4a9481f 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -375,12 +375,15 @@ void __init hyperv_init(void)
> guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
> wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
>
> - hv_hypercall_pg = vmalloc_exec(PAGE_SIZE);
> + hv_hypercall_pg = module_alloc(PAGE_SIZE);
> if (hv_hypercall_pg == NULL) {
> wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> goto remove_cpuhp_state;
> }
>
> + set_memory_ro((unsigned long)hv_hypercall_pg, 1);
> + set_memory_x((unsigned long)hv_hypercall_pg, 1);
The changing of the permissions sucks. I thought about adding
a module_alloc_prot with an explicit pgprot_t argument. On x86
alone at least ftrace would also benefit from that.
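
To make the difference concrete, here is a minimal userspace sketch in C. It is an analogy only, not kernel code: mmap() and mprotect() stand in for the kernel allocator and set_memory_*(), and the helper names alloc_then_protect() and alloc_prot() are hypothetical. No module_alloc_prot() is defined anywhere in this thread, so the second helper only illustrates the shape such an interface might take.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, two-step pattern: map the region writable first,
 * then change its protection afterwards. The caller pays for a second
 * call, and the region is briefly mapped with permissions it never needs.
 */
static void *alloc_then_protect(size_t size, int final_prot)
{
	void *p;

	assert(size > 0);
	p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	if (mprotect(p, size, final_prot) != 0) {
		munmap(p, size);
		return NULL;
	}
	return p;
}

/*
 * Hypothetical helper, explicit-protection pattern: the caller states the
 * wanted protection up front, so the mapping is created with its final
 * permissions and no later permission change is needed.
 */
static void *alloc_prot(size_t size, int prot)
{
	void *p;

	assert(size > 0);
	p = mmap(NULL, size, prot, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

A caller that only ever reads the region could use alloc_prot(4096, PROT_READ) and never touch the permissions again, which is the property I would want from an allocator that takes the protection as an argument.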