Message-ID: <4c140a8e0154504e9c645b9f78b0b164dc25a461.camel@intel.com>
Date: Mon, 19 Aug 2024 11:16:52 +0000
From: "Huang, Kai" <kai.huang@...el.com>
To: "luto@...nel.org" <luto@...nel.org>, "rafael@...nel.org"
<rafael@...nel.org>, "dave.hansen@...ux.intel.com"
<dave.hansen@...ux.intel.com>, "bp@...en8.de" <bp@...en8.de>,
"peterz@...radead.org" <peterz@...radead.org>, "hpa@...or.com"
<hpa@...or.com>, "mingo@...hat.com" <mingo@...hat.com>,
"kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>,
"tglx@...utronix.de" <tglx@...utronix.de>, "bhe@...hat.com" <bhe@...hat.com>,
"x86@...nel.org" <x86@...nel.org>
CC: "thomas.lendacky@....com" <thomas.lendacky@....com>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"ardb@...nel.org" <ardb@...nel.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"tzimmermann@...e.de" <tzimmermann@...e.de>
Subject: Re: [PATCHv3 3/4] x86/64/kexec: Map original relocate_kernel() in
init_transition_pgtable()
On Mon, 2024-08-19 at 10:08 +0300, Kirill A. Shutemov wrote:
> The init_transition_pgtable() function sets up transitional page tables.
> It ensures that the relocate_kernel() function is present in the
> identity mapping at the same location as in the kernel page tables.
> relocate_kernel() switches to the identity mapping, and the function
> must be present at the same location in the virtual address space before
> and after switching page tables.
>
> init_transition_pgtable() maps a copy of relocate_kernel() in
> image->control_code_page at the relocate_kernel() virtual address, but
> the original physical address of relocate_kernel() would also work.
>
> It is safe to use the original relocate_kernel() physical address: it
> cannot be overwritten until swap_pages() is called, and the
> relocate_kernel() virtual address will not be used by then.
>
> Map the original relocate_kernel() at the relocate_kernel() virtual
> address in the identity mapping. It is preparation to replace the
> init_transition_pgtable() implementation with a call to
> kernel_ident_mapping_init().
>
> Note that while relocate_kernel() switches to the identity mapping, it
> does not flush global TLB entries (CR4.PGE is not cleared). This means
> that in most cases, the kernel still runs relocate_kernel() from the
> original physical address before the change.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> ---
> arch/x86/kernel/machine_kexec_64.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 9c9ac606893e..645690e81c2d 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -157,7 +157,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
> pte_t *pte;
>
> vaddr = (unsigned long)relocate_kernel;
> - paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
> + paddr = __pa(relocate_kernel);
> pgd += pgd_index(vaddr);
> if (!pgd_present(*pgd)) {
> p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);

IIUC, this breaks KEXEC_JUMP (image->preserve_context is true).

relocate_kernel() first saves a couple of registers and some other data,
such as the PA of the swap page, to the control page. Note that
VA_CONTROL_PAGE is used to access the control page here, so the data ends
up in the control page:

SYM_CODE_START_NOALIGN(relocate_kernel)
UNWIND_HINT_END_OF_STACK
ANNOTATE_NOENDBR
/*
* %rdi indirection_page
* %rsi page_list
* %rdx start address
* %rcx preserve_context
* %r8 bare_metal
*/
...
movq PTR(VA_CONTROL_PAGE)(%rsi), %r11
movq %rsp, RSP(%r11)
movq %cr0, %rax
movq %rax, CR0(%r11)
movq %cr3, %rax
movq %rax, CR3(%r11)
movq %cr4, %rax
movq %rax, CR4(%r11)
...
/*
* get physical address of control page now
* this is impossible after page table switch
*/
movq PTR(PA_CONTROL_PAGE)(%rsi), %r8
/* get physical address of page table now too */
movq PTR(PA_TABLE_PAGE)(%rsi), %r9
/* get physical address of swap page now */
movq PTR(PA_SWAP_PAGE)(%rsi), %r10
/* save some information for jumping back */
movq %r9, CP_PA_TABLE_PAGE(%r11)
movq %r10, CP_PA_SWAP_PAGE(%r11)
movq %rdi, CP_PA_BACKUP_PAGES_MAP(%r11)
...

And after jumping back from the second kernel, relocate_kernel() tries to
restore the saved data:

...
/* get the re-entry point of the peer system */
movq 0(%rsp), %rbp
leaq relocate_kernel(%rip), %r8 <--------- (*)
movq CP_PA_SWAP_PAGE(%r8), %r10
movq CP_PA_BACKUP_PAGES_MAP(%r8), %rdi
movq CP_PA_TABLE_PAGE(%r8), %rax
movq %rax, %cr3
lea PAGE_SIZE(%r8), %rsp
call swap_pages
movq $virtual_mapped, %rax
pushq %rax
ANNOTATE_UNRET_SAFE
ret
int3
SYM_CODE_END(identity_mapped)

Note that the line marked (*) above uses the VA of relocate_kernel() to
access the control page. IIUC, if the VA of relocate_kernel() is mapped to
the original PA where the relocate_kernel() code resides, the code above
can never read that data back, because it was saved to the control page.
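
As a rough illustration of the concern, here is a user-space toy model (not
kernel code). The two arrays stand in for the two physical pages, and the
pointer argument stands in for whichever page the relocate_kernel() VA is
mapped to. The names save_then_restore(), kernel_text_page, control_page,
and the slot index are made up for this sketch; only the idea of "save
through one mapping, read back through another" comes from the code above.

```c
#include <assert.h>

/* Toy model only: each array stands in for one physical page. */
enum { PAGE_WORDS = 8 };
enum { SLOT_PA_SWAP_PAGE = 1 };	/* hypothetical stand-in for CP_PA_SWAP_PAGE */

/* Page holding the original relocate_kernel() code (never written here). */
static unsigned long kernel_text_page[PAGE_WORDS];
/* Page holding the copy in image->control_code_page. */
static unsigned long control_page[PAGE_WORDS];

/*
 * Save a value through the control-page mapping (the VA_CONTROL_PAGE
 * path), then read it back through whatever page the relocate_kernel()
 * VA is assumed to point at (reloc_alias).
 */
static unsigned long save_then_restore(const unsigned long *reloc_alias,
				       unsigned long value)
{
	assert(SLOT_PA_SWAP_PAGE < PAGE_WORDS);

	control_page[SLOT_PA_SWAP_PAGE] = value;	/* save */
	return reloc_alias[SLOT_PA_SWAP_PAGE];		/* restore */
}
```

In this model, calling save_then_restore(control_page, v) returns v, while
calling save_then_restore(kernel_text_page, v) returns whatever happens to
sit in the original page instead, which is the failure described above if
the restore path really does go through the remapped VA.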
Did I miss anything?