Message-ID: <hr7kym77uhbtj32eymcdp5mcbpid7euoiiszhx6yhkrbw5riag@lcozqjayilbo>
Date: Mon, 19 Aug 2024 14:57:16 +0300
From: "kirill.shutemov@...ux.intel.com" <kirill.shutemov@...ux.intel.com>
To: "Huang, Kai" <kai.huang@...el.com>
Cc: "luto@...nel.org" <luto@...nel.org>, 
	"rafael@...nel.org" <rafael@...nel.org>, "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, 
	"bp@...en8.de" <bp@...en8.de>, "peterz@...radead.org" <peterz@...radead.org>, 
	"hpa@...or.com" <hpa@...or.com>, "mingo@...hat.com" <mingo@...hat.com>, 
	"tglx@...utronix.de" <tglx@...utronix.de>, "bhe@...hat.com" <bhe@...hat.com>, 
	"x86@...nel.org" <x86@...nel.org>, "thomas.lendacky@....com" <thomas.lendacky@....com>, 
	"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, 
	"ardb@...nel.org" <ardb@...nel.org>, "seanjc@...gle.com" <seanjc@...gle.com>, 
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "tzimmermann@...e.de" <tzimmermann@...e.de>
Subject: Re: [PATCHv3 3/4] x86/64/kexec: Map original relocate_kernel() in
 init_transition_pgtable()

On Mon, Aug 19, 2024 at 11:16:52AM +0000, Huang, Kai wrote:
> On Mon, 2024-08-19 at 10:08 +0300, Kirill A. Shutemov wrote:
> > The init_transition_pgtable() function sets up transitional page tables.
> > It ensures that the relocate_kernel() function is present in the
> > identity mapping at the same location as in the kernel page tables.
> > relocate_kernel() switches to the identity mapping, and the function
> > must be present at the same location in the virtual address space before
> > and after switching page tables.
> > 
> > init_transition_pgtable() maps a copy of relocate_kernel() in
> > image->control_code_page at the relocate_kernel() virtual address, but
> > the original physical address of relocate_kernel() would also work.
> > 
> > It is safe to use the original relocate_kernel() physical address
> > because it cannot be overwritten until swap_pages() is called, and the
> > relocate_kernel() virtual address will not be used by then.
> > 
> > Map the original relocate_kernel() at the relocate_kernel() virtual
> > address in the identity mapping. It is preparation to replace the
> > init_transition_pgtable() implementation with a call to
> > kernel_ident_mapping_init().
> > 
> > Note that while relocate_kernel() switches to the identity mapping, it
> > does not flush global TLB entries (CR4.PGE is not cleared). This means
> > that in most cases, the kernel still runs relocate_kernel() from the
> > original physical address before the change.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> > ---
> >  arch/x86/kernel/machine_kexec_64.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> > index 9c9ac606893e..645690e81c2d 100644
> > --- a/arch/x86/kernel/machine_kexec_64.c
> > +++ b/arch/x86/kernel/machine_kexec_64.c
> > @@ -157,7 +157,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
> >  	pte_t *pte;
> >  
> >  	vaddr = (unsigned long)relocate_kernel;
> > -	paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
> > +	paddr = __pa(relocate_kernel);
> >  	pgd += pgd_index(vaddr);
> >  	if (!pgd_present(*pgd)) {
> >  		p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);
> 
> 
> IIUC, this breaks KEXEC_JUMP (image->preserve_context is true).
> 
> relocate_kernel() first saves a couple of registers and some other data,
> such as the PA of the swap page, to the control page.  Note that
> VA_CONTROL_PAGE is used to access the control page here, so that data is
> saved to the control page.
> 
> SYM_CODE_START_NOALIGN(relocate_kernel)
>         UNWIND_HINT_END_OF_STACK
>         ANNOTATE_NOENDBR
>         /*      
>          * %rdi indirection_page
>          * %rsi page_list
>          * %rdx start address
>          * %rcx preserve_context
>          * %r8  bare_metal
>          */
> 
> 	...
> 
>         movq    PTR(VA_CONTROL_PAGE)(%rsi), %r11                             
>         movq    %rsp, RSP(%r11)                                              
>         movq    %cr0, %rax
>         movq    %rax, CR0(%r11)
>         movq    %cr3, %rax
>         movq    %rax, CR3(%r11)
>         movq    %cr4, %rax
>         movq    %rax, CR4(%r11)
> 
> 	...
> 
> 	/*
>          * get physical address of control page now
>          * this is impossible after page table switch
>          */
>         movq    PTR(PA_CONTROL_PAGE)(%rsi), %r8
> 
>         /* get physical address of page table now too */
>         movq    PTR(PA_TABLE_PAGE)(%rsi), %r9
> 
>         /* get physical address of swap page now */
>         movq    PTR(PA_SWAP_PAGE)(%rsi), %r10
> 
>         /* save some information for jumping back */
>         movq    %r9, CP_PA_TABLE_PAGE(%r11)
>         movq    %r10, CP_PA_SWAP_PAGE(%r11)
>         movq    %rdi, CP_PA_BACKUP_PAGES_MAP(%r11)
> 
> 	...
> 
> And after jumping back from the second kernel, relocate_kernel() tries to
> restore the saved data:
> 
> 	...
> 
>         /* get the re-entry point of the peer system */
>         movq    0(%rsp), %rbp
>         leaq    relocate_kernel(%rip), %r8		<---------  (*) 
>         movq    CP_PA_SWAP_PAGE(%r8), %r10
>         movq    CP_PA_BACKUP_PAGES_MAP(%r8), %rdi
>         movq    CP_PA_TABLE_PAGE(%r8), %rax
>         movq    %rax, %cr3
>         lea     PAGE_SIZE(%r8), %rsp
>         call    swap_pages
>         movq    $virtual_mapped, %rax
>         pushq   %rax
>         ANNOTATE_UNRET_SAFE
>         ret
>         int3
> SYM_CODE_END(identity_mapped)
> 
> Note the above code (*) uses the VA of relocate_kernel() to access the control
> page.  IIUC, that means if we map VA of relocate_kernel() to the original PA
> where the code relocate_kernel() resides, then the above code will never be
> able to read those data back since they were saved to the control page.
> 
> Did I miss anything?

Note that the relocate_kernel() reference at (*) is inside
identity_mapped(), where we run from the identity mapping. Nothing
changes in the identity mapping around relocate_kernel(); only the top
mapping (at __START_KERNEL_map) is affected.
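
To illustrate the point, here is a toy user-space model, not kernel code.
It assumes the control page copy is identity-mapped and that the leaq at
(*) resolves relative to the copy that is currently executing. All
addresses, the slot offset and the helper names are made up for the
sketch; only the two candidate targets of the top mapping come from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* All values below are hypothetical, chosen only for illustration. */
#define PA_KERNEL_TEXT     0x0000000001200000ULL /* original relocate_kernel() */
#define PA_CONTROL_PAGE    0x0000000075000000ULL /* copy in the control page */
#define VA_RELOCATE_KERNEL (0xffffffff80000000ULL + PA_KERNEL_TEXT)
#define CP_PA_SWAP_PAGE    0x808ULL              /* made-up slot offset */

struct mapping {
	uint64_t va;	/* page-aligned virtual address */
	uint64_t pa;	/* page-aligned physical address */
};

/* Look up va in a flat list of page mappings; 0 on success, -1 on miss. */
static int translate(const struct mapping *pt, size_t n, uint64_t va,
		     uint64_t *pa)
{
	for (size_t i = 0; i < n; i++) {
		if ((va & PAGE_MASK) == pt[i].va) {
			*pa = pt[i].pa | (va & ~PAGE_MASK);
			return 0;
		}
	}
	return -1;
}

/* Toy transition page table: one top mapping plus one identity mapping. */
static void build_pgtable(struct mapping pt[2], uint64_t top_pa)
{
	pt[0].va = VA_RELOCATE_KERNEL;	/* the entry the patch changes */
	pt[0].pa = top_pa;
	pt[1].va = PA_CONTROL_PAGE;	/* identity mapping: va == pa */
	pt[1].pa = PA_CONTROL_PAGE;
}

/* Physical page reached through the top mapping of relocate_kernel(). */
uint64_t top_mapping_target(uint64_t top_pa)
{
	struct mapping pt[2];
	uint64_t pa = 0;
	int rc;

	build_pgtable(pt, top_pa);
	rc = translate(pt, 2, VA_RELOCATE_KERNEL, &pa);
	assert(rc == 0);
	(void)rc;
	return pa;
}

/*
 * Physical address read by "movq CP_PA_SWAP_PAGE(%r8), %r10" when %r8
 * was computed RIP-relative while running from the identity-mapped copy.
 */
uint64_t restore_slot_target(uint64_t top_pa)
{
	struct mapping pt[2];
	uint64_t r8 = PA_CONTROL_PAGE;	/* leaq relocate_kernel(%rip), %r8 */
	uint64_t pa = 0;
	int rc;

	build_pgtable(pt, top_pa);
	rc = translate(pt, 2, r8 + CP_PA_SWAP_PAGE, &pa);
	assert(rc == 0);
	(void)rc;
	return pa;
}
```

In this model, top_mapping_target() differs depending on whether
PA_CONTROL_PAGE or PA_KERNEL_TEXT is passed, while restore_slot_target()
lands in the control page either way, because the lookup never goes
through the top mapping.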

But I haven't tested kexec jump. Do you (or anybody else) have a setup to
test it?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
