[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <CCHLZVOEHBYN.JEOBW598P7K5@oc8246131445.ibm.com>
Date: Thu, 01 Jul 2021 02:02:01 -0500
From: "Christopher M. Riedl" <cmr@...ux.ibm.com>
To: "Nicholas Piggin" <npiggin@...il.com>,
<linuxppc-dev@...ts.ozlabs.org>
Cc: <tglx@...utronix.de>, <x86@...nel.org>, <keescook@...omium.org>,
<linux-hardening@...r.kernel.org>
Subject: Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary
mm for patching
On Thu Jul 1, 2021 at 1:12 AM CDT, Nicholas Piggin wrote:
> Excerpts from Christopher M. Riedl's message of May 6, 2021 2:34 pm:
> > When code patching a STRICT_KERNEL_RWX kernel the page containing the
> > address to be patched is temporarily mapped as writeable. Currently, a
> > per-cpu vmalloc patch area is used for this purpose. While the patch
> > area is per-cpu, the temporary page mapping is inserted into the kernel
> > page tables for the duration of patching. The mapping is exposed to CPUs
> > other than the patching CPU - this is undesirable from a hardening
> > perspective. Use a temporary mm instead which keeps the mapping local to
> > the CPU doing the patching.
> >
> > Use the `poking_init` init hook to prepare a temporary mm and patching
> > address. Initialize the temporary mm by copying the init mm. Choose a
> > randomized patching address inside the temporary mm userspace address
> > space. The patching address is randomized between PAGE_SIZE and
> > DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is necessary due to how
> > the Book3s64 Hash MMU operates - by default the space above
> > DEFAULT_MAP_WINDOW is not available. For now, the patching address for
> > all platforms/MMUs is randomized inside this range. The number of
> > possible random addresses is dependent on PAGE_SIZE and limited by
> > DEFAULT_MAP_WINDOW.
> >
> > Bits of entropy with 64K page size on BOOK3S_64:
> >
> > bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
> >
> > PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
> > bits of entropy = log2(128TB / 64K) bits of entropy = 31
> >
> > Randomization occurs only once during initialization at boot.
> >
> > Introduce two new functions, map_patch() and unmap_patch(), to
> > respectively create and remove the temporary mapping with write
> > permissions at patching_addr. The Hash MMU on Book3s64 requires mapping
> > the page for patching with PAGE_SHARED since the kernel cannot access
> > userspace pages with the PAGE_PRIVILEGED (PAGE_KERNEL) bit set.
> >
> > Also introduce hash_prefault_mapping() to preload the SLB entry and HPTE
> > for the patching_addr when using the Hash MMU on Book3s64 to avoid
> > taking an SLB and Hash fault during patching.
>
> What prevents the SLBE or HPTE from being removed before the last
> access?
This code runs with local IRQs disabled - we also don't access anything
else in userspace so I'm not sure what else could cause the entries to
be removed TBH.
>
>
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > +
> > +static inline int hash_prefault_mapping(pgprot_t pgprot)
> > {
> > - struct vm_struct *area;
> > + int err;
> >
> > - area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> > - if (!area) {
> > - WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> > - cpu);
> > - return -1;
> > - }
> > - this_cpu_write(text_poke_area, area);
> > + if (radix_enabled())
> > + return 0;
> >
> > - return 0;
> > -}
> > + err = slb_allocate_user(patching_mm, patching_addr);
> > + if (err)
> > + pr_warn("map patch: failed to allocate slb entry\n");
> >
> > -static int text_area_cpu_down(unsigned int cpu)
> > -{
> > - free_vm_area(this_cpu_read(text_poke_area));
> > - return 0;
> > + err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
> > + HPTE_USE_KERNEL_KEY);
> > + if (err)
> > + pr_warn("map patch: failed to insert hashed page\n");
> > +
> > + /* See comment in switch_slb() in mm/book3s64/slb.c */
> > + isync();
>
> I'm not sure if this is enough. Could we context switch here? You've
> got the PTL so no with a normal kernel but maybe yes with an RT kernel
> How about taking an machine check that clears the SLB? Could the HPTE
> get removed by something else here?
All of this happens after a local_irq_save() which should at least
prevent context switches IIUC. I am not sure what else could cause the
HPTE to get removed here.
>
> You want to prevent faults because you might be patching a fault
> handler?
In a more general sense: I don't think we want to take page faults every
time we patch an instruction with a STRICT_RWX kernel. The Hash MMU page
fault handler codepath also checks `current->mm` in some places which
won't match the temporary mm. Also `current->mm` can be NULL which
caused problems in my earlier revisions of this series.
>
> Thanks,
> Nick
Powered by blists - more mailing lists