Date: Fri, 09 Jul 2021 00:03:17 -0500
From: "Christopher M. Riedl" <cmr@...ux.ibm.com>
To: "Nicholas Piggin" <npiggin@...il.com>, <linuxppc-dev@...ts.ozlabs.org>
Cc: <tglx@...utronix.de>, <x86@...nel.org>, <keescook@...omium.org>, <linux-hardening@...r.kernel.org>
Subject: Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching

On Thu Jul 1, 2021 at 2:51 AM CDT, Nicholas Piggin wrote:
> Excerpts from Christopher M. Riedl's message of July 1, 2021 5:02 pm:
> > On Thu Jul 1, 2021 at 1:12 AM CDT, Nicholas Piggin wrote:
> >> Excerpts from Christopher M. Riedl's message of May 6, 2021 2:34 pm:
> >> > When code patching a STRICT_KERNEL_RWX kernel the page containing the
> >> > address to be patched is temporarily mapped as writeable. Currently, a
> >> > per-cpu vmalloc patch area is used for this purpose. While the patch
> >> > area is per-cpu, the temporary page mapping is inserted into the kernel
> >> > page tables for the duration of patching. The mapping is exposed to CPUs
> >> > other than the patching CPU - this is undesirable from a hardening
> >> > perspective. Use a temporary mm instead which keeps the mapping local to
> >> > the CPU doing the patching.
> >> >
> >> > Use the `poking_init` init hook to prepare a temporary mm and patching
> >> > address. Initialize the temporary mm by copying the init mm. Choose a
> >> > randomized patching address inside the temporary mm userspace address
> >> > space. The patching address is randomized between PAGE_SIZE and
> >> > DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is necessary due to how
> >> > the Book3s64 Hash MMU operates - by default the space above
> >> > DEFAULT_MAP_WINDOW is not available. For now, the patching address for
> >> > all platforms/MMUs is randomized inside this range. The number of
> >> > possible random addresses is dependent on PAGE_SIZE and limited by
> >> > DEFAULT_MAP_WINDOW.
> >> >
> >> > Bits of entropy with 64K page size on BOOK3S_64:
> >> >
> >> > bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
> >> >
> >> > PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
> >> > bits of entropy = log2(128TB / 64K)
> >> > bits of entropy = 31
> >> >
> >> > Randomization occurs only once during initialization at boot.
> >> >
> >> > Introduce two new functions, map_patch() and unmap_patch(), to
> >> > respectively create and remove the temporary mapping with write
> >> > permissions at patching_addr. The Hash MMU on Book3s64 requires mapping
> >> > the page for patching with PAGE_SHARED since the kernel cannot access
> >> > userspace pages with the PAGE_PRIVILEGED (PAGE_KERNEL) bit set.
> >> >
> >> > Also introduce hash_prefault_mapping() to preload the SLB entry and HPTE
> >> > for the patching_addr when using the Hash MMU on Book3s64 to avoid
> >> > taking an SLB and Hash fault during patching.
> >>
> >> What prevents the SLBE or HPTE from being removed before the last
> >> access?
> >
> > This code runs with local IRQs disabled - we also don't access anything
> > else in userspace so I'm not sure what else could cause the entries to
> > be removed TBH.
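(For anyone following along, the sequence being discussed looks roughly like
the sketch below. This is simplified and illustrative only, not the actual
patch: map_patch()/unmap_patch() and patching_addr are the names from the
commit message, and the rest of the body is assumed. The point is that the
writable alias at patching_addr only exists inside the local IRQ-disabled
window on the patching CPU.)

/*
 * Illustrative sketch only -- not the real implementation.
 * The writable mapping exists only between local_irq_save() and
 * local_irq_restore() on this CPU.
 */
static int do_patch_sketch(u32 *exec_addr, u32 insn)
{
	unsigned long flags;
	u32 *patch_addr;
	int err;

	local_irq_save(flags);

	err = map_patch(exec_addr);	/* temp mm + writable PTE at patching_addr */
	if (!err) {
		patch_addr = (u32 *)(patching_addr | offset_in_page(exec_addr));
		*patch_addr = insn;	/* store via the temporary mapping */
		unmap_patch();		/* tear the mapping down again */
	}

	local_irq_restore(flags);
	return err;
}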
> >
> >>
> >>
> >> > +#ifdef CONFIG_PPC_BOOK3S_64
> >> > +
> >> > +static inline int hash_prefault_mapping(pgprot_t pgprot)
> >> >  {
> >> > -	struct vm_struct *area;
> >> > +	int err;
> >> >
> >> > -	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> >> > -	if (!area) {
> >> > -		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> >> > -			cpu);
> >> > -		return -1;
> >> > -	}
> >> > -	this_cpu_write(text_poke_area, area);
> >> > +	if (radix_enabled())
> >> > +		return 0;
> >> >
> >> > -	return 0;
> >> > -}
> >> > +	err = slb_allocate_user(patching_mm, patching_addr);
> >> > +	if (err)
> >> > +		pr_warn("map patch: failed to allocate slb entry\n");
> >> >
> >> > -static int text_area_cpu_down(unsigned int cpu)
> >> > -{
> >> > -	free_vm_area(this_cpu_read(text_poke_area));
> >> > -	return 0;
> >> > +	err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
> >> > +			   HPTE_USE_KERNEL_KEY);
> >> > +	if (err)
> >> > +		pr_warn("map patch: failed to insert hashed page\n");
> >> > +
> >> > +	/* See comment in switch_slb() in mm/book3s64/slb.c */
> >> > +	isync();
> >>
> >> I'm not sure if this is enough. Could we context switch here? You've
> >> got the PTL so no with a normal kernel but maybe yes with an RT kernel
> >> How about taking an machine check that clears the SLB? Could the HPTE
> >> get removed by something else here?
> >
> > All of this happens after a local_irq_save() which should at least
> > prevent context switches IIUC.
>
> Ah yeah I didn't look that far back. A machine check can take out SLB
> entries.
>
> > I am not sure what else could cause the
> > HPTE to get removed here.
>
> Other CPUs?
>

Right because the HPTEs are "global".

> >> You want to prevent faults because you might be patching a fault
> >> handler?
> >
> > In a more general sense: I don't think we want to take page faults every
> > time we patch an instruction with a STRICT_RWX kernel. The Hash MMU page
> > fault handler codepath also checks `current->mm` in some places which
> > won't match the temporary mm. Also `current->mm` can be NULL which
> > caused problems in my earlier revisions of this series.
>
> Hmm, that's a bit of a hack then. Maybe doing an actual mm switch and
> setting current->mm properly would explode too much. Maybe that's
> okayish.
> But I can't see how the HPT code is up to the job of this in general
> (even if that current->mm issue was fixed).
>
> To do it without holes you would either have to get the SLB MCE handler
> to restore that particular SLB if it flushed it, or restart the patch
> code from a fixup location if it took an MCE after installing the SLB.
> And bolt a hash table entry.

We discussed this a bit off list and decided that it's not worth the
trouble implementing percpu temp mm support for Hash at this time.
Instead, I will post a new version of this series where we drop into
realmode to patch with the Hash MMU. This avoids the W+X mapping
altogether and so doesn't expose anything to other CPUs during
patching.

We will keep the Radix support for a percpu temp mm since 1) it doesn't
require hacks like Hash and 2) it's overall preferable to dropping into
realmode.

>
> Thanks,
> Nick
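For reference, the entropy figure quoted in the commit message can be
reproduced with a small standalone calculation. This is plain userspace C,
independent of the kernel sources; the constants are simply the ones stated
above (64K pages, 128TB DEFAULT_MAP_WINDOW_USER64):

#include <math.h>
#include <stdio.h>

int main(void)
{
	/* Constants quoted in the commit message. */
	unsigned long long page_size  = 64ULL << 10;	/* 64K   = 2^16 */
	unsigned long long map_window = 128ULL << 40;	/* 128TB = 2^47 */

	/* bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE) */
	double bits = log2((double)(map_window / page_size));

	printf("bits of entropy = %.0f\n", bits);	/* prints 31 */
	return 0;
}

That is, 128TB / 64K = 2^47 / 2^16 = 2^31 distinct page-aligned candidate
addresses, hence the 31 bits of entropy stated in the commit message.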