linux-kernel - Re: [RFC][PATCH 2/8] x86/mm: break out kernel address space handling

Message-Id: <CD6177E3-BDE5-443F-9A28-351EAE0BE5BA@amacapital.net>
Date:   Fri, 7 Sep 2018 15:21:41 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Dave Hansen <dave.hansen@...ux.intel.com>
Cc:     linux-kernel@...r.kernel.org, sean.j.christopherson@...el.com,
        peterz@...radead.org, tglx@...utronix.de, x86@...nel.org,
        luto@...nel.org
Subject: Re: [RFC][PATCH 2/8] x86/mm: break out kernel address space handling



> On Sep 7, 2018, at 12:48 PM, Dave Hansen <dave.hansen@...ux.intel.com> wrote:
> 
> 
> From: Dave Hansen <dave.hansen@...ux.intel.com>
> 
> The page fault handler (__do_page_fault()) basically has two sections:
> one for handling faults in the kernel portion of the address space
> and another for faults in the user portion of the address space.
> 
> But these two parts don't stand out that well.  Let's make the split
> clearer through code separation and naming.  Pull kernel fault
> handling into its own helper, and reflect that by renaming
> spurious_fault() -> spurious_kernel_fault().
> 
> Also, rewrite the vmalloc handling comment a bit.  It was a bit
> stale and also glossed over the reserved bit handling.
> 
> Signed-off-by: Dave Hansen <dave.hansen@...ux.intel.com>
> Cc: Sean Christopherson <sean.j.christopherson@...el.com>
> Cc: "Peter Zijlstra (Intel)" <peterz@...radead.org>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: x86@...nel.org
> Cc: Andy Lutomirski <luto@...nel.org>
> ---
> 
> b/arch/x86/mm/fault.c |   98 ++++++++++++++++++++++++++++++--------------------
> 1 file changed, 59 insertions(+), 39 deletions(-)
> 
> diff -puN arch/x86/mm/fault.c~pkeys-fault-warnings-00 arch/x86/mm/fault.c
> --- a/arch/x86/mm/fault.c~pkeys-fault-warnings-00    2018-09-07 11:21:46.145751902 -0700
> +++ b/arch/x86/mm/fault.c    2018-09-07 11:23:37.643751624 -0700
> @@ -1033,7 +1033,7 @@ mm_fault_error(struct pt_regs *regs, uns
>    }
> }
> 
> -static int spurious_fault_check(unsigned long error_code, pte_t *pte)
> +static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte)
> {
>    if ((error_code & X86_PF_WRITE) && !pte_write(*pte))
>        return 0;
> @@ -1072,7 +1072,7 @@ static int spurious_fault_check(unsigned
>  * (Optional Invalidation).
>  */
> static noinline int
> -spurious_fault(unsigned long error_code, unsigned long address)
> +spurious_kernel_fault(unsigned long error_code, unsigned long address)
> {
>    pgd_t *pgd;
>    p4d_t *p4d;
> @@ -1103,27 +1103,27 @@ spurious_fault(unsigned long error_code,
>        return 0;
> 
>    if (p4d_large(*p4d))
> -        return spurious_fault_check(error_code, (pte_t *) p4d);
> +        return spurious_kernel_fault_check(error_code, (pte_t *) p4d);
> 
>    pud = pud_offset(p4d, address);
>    if (!pud_present(*pud))
>        return 0;
> 
>    if (pud_large(*pud))
> -        return spurious_fault_check(error_code, (pte_t *) pud);
> +        return spurious_kernel_fault_check(error_code, (pte_t *) pud);
> 
>    pmd = pmd_offset(pud, address);
>    if (!pmd_present(*pmd))
>        return 0;
> 
>    if (pmd_large(*pmd))
> -        return spurious_fault_check(error_code, (pte_t *) pmd);
> +        return spurious_kernel_fault_check(error_code, (pte_t *) pmd);
> 
>    pte = pte_offset_kernel(pmd, address);
>    if (!pte_present(*pte))
>        return 0;
> 
> -    ret = spurious_fault_check(error_code, pte);
> +    ret = spurious_kernel_fault_check(error_code, pte);
>    if (!ret)
>        return 0;
> 
> @@ -1131,12 +1131,12 @@ spurious_fault(unsigned long error_code,
>     * Make sure we have permissions in PMD.
>     * If not, then there's a bug in the page tables:
>     */
> -    ret = spurious_fault_check(error_code, (pte_t *) pmd);
> +    ret = spurious_kernel_fault_check(error_code, (pte_t *) pmd);
>    WARN_ONCE(!ret, "PMD has incorrect permission bits\n");
> 
>    return ret;
> }
> -NOKPROBE_SYMBOL(spurious_fault);
> +NOKPROBE_SYMBOL(spurious_kernel_fault);
> 
> int show_unhandled_signals = 1;
> 
> @@ -1203,6 +1203,55 @@ static inline bool smap_violation(int er
>    return true;
> }
> 
> +static void
> +do_kern_addr_space_fault(struct pt_regs *regs, unsigned long hw_error_code,
> +             unsigned long address)
> +{

Can you add a comment above this documenting *when* it’s called?  Is it all faults, !user_mode faults, or !PF_USER?

> +    /*
> +     * We can fault-in kernel-space virtual memory on-demand. The
> +     * 'reference' page table is init_mm.pgd.
> +     *
> +     * NOTE! We MUST NOT take any locks for this case. We may
> +     * be in an interrupt or a critical region, and should
> +     * only copy the information from the master page table,
> +     * nothing more.
> +     *
> +     * Before doing this on-demand faulting, ensure that the
> +     * fault is not any of the following:
> +     * 1. A fault on a PTE with a reserved bit set.
> +     * 2. A fault caused by a user-mode access.  (Do not demand-
> +     *    fault kernel memory due to user-mode accesses).
> +     * 3. A fault caused by a page-level protection violation.
> +     *    (A demand fault would be on a non-present page which
> +     *     would have X86_PF_PROT==0).
> +     */
> +    if (!(hw_error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) {
> +        if (vmalloc_fault(address) >= 0)
> +            return;
> +    }
> +
> +    /* Was the fault spurious, caused by lazy TLB invalidation? */
> +    if (spurious_kernel_fault(hw_error_code, address))
> +        return;
> +
> +    /* kprobes don't want to hook the spurious faults: */
> +    if (kprobes_fault(regs))
> +        return;
> +
> +    /*
> +     * This is a "bad" fault in the kernel address space.  There
> +     * is no reasonable explanation for it.  We will either kill
> +     * the process for making a bad access, or oops the kernel.
> +     */

Or call an extable handler?

Maybe the wording should be less scary, e.g. “this fault is a genuine error. Send a signal, call an exception handler, or oops, as appropriate.”
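
For anyone following along, here is a minimal user-space sketch of the error-code gate that the quoted hunk places in front of vmalloc_fault(). The X86_PF_* values are the architectural page-fault error code bits; may_demand_fault_kernel() and example_checks() are hypothetical names used only for illustration, since the patch open-codes the test and has no such helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Architectural x86 page-fault error code bits. */
#define X86_PF_PROT  (1UL << 0) /* 0: not-present page, 1: protection violation */
#define X86_PF_WRITE (1UL << 1) /* 0: read access,      1: write access */
#define X86_PF_USER  (1UL << 2) /* 0: supervisor-mode,  1: user-mode access */
#define X86_PF_RSVD  (1UL << 3) /* 1: reserved bit set in a paging entry */

/*
 * Hypothetical helper (not in the patch): true when the fault may be
 * a kernel demand fault, i.e. none of the three excluded conditions
 * from the comment in the quoted hunk is set.
 */
bool may_demand_fault_kernel(unsigned long hw_error_code)
{
	return !(hw_error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT));
}

/* Illustrative checks; the function name is an assumption too. */
void example_checks(void)
{
	/* Supervisor read or write of a not-present page: eligible. */
	assert(may_demand_fault_kernel(0));
	assert(may_demand_fault_kernel(X86_PF_WRITE));

	/* Any one of the excluded bits rules the fault out. */
	assert(!may_demand_fault_kernel(X86_PF_RSVD));
	assert(!may_demand_fault_kernel(X86_PF_USER));
	assert(!may_demand_fault_kernel(X86_PF_PROT));
}
```

Note that X86_PF_WRITE is deliberately absent from the mask: whether the access was a read or a write has no bearing on whether the page is eligible for demand faulting.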