linux-kernel - Re: [PATCH v7 2/3] powerpc/mm: Only read faulting instruction when necessary in do_page


Message-ID: <20180523003801.43070ddc@roar.ozlabs.ibm.com>
Date:   Wed, 23 May 2018 00:38:01 +1000
From:   Nicholas Piggin <npiggin@...il.com>
To:     Christophe Leroy <christophe.leroy@....fr>
Cc:     Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Paul Mackerras <paulus@...ba.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH v7 2/3] powerpc/mm: Only read faulting instruction when
 necessary in do_page_fault()

On Tue, 22 May 2018 16:02:56 +0200 (CEST)
Christophe Leroy <christophe.leroy@....fr> wrote:

> Commit a7a9dcd882a67 ("powerpc: Avoid taking a data miss on every
> userspace instruction miss") has shown that limiting the read of
> faulting instruction to likely cases improves performance.
> 
> This patch goes further in this direction by limiting the read
> of the faulting instruction to only the cases where it is likely
> needed.
> 
> On an MPC885, with the same benchmark app as in the commit referred
> to above, we see a reduction of about 3900 dTLB misses (approx 3%):
> 
> Before the patch:
>  Performance counter stats for './fault 500' (10 runs):
> 
>          683033312      cpu-cycles                                                    ( +-  0.03% )
>             134538      dTLB-load-misses                                              ( +-  0.03% )
>              46099      iTLB-load-misses                                              ( +-  0.02% )
>              19681      faults                                                        ( +-  0.02% )
> 
>        5.389747878 seconds time elapsed                                          ( +-  0.06% )
> 
> With the patch:
> 
>  Performance counter stats for './fault 500' (10 runs):
> 
>          682112862      cpu-cycles                                                    ( +-  0.03% )
>             130619      dTLB-load-misses                                              ( +-  0.03% )
>              46073      iTLB-load-misses                                              ( +-  0.05% )
>              19681      faults                                                        ( +-  0.01% )
> 
>        5.381342641 seconds time elapsed                                          ( +-  0.07% )
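
For reference, the "about 3900" and "approx 3%" figures can be rechecked from the two dTLB-load-misses counters quoted above. The helpers below are only an illustrative sketch; their names are made up for this note and are not part of the patch or of perf.

```c
/* Illustrative helpers (hypothetical names) to recheck the quoted
 * perf numbers; they are not part of the patch. */

/* Absolute reduction in an event count between two runs. */
static long event_reduction(long before, long after)
{
	return before - after;
}

/* Relative reduction, as a percentage of the "before" count. */
static double event_reduction_pct(long before, long after)
{
	return 100.0 * (double)(before - after) / (double)before;
}
```

With the quoted counters, event_reduction(134538, 130619) gives 3919, and event_reduction_pct(134538, 130619) is a little over 2.9, which is consistent with the summary in the changelog.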
> 
> Huge stack expansion was checked to work properly with the
> following app:
> 
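> #include <stdio.h>
> #include <stdlib.h>
> 
> int main(int argc, char **argv)
> {
> 	/* just over 1 MB, to force a huge stack expansion */
> 	char buf[1024 * 1025];
> 
> 	sprintf(buf, "Hello world !\n");
> 	printf(buf);
> 
> 	exit(0);
> }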
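
The buffer size looks chosen so that the first write lands more than 1 MB below the top of the stack VMA, which is the case the new "address + 0x100000 < vma->vm_end" test in the diff below singles out. A minimal userspace sketch of that comparison follows; the function name and the addresses in the usage note are hypothetical, and only the comparison itself is taken from the patch.

```c
#include <stdbool.h>

/* Same threshold as the constant used in the patch: 1 MB. */
#define HUGE_EXPANSION_GAP 0x100000UL

/* Hypothetical helper: true when the faulting address is more than
 * 1 MB below the end of the VMA, i.e. when the patch would go on to
 * read the faulting instruction. Mirrors only the address comparison,
 * not the is_write/is_user/!inst parts of the condition. */
static bool is_huge_expansion(unsigned long address, unsigned long vm_end)
{
	return address + HUGE_EXPANSION_GAP < vm_end;
}
```

Assuming a made-up vm_end of 0x80000000, a write 1024 * 1025 bytes below it satisfies the test, while a write one page (4096 bytes) below it does not.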
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@....fr>
> ---
>  v7: Following Nicholas's comment on v6 about the page possibly being
>      removed from the pagetables between the fault and the read, I have
>      reworked the patch to do the get_user() directly in
>      __do_page_fault(), which reduces complexity compared to v5

This is looking better, thanks.

> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index fcbb34431da2..dc64b8e06477 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -450,9 +450,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>  	 * can result in fault, which will cause a deadlock when called with
>  	 * mmap_sem held
>  	 */
> -	if (is_write && is_user)
> -		get_user(inst, (unsigned int __user *)regs->nip);
> -
>  	if (is_user)
>  		flags |= FAULT_FLAG_USER;
>  	if (is_write)
> @@ -498,6 +495,26 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>  	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
>  		return bad_area(regs, address);
>  
> +	if (unlikely(is_write && is_user && address + 0x100000 < vma->vm_end &&
> +		     !inst)) {
> +		unsigned int __user *nip = (unsigned int __user *)regs->nip;
> +
> +		if (likely(access_ok(VERIFY_READ, nip, sizeof(inst)))) {
> +			int res;
> +
> +			pagefault_disable();
> +			res = __get_user_inatomic(inst, nip);
> +			pagefault_enable();
> +			if (unlikely(res)) {
> +				up_read(&mm->mmap_sem);
> +				res = __get_user(inst, nip);
> +				if (!res && inst)
> +					goto retry;

You're handling the error here, but the previous code did not?

> +				return bad_area_nosemaphore(regs, address);
> +			}
> +		}
> +	}

Would it be nicer to move all this up into bad_stack_expansion()?
It would need a way to handle the retry and insn, but I think it
would still look better.
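
As a rough illustration of that suggestion, here is a userspace model (not kernel code, and untested against the real fault path) of a helper that reports one of three outcomes, so the caller only has to act on the result. Every name in it is hypothetical: the struct stands in for mm/pt_regs state, and the two read functions stand in for __get_user_inatomic() and __get_user().

```c
#include <stdbool.h>

/* Hypothetical tri-state result for a helper that fetches the insn. */
enum insn_fetch {
	INSN_OK,	/* insn available, lock still held */
	INSN_RETRY,	/* lock dropped, caller restarts the fault */
	INSN_BAD,	/* insn unreadable, lock dropped */
};

/* Stand-in for the state the real code keeps in mm and pt_regs. */
struct fault_ctx {
	bool lock_held;		/* models mmap_sem held for read */
	bool insn_resident;	/* can the non-sleeping read succeed? */
	bool insn_readable;	/* can the sleeping read succeed? */
	unsigned int insn_at_nip;
};

/* Models the non-sleeping read done with page faults disabled. */
static int read_insn_atomic(const struct fault_ctx *ctx, unsigned int *inst)
{
	if (!ctx->insn_resident)
		return -1;
	*inst = ctx->insn_at_nip;
	return 0;
}

/* Models the sleeping read, which may fault the page back in. */
static int read_insn_sleeping(struct fault_ctx *ctx, unsigned int *inst)
{
	if (!ctx->insn_readable)
		return -1;
	ctx->insn_resident = true;
	*inst = ctx->insn_at_nip;
	return 0;
}

static enum insn_fetch fetch_fault_insn(struct fault_ctx *ctx,
					unsigned int *inst)
{
	if (*inst)			/* already read on an earlier pass */
		return INSN_OK;
	if (!read_insn_atomic(ctx, inst))
		return INSN_OK;

	ctx->lock_held = false;		/* never sleep with the lock held */
	if (!read_insn_sleeping(ctx, inst) && *inst)
		return INSN_RETRY;
	return INSN_BAD;
}
```

In this model the caller would map INSN_RETRY to the existing "goto retry" and INSN_BAD to bad_area_nosemaphore(), keeping the lock handling in one place.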

Thanks,
Nick