linux-kernel - [PATCH 4/3] x86/vdso: Optimize setup_additional

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 18 Oct 2013 08:05:01 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Davidlohr Bueso <davidlohr@...com>,
	Al Viro <viro@...iv.linux.org.uk>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Michel Lespinasse <walken@...gle.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Rik van Riel <riel@...hat.com>,
	Tim Chen <tim.c.chen@...ux.intel.com>, aswin@...com,
	linux-mm <linux-mm@...ck.org>, linux-kernel@...r.kernel.org,
	Russell King <linux@....linux.org.uk>,
	Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will.deacon@....com>,
	Richard Kuo <rkuo@...eaurora.org>,
	Ralf Baechle <ralf@...ux-mips.org>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Paul Mackerras <paulus@...ba.org>,
	Martin Schwidefsky <schwidefsky@...ibm.com>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	Paul Mundt <lethal@...ux-sh.org>,
	Chris Metcalf <cmetcalf@...era.com>,
	Jeff Dike <jdike@...toit.com>,
	Richard Weinberger <richard@....at>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: [PATCH 4/3] x86/vdso: Optimize setup_additional_pages()


* Davidlohr Bueso <davidlohr@...com> wrote:

> --- a/arch/x86/vdso/vma.c
> +++ b/arch/x86/vdso/vma.c
> @@ -154,12 +154,17 @@ static int setup_additional_pages(struct linux_binprm *bprm,
>  				  unsigned size)
>  {
>  	struct mm_struct *mm = current->mm;
> +	struct vm_area_struct *vma;
>  	unsigned long addr;
>  	int ret;
>  
>  	if (!vdso_enabled)
>  		return 0;
>  
> +	vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
> +	if (unlikely(!vma))
> +		return -ENOMEM;
> +
>  	down_write(&mm->mmap_sem);
>  	addr = vdso_addr(mm->start_stack, size);
>  	addr = get_unmapped_area(NULL, addr, size, 0, 0);
> @@ -173,14 +178,17 @@ static int setup_additional_pages(struct linux_binprm *bprm,
>  	ret = install_special_mapping(mm, addr, size,
>  				      VM_READ|VM_EXEC|
>  				      VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
> -				      pages);
> +				      pages, &vma);
>  	if (ret) {
>  		current->mm->context.vdso = NULL;
>  		goto up_fail;
>  	}
>  
> +	up_write(&mm->mmap_sem);
> +	return ret;
>  up_fail:
>  	up_write(&mm->mmap_sem);
> +	kmem_cache_free(vm_area_cachep, vma);
>  	return ret;
>  }
>  

1)

Beyond the simplification that Linus suggested, why not introduce a new 
function, named 'install_special_mapping_vma()' or so, and convert 
architectures one by one, without pressure to get it all done (and all 
correct) in a single patch?

2)

I don't see the justification: this code gets executed in exec() where a 
new mm has just been allocated. There's only a single user of the mm and 
thus the critical section width of mmap_sem is more or less irrelevant.

mmap_sem critical section size only matters for codepaths that threaded 
programs can hit.

3)

But, if we do all that, a couple of other (micro-)optimizations are 
possible in setup_additional_pages() as well:

 - vdso_addr(), which is actually much _more_ expensive than kmalloc() 
   because on most distros it will call into the RNG, can also be done 
   outside the mmap_sem.

 - the error paths can all be merged and the common case can be made 
   fall-through.

 - use 'mm' consistently instead of repeating 'current->mm'

 - set 'mm->context.vdso' only once we know it's all a success, and do it 
   outside the lock

 - add a few comments about which operations are locked, which unlocked, 
   and why. Please double check the assumptions I documented there.

See the diff attached below. (Totally untested and all that.)

Also note that I think, in theory, if exec() guaranteed the privacy and 
single threadedness of the new mm, we could probably do _all_ of this 
unlocked. Right now I don't think this is guaranteed: ptrace() users might 
look up the new PID and might interfere on the MM via 
PTRACE_PEEK*/PTRACE_POKE*.

( Furthermore, /proc/ might also give early access to aspects of the mm - 
  although no manipulation of the mm is possible there. )

If such privacy of the new mm was guaranteed then that would also remove 
the need to move the allocation out of install_special_mapping().

But, I don't think it all matters, due to #2 - and your changes actively 
complicate setup_pages(), which makes this security sensitive code a bit 
more fragile. We can still do it out of sheer principle, I just don't see 
where it's supposed to help scale better.

Thanks,

	Ingo

 arch/x86/vdso/vma.c | 40 +++++++++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/arch/x86/vdso/vma.c b/arch/x86/vdso/vma.c
index 431e875..c590387 100644
--- a/arch/x86/vdso/vma.c
+++ b/arch/x86/vdso/vma.c
@@ -157,30 +157,44 @@ static int setup_additional_pages(struct linux_binprm *bprm,
 	unsigned long addr;
 	int ret;
 
-	if (!vdso_enabled)
+	if (unlikely(!vdso_enabled))
 		return 0;
 
-	down_write(&mm->mmap_sem);
+	/*
+	 * Do this outside the MM lock - we are in exec() with a new MM,
+	 * nobody else can use these fields of the mm:
+	 */
 	addr = vdso_addr(mm->start_stack, size);
-	addr = get_unmapped_area(NULL, addr, size, 0, 0);
-	if (IS_ERR_VALUE(addr)) {
-		ret = addr;
-		goto up_fail;
-	}
 
-	current->mm->context.vdso = (void *)addr;
+	/*
+	 * This must be done under the MM lock - there might be parallel
+	 * accesses to this mm, such as ptrace().
+	 *
+	 * [ This could be further optimized if exec() reliably inhibited
+	 *   all early access to the mm. ]
+	 */
+	down_write(&mm->mmap_sem);
+	addr = get_unmapped_area(NULL, addr, size, 0, 0);
+	if (IS_ERR_VALUE(addr))
+		goto up_fail_addr;
 
 	ret = install_special_mapping(mm, addr, size,
 				      VM_READ|VM_EXEC|
 				      VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
 				      pages);
-	if (ret) {
-		current->mm->context.vdso = NULL;
-		goto up_fail;
-	}
+	up_write(&mm->mmap_sem);
+	if (ret)
+		goto fail;
 
-up_fail:
+	mm->context.vdso = (void *)addr;
+	return ret;
+
+up_fail_addr:
+	ret = addr;
 	up_write(&mm->mmap_sem);
+fail:
+	mm->context.vdso = NULL;
+
 	return ret;
 }
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/