lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130308182648.GA25175@logfs.org>
Date:	Fri, 8 Mar 2013 13:26:49 -0500
From:	Jörn Engel <joern@...fs.org>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Dave Jones <davej@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Al Viro <viro@...iv.linux.org.uk>
Subject: Re: pipe_release oops.

On Fri, 8 March 2013 10:30:01 -0800, Linus Torvalds wrote:
> 
> Hmm. So I've been trying to figure this out, and I really don't see
> it. Every single pipe open routine *should* make sure that the inode
> has an inode->i_pipe field. So if the open() has succeeded and you
> have a valid file descriptor, the inode->i_pipe thing should be there.

Ok, here is a wild idea that is very likely wrong.  But some
background first.  I've had problems with process exit times and one
of the culprits turned out to be exit_files() where one device driver
went awol for several seconds.  Fixing the device driver is hard, I
didn't see a good reason not to call exit_files() earlier and
exit_mm() was the other big offender, so the idea was to run both in
parallel and I applied the patch below.

As a result I've gotten a bunch of NULL pointer dereferences that only
happen in virtual machines, never on real hardware.  For example
  [<ffffffff81164bf8>] alloc_fd+0x38/0x130
  [<ffffffff8114857e>] do_sys_open+0xee/0x1f0
  [<ffffffff811486a1>] sys_open+0x21/0x30
  [<ffffffff815bea29>] system_call_fastpath+0x16/0x1b

Now I can easily see how current->files being NULL will result in such
backtraces.  I can also see how my patch moves the NULLing of
current->files a bit back in time.  But I could never figure out how
my patch could have introduced a race that didn't exist before.

So the wild idea is that we have always had a very unlikely race with
current->files being NULL and trinity happens to hit it somehow.

Jörn

--
One of my most productive days was throwing away 1000 lines of code.
-- Ken Thompson.


diff --git a/kernel/exit.c b/kernel/exit.c
index f65345f9..5886799 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -4,6 +4,7 @@
  *  Copyright (C) 1991, 1992  Linus Torvalds
  */
 
+#include <linux/async.h>
 #include <linux/mm.h>
 #include <linux/slab.h>
 #include <linux/interrupt.h>
@@ -559,6 +560,11 @@ void exit_files(struct task_struct *tsk)
 	}
 }
 
+static void exit_files_async(void *data, async_cookie_t cookie)
+{
+	exit_files(data);
+}
+
 #ifdef CONFIG_MM_OWNER
 /*
  * A task is exiting.   If it owned this mm, find a new owner for the mm.
@@ -905,6 +911,7 @@ static inline void check_stack_usage(void) {}
 void do_exit(long code)
 {
 	struct task_struct *tsk = current;
+	async_cookie_t files_cookie;
 	int group_dead;
 
 	profile_task_exit(tsk);
@@ -982,6 +989,7 @@ void do_exit(long code)
 	tsk->exit_code = code;
 	taskstats_exit(tsk, group_dead);
 
+	files_cookie = async_schedule(exit_files_async, tsk);
 	exit_mm(tsk);
 
 	if (group_dead)
@@ -990,7 +998,7 @@ void do_exit(long code)
 
 	exit_sem(tsk);
 	exit_shm(tsk);
-	exit_files(tsk);
+	async_synchronize_cookie(files_cookie);
 	exit_fs(tsk);
 	exit_task_work(tsk);
 	check_stack_usage();
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ