lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <m1zmdcty4i.fsf@ebiederm.dsl.xmission.com>
Date:	Wed, 06 Sep 2006 16:43:09 -0600
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Jean Delvare <jdelvare@...e.de>
Cc:	Andrew Morton <akpm@...l.org>, Oleg Nesterov <oleg@...sign.ru>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	linux-kernel@...r.kernel.org, ak@...e.de
Subject: Re: [PATCH] proc: readdir race fix (take 3)

Jean Delvare <jdelvare@...e.de> writes:

> On Wednesday 6 September 2006 11:01, Jean Delvare wrote:
>> Eric, Kame, thanks a lot for working on this. I'll be giving some good
>> testing to this patch today, and will return back to you when I'm done.
>
> The original issue is indeed fixed, but there's a problem with the patch. 
> When stressing /proc (to verify the bug was fixed), my test machine ended 
> up crashing. Here are the 2 traces I found in the logs:

Ugh.  

So the death in __put_task_struct() is from:
WARN_ON(!(tsk->exit_state & (EXIT_DEAD | EXIT_ZOMBIE)));
So it appears we have something that is decrementing but not
incrementing the count on the task struct.

Now what is interesting is that there are a couple of other failure modes
present here.
free_uid called from __put_task_struct is failing


And you seem to have a recursive page fault going on somewhere.

I suspect the triggering of this bug is the result of an earlier oops,
that left some process half cleaned up.

Have you tested 2.6.18-rc6 without my patch?
If not can you please test the same 2.6.18-rc6 configuration with my patch?

> Sometimes the machine just hung, with nothing in the logs. The machine is 
> a Sony laptop (i686).
>
> I have been testing the patch on another machine (x86_64) and had no 
> problem at all, so the reproduceability of the bug might depend on the 
> arch or some config option. I'll help nailing down this issue if I can, 
> just tell me what to do.

So I don't know what is going on with your laptop.  It feels nasty.

I think my patch is just tripping on the problem, rather than causing
it.  The previous version of fs/proc/base.c should have tripped over
this problem as well if it happened to have hit the same process.

I'm staring at the patch and I can not think of anything that would
explain your problem.  The reference counting is simple and the only
bug I had in a posted version was a failure to decrement the count
on the task_struct.

I guess the practical question is what was your test methodology to
reproduce this problem?  A couple of more people running the same
test on a few more machines might at least give us confidence in what
is going on.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ