lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.00.0810240853310.3287@nehalem.linux-foundation.org>
Date:	Fri, 24 Oct 2008 09:08:26 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Markus Trippelsdorf <markus@...ppelsdorf.de>
cc:	Theodore Tso <tytso@....edu>, linux-kernel@...r.kernel.org,
	eugene@...ix.com, msnitzer@...ix.com, akpm@...ux-foundation.org
Subject: Re: ext3: fix ext3_dx_readdir hash collision handling - Regression



On Fri, 24 Oct 2008, Markus Trippelsdorf wrote:
>
> Notice that I don't use "rm -rf" but "rm -r". The problem only occurs
> when I run "rm -r" without the "-f" switch.
> 
> The first time I encountered this problem was on an ext4 test partition
> a few days ago. When I tried to delete a big directory structure on that
> partition with mc (GNU Midnight Commander) by hitting F8 and then selecting
> "All" on the "Delete it recursively" popup, it just stopped deleting
> files after a while. So I had to repeat this procedure several times to
> get rid of the directory.

Hmm. I may actually have hit the same problem. I attributed it to a git 
buglet, and left it for later, because the last few days were pretty crazy 
(closing the merge window is when people always send their last-minute 
stuff even if they shouldn't, but to make things worse I was also gone for 
a day-and-a-half).

The "git buglet" looks liek this:

 - build a kernel

 - do "git clean -dqfx". This fails with

	warning: failed to remove 'include/config/'

 - do "git clean -dqfx" again. And now it works - apparently because the 
   first invocation deleted enough of the big directory.

Doing an 'strace' to see what is going on, I see:

	..
	getdents(3, /* 132 entries */, 4096)    = 3888
	lstat("include/config/sgi", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
	open("include/config/sgi", O_RDONLY|O_NONBLOCK|O_DIRECTORY|0x80000) = 4
	fstat(4, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
	getdents(4, /* 3 entries */, 4096)      = 80
	lstat("include/config/sgi/partition.h", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
	unlink("include/config/sgi/partition.h") = 0
	getdents(4, /* 0 entries */, 4096)      = 0
	close(4)                                = 0
	rmdir("include/config/sgi")             = 0
	lstat("include/config/sgi", 0x7fffdc4d2110) = -1 ENOENT (No such file or directory)
	close(3)                                = 0
	write(2, "warning: failed to remove \'include/config/\'\n", 44) = 44
	..

and notice how it does that

	lstat("include/config/sgi", ..

_twice_. That, in turn (knowing the git implementation), implies that it 
was returned twice from that first "getdents(3, ...)" call.

So what git clean does is to scan over the readdir() return values, see if 
it's a file it knows about, try to remove it if not, lstat() every entry, 
recurse into them if they are directories, and then remove it. If the 
lstat() fails, "git clean" will fail.

So what happens is that it sees that "sgi" entry _twice_ in the readdir 
output, traverse into it once (and delete it), and the second time when it 
does an lstat() it will obviously fail (because now it's deleted!), and 
that will make "git clean" fail the whole thing due to an unexpected 
error.

And I _think_ that what brings this on is:

 - caring about the error value

   This is why "rm -rf" works for you, but "rm -r" does not. With "-f", rm 
   simply won't care. And like "rm -r", "git clean" cares about error 
   values.

 - large enough directories that you have *multiple* "getdents()" calls.

 - likely: removing files in between getdents() calls.

Hmm. Ted? I have not tried to revert that commit that Markus pinpointed 
(6a897cf447a83c9c3fd1b85a1e525c02d6eada7d: "ext3: fix ext3_dx_readdir hash 
collision handling"), but now that I look at that "git bug", I am getting 
pretty damn sure that it's exactly the same issue, and it's not a git bug 
at all.

Markus basically has exactly the same thing:

  rm: cannot remove `linux-2.6.27.2/arch/alpha/include/asm/statfs.h': No such file or directory

and that ENOENT is almost certainly because "rm" already removed that 
filename _once_, and it's the second time that fails.

And yes, that commit is all about returning directory entries twice. It 
claims to fix it, but I bet it breaks it.

		Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ