linux-ext4 - Re: infinite getdents64 loop

Message-ID: <4DE5205C.5020209@itwm.fraunhofer.de>
Date:	Tue, 31 May 2011 19:07:40 +0200
From:	Bernd Schubert <bernd.schubert@...m.fraunhofer.de>
To:	"Ted Ts'o" <tytso@....edu>
CC:	linux-nfs@...r.kernel.org, linux-ext4@...r.kernel.org
Subject: Re: infinite getdents64 loop

On 05/31/2011 02:35 PM, Ted Ts'o wrote:
> On Tue, May 31, 2011 at 12:18:11PM +0200, Bernd Schubert wrote:
>>
>> Out of interest, did anyone ever benchmark if dirindex provides any
>> advantages to readdir?  And did those benchmarks include the
>> disadvantages of the present implementation (non-linear inode
>> numbers from readdir, so disk seeks on stat() (e.g. from 'ls -l') or
>> 'rm -fr $dir')?
>
> The problem is that seekdir/telldir is terminally broken (and so is
> NFSv2 for using such a tiny cookie) in that it fundamentally assumes
> a linear data structure.  If you're going to use any kind of
> tree-based data structure, a 32-bit "offset" for seekdir/telldir just
> doesn't cut it.  We actually play games where we memoize the low
> 32-bits of the hash and keep track of which cookies we hand out via
> seekdir/telldir so that things mostly work --- except for NFSv2, where
> with the 32-bit cookie, you're just hosed.

Well, let's just ignore NFSv2; for NFS there are better-working v3 and v4
alternatives. My real concern is ext3 and ext4, which have

#define pos2min_hash(pos)	(0)


>
> The reason why we have to iterate over the directory in hash tree
> order is that if we have a leaf node split, half the directory
> entries get copied to another directory block.  The promises made
> by seekdir() and telldir() about directory entries appearing exactly
> once during a readdir() stream, even if you hold the fd open for weeks
> or days, mean that you really have to iterate over things in hash
> order.

Ah, I never looked into the dirindex implementation. I always thought
only the dirindex blocks get updated, not the real directory entries as well.

>
> I'd have to look, since it's been too many years, but as I recall the
> problem was that there is a common path for NFSv2 and NFSv3/v4, so we
> don't know whether we can hand back a 32-bit cookie or a 64-bit
> cookie, so we're always handing the NFS server a 32-bit "offset", even
> though we could do better.  Actually, if we had an interface where we
> could give you a 128-bit "offset" into the directory, we could
> probably eliminate the duplicate cookie problem entirely.  We just
> send 64 bits' worth of hash, plus the first two bytes of the file
> name.
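
For illustration only, the 128-bit "offset" described in the quote could be laid out roughly as follows. This is a sketch, not an existing kernel interface: the struct name, the field names, the split of the 64 bits of hash into a 32-bit major and a 32-bit minor part, and both helper functions are assumptions made for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit directory cookie; not an existing kernel structure. */
struct dir_cookie128 {
	uint32_t major_hash;		/* assumed: upper half of the 64-bit hash */
	uint32_t minor_hash;		/* assumed: lower half of the 64-bit hash */
	uint8_t  name_prefix[2];	/* first two bytes of the file name */
	uint8_t  pad[6];		/* unused, pads the cookie to 16 bytes */
};

/* Build a cookie from both hash halves and the start of the name. */
static inline struct dir_cookie128 make_cookie(uint32_t major, uint32_t minor,
					       const char *name)
{
	struct dir_cookie128 c;
	size_t len = strlen(name);

	memset(&c, 0, sizeof(c));
	c.major_hash = major;
	c.minor_hash = minor;
	if (len > 0)
		c.name_prefix[0] = (uint8_t)name[0];
	if (len > 1)
		c.name_prefix[1] = (uint8_t)name[1];
	return c;
}

/* Order cookies by hash first; the name prefix only breaks hash ties. */
static inline int cookie_cmp(const struct dir_cookie128 *a,
			     const struct dir_cookie128 *b)
{
	if (a->major_hash != b->major_hash)
		return a->major_hash < b->major_hash ? -1 : 1;
	if (a->minor_hash != b->minor_hash)
		return a->minor_hash < b->minor_hash ? -1 : 1;
	return memcmp(a->name_prefix, b->name_prefix, sizeof(a->name_prefix));
}
```

With such a layout, two entries whose hashes collide completely would still get distinct cookies as long as their names differ in the first two bytes.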

Well, personally I'm more interested in user space, but I don't see any 
difference between NFS, other kernel paths and user space. I think this 
is used for everything:

	/* Some one has messed with f_pos; reset the world */
	if (info->last_pos != filp->f_pos) {
		free_rb_tree_fname(&info->root);
		info->curr_node = NULL;
		info->extra_fname = NULL;
		info->curr_hash = pos2maj_hash(filp->f_pos);
		info->curr_minor_hash = pos2min_hash(filp->f_pos);
	}


So with the above #define pos2min_hash(), info->curr_minor_hash is
always zero, without exception. Or am I missing something?
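
To make the concern concrete, here is a small user-space sketch of the round trip between a (major, minor) hash pair and the directory position. Only pos2min_hash() is taken from the #define quoted above. hash2pos() and pos2maj_hash() are written from memory of fs/ext4/dir.c, recast as functions, and should be treated as assumptions, as should the example hash values (chosen with the low bit of the major hash clear).

```c
#include <stdint.h>

/* Quoted above: the minor hash is never recovered from the position. */
#define pos2min_hash(pos)	(0)

/* Assumed (from memory of fs/ext4/dir.c): the position keeps only the major hash. */
static inline uint32_t hash2pos(uint32_t major, uint32_t minor)
{
	(void)minor;	/* dropped: the 32-bit position has no room for it */
	return major >> 1;
}

/* Assumed (from memory of fs/ext4/dir.c): inverse of hash2pos() for the major hash. */
static inline uint32_t pos2maj_hash(uint64_t pos)
{
	return (uint32_t)((pos << 1) & 0xffffffff);
}
```

Under these assumptions, hash2pos(0x12345678, 0xdeadbeef) and hash2pos(0x12345678, 0x00000001) yield the same position, pos2maj_hash() of that position gives back 0x12345678, and pos2min_hash() gives 0 in both cases. Entries that share a major hash therefore cannot be told apart once f_pos has been reset.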

>
>> 3) Disable dirindexing for readdirs
>
> That won't work, since it will break POSIX compliance.  Once again,
> we're tied by the decisions made decades ago...

I really wonder whether we could set a flag somewhere to ignore POSIX for
applications that can handle it on their own. It's a pity that opendir()
doesn't accept flags. An ioctl would be another option.


Thanks,
Bernd