linux-ext4 - Re: algorithm for half-md4 used in htree directories

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YWnSMXcR5anaYTEU@mit.edu>
Date:   Fri, 15 Oct 2021 15:10:41 -0400
From:   "Theodore Ts'o" <tytso@....edu>
To:     Avi Deitcher <avi@...tcher.net>
Cc:     linux-ext4@...r.kernel.org
Subject: Re: algorithm for half-md4 used in htree directories

On Fri, Oct 15, 2021 at 11:43:07AM -0700, Avi Deitcher wrote:
> I am absolutely stumped. I tried the seed as four u32 as is on disk
> (i.e. big-endian); four u32 little-endian; one long little-endian
> array of bytes (I have no idea why that would make sense, but worth
> trying); zeroed out so it gets the default. No one gives a consistent
> solution.
> 
> As far as I can tell: hash tells you which intermediate block to look
> in, minor hash tells you which leaf block to look in, and then you
> scan. So it is pretty easy to see in what range the minor and major
> hash should be, but no luck.
> 
> I put up a gist with debugfs and source and output.
> https://gist.github.com/deitch/53b01a90635449e7674babfe7e7dd002
> 
> Anyone who feels like a look-see, I would much appreciate it (and if
> they figure it out, owe a beer if ever in the same city).

I'm really curious *why* you are trying to reverse engineer the
implementation.  What are you trying to do?

In any case, you're mostly right about what hash and minor_hash are
for.  The 32-bit hash value is only thing that we use in the hashed B+
tree which is used for hash directories.  The 32-bit minor hash is
used to form a 64-bit number that gets used when we need to support
things like NFSv3 directory cursors, and POSIX telldir/seekdir (which
is a massive headache for modern file system, since it assumes that a
64-bit "offset" is all you get to reliably provide the POSIX
telldir/seekdir/readdir guarantees even when the directory is getting
large number of directory entries added and deleted without limit
between the telldir and the seekdir call).

As far as what you are doing wrong, I'll point out that *this* program
(attached below) provides the correct result.  Running this through a
debugger and comparing it with your implrementation is left as an
exercise for the reader --- but why do you want to try to make your
own implementation, when you could just pull down e2fsprogs, compile
it, and then *use* the provided ext2_fs library for whatever the heck
you are trying to do?

    	     	      	  	       - Ted

#include <stdio.h>
#include <stdlib.h>

#include <et/com_err.h>
#include <uuid/uuid.h>
#include <ext2fs/ext2_fs.h>
#include <ext2fs/ext2fs.h>

int main(int argc, char **argv)
{
	uuid_t	buf;
	unsigned int *p;
	int i;
	ext2_dirhash_t hash, minor_hash;
	errcode_t retval;

	uuid_parse("d64563bc-ea93-4aaf-a943-4657711ed153", buf);
	p = (unsigned int *) buf;
	for (i=0; i < 4; i++) {
		printf("buf[%d] = 0x%08x\n", i, p[i]);
	}

	retval = ext2fs_dirhash(1, "dir478", strlen("dir478"), p,
				&hash, &minor_hash);
	printf("dirhash results: retval=%u, hash=0x%08x, minor_hash=0x%08x\n",
	       i, hash, minor_hash);

	exit(0);
}

% gcc -g -o /tmp/foo /tmp/foo.c -luuid -lext2fs -lcom_err
% /tmp/foo
buf[0] = 0xbc6345d6
buf[1] = 0xaf4a93ea
buf[2] = 0x574643a9
buf[3] = 0x53d11e71
dirhash results: retval=4, hash=0x012225e2, minor_hash=0x3f08755d