[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAF1vpki7HqHgXxWsTwMEo4yz592agzZ9c=F09o-1py+jtJpLSw@mail.gmail.com>
Date: Fri, 15 Oct 2021 12:43:33 -0700
From: Avi Deitcher <avi@...tcher.net>
To: "Theodore Ts'o" <tytso@....edu>
Cc: linux-ext4@...r.kernel.org
Subject: Re: algorithm for half-md4 used in htree directories
Thanks, Ted, I will try yours and step through it to figure out what is off.
You ask a fair question: other than madness, why would someone want to
recreate the exact algorithm?
I have had a number of cases where I have needed to manipulate disks,
filesystems, partition tables, etc. from within a non-C-source
program. The options have been: fork/exec to some external program;
run a VM where I can mount the fs and manipulate content as needed.
Those both work, but have had issues in various environments.
I made the mistake of saying, "well, all of this is just manipulating
bytes on a disk image or block device; how hard could it be?" :-)
So understanding the algorithm actually becomes important.
I probably could link the library in if I am working on languages that
support it (go, rust), but not all do, and there are reasons that is
more difficult for the target use case.
I was long hoping to finish with go and move onto Rust by now, then
possibly some others, but this is not what I get paid for, so catch as
catch can.
Does that answer? Feel free to say, "madness" too; I won't disagree.
Avi
On Fri, Oct 15, 2021 at 12:10 PM Theodore Ts'o <tytso@....edu> wrote:
>
> On Fri, Oct 15, 2021 at 11:43:07AM -0700, Avi Deitcher wrote:
> > I am absolutely stumped. I tried the seed as four u32 as is on disk
> > (i.e. big-endian); four u32 little-endian; one long little-endian
> > array of bytes (I have no idea why that would make sense, but worth
> > trying); zeroed out so it gets the default. No one gives a consistent
> > solution.
> >
> > As far as I can tell: hash tells you which intermediate block to look
> > in, minor hash tells you which leaf block to look in, and then you
> > scan. So it is pretty easy to see in what range the minor and major
> > hash should be, but no luck.
> >
> > I put up a gist with debugfs and source and output.
> > https://gist.github.com/deitch/53b01a90635449e7674babfe7e7dd002
> >
> > Anyone who feels like a look-see, I would much appreciate it (and if
> > they figure it out, owe a beer if ever in the same city).
>
> I'm really curious *why* you are trying to reverse engineer the
> implementation. What are you trying to do?
>
> In any case, you're mostly right about what hash and minor_hash are
> for. The 32-bit hash value is only thing that we use in the hashed B+
> tree which is used for hash directories. The 32-bit minor hash is
> used to form a 64-bit number that gets used when we need to support
> things like NFSv3 directory cursors, and POSIX telldir/seekdir (which
> is a massive headache for modern file system, since it assumes that a
> 64-bit "offset" is all you get to reliably provide the POSIX
> telldir/seekdir/readdir guarantees even when the directory is getting
> large number of directory entries added and deleted without limit
> between the telldir and the seekdir call).
>
> As far as what you are doing wrong, I'll point out that *this* program
> (attached below) provides the correct result. Running this through a
> debugger and comparing it with your implrementation is left as an
> exercise for the reader --- but why do you want to try to make your
> own implementation, when you could just pull down e2fsprogs, compile
> it, and then *use* the provided ext2_fs library for whatever the heck
> you are trying to do?
>
> - Ted
>
> #include <stdio.h>
> #include <stdlib.h>
>
> #include <et/com_err.h>
> #include <uuid/uuid.h>
> #include <ext2fs/ext2_fs.h>
> #include <ext2fs/ext2fs.h>
>
> int main(int argc, char **argv)
> {
> uuid_t buf;
> unsigned int *p;
> int i;
> ext2_dirhash_t hash, minor_hash;
> errcode_t retval;
>
> uuid_parse("d64563bc-ea93-4aaf-a943-4657711ed153", buf);
> p = (unsigned int *) buf;
> for (i=0; i < 4; i++) {
> printf("buf[%d] = 0x%08x\n", i, p[i]);
> }
>
> retval = ext2fs_dirhash(1, "dir478", strlen("dir478"), p,
> &hash, &minor_hash);
> printf("dirhash results: retval=%u, hash=0x%08x, minor_hash=0x%08x\n",
> i, hash, minor_hash);
>
> exit(0);
> }
>
> % gcc -g -o /tmp/foo /tmp/foo.c -luuid -lext2fs -lcom_err
> % /tmp/foo
> buf[0] = 0xbc6345d6
> buf[1] = 0xaf4a93ea
> buf[2] = 0x574643a9
> buf[3] = 0x53d11e71
> dirhash results: retval=4, hash=0x012225e2, minor_hash=0x3f08755d
--
Avi Deitcher
avi@...tcher.net
Follow me http://twitter.com/avideitcher
Read me http://blog.atomicinc.com
Powered by blists - more mailing lists