Message-Id: <3A493D20-568A-4D63-A575-5DEEBFAAF41A@dilger.ca>
Date: Sun, 3 Oct 2021 10:43:17 -0600
From: Andreas Dilger <adilger@...ger.ca>
To: Avi Deitcher <avi@...tcher.net>
Cc: linux-ext4@...r.kernel.org
Subject: Re: algorithm for half-md4 used in htree directories
On Oct 3, 2021, at 06:47, Avi Deitcher <avi@...tcher.net> wrote:
>
> I can narrow down the question further. In my live sample, one of the
> entries in the tree is for a directory named "dir155".
>
> If I run "dx_hash dir155", I get:
>
> # debugfs -R "dx_hash dir155" /var/lib/file.img
> debugfs 1.46.2 (28-Feb-2021)
> Hash of dir155 is 0x16279534 (minor 0x0)
>
> If I look in the tree with "htree_dump", I get:
>
> # debugfs -R "htree_dump /testdir" /var/lib/file.img
> debugfs 1.46.2 (28-Feb-2021)
> ....
> Entry #0: Hash 0x00000000, block 1
> Reading directory block 1, phys 6459
> 168 0x00d11d98-b9b6b16b (16) dir155 332 0x009edafe-77de7d72 (16) dir319
>
> That hash for dir155 does not match what dx_hash gave. If I try to
> take the code from fs/ext4/hash.c and build a small program to
> calculate the hash, I get:
>
> $ ./md4 dir155
> MD4: d90278a1 25182ac7 a02e56be c3f30f04
> hash: 0x25182ac6
> minor: 0xa02e56be
>
> Clearly that isn't what is in the tree. What basic thing am I missing?
One important factor is that the directory hash has an initial seed,
which prevents pathological cases where a user deliberately constructs
thousands of directory entries whose hashes collide.

The comment for __ext4fs_dirhash() in the code explains this. The seed
itself comes from sbi->s_hash_seed and is stored in the per-directory
hinfo.seed, where it is used when computing the filename hash.

In theory there could be a per-directory seed, but it appears to be a
constant for the whole filesystem.
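
To make the role of the seed concrete, here is a minimal sketch of the two
steps that bracket the hash transform. It is an illustration rather than a
copy of fs/ext4/hash.c. It assumes that an all-zero (or absent) seed means
"use the standard MD4 initial constants", and that the major hash is the
second state word with its low bit cleared while the minor hash is the third
word. The second assumption matches the numbers in your test program output
(0x25182ac7 becomes 0x25182ac6). The helper names init_state() and
extract_hash() are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard MD4 initial state, used here when no seed is supplied. */
static const uint32_t MD4_INIT[4] = {
    0x67452301u, 0xefcdab89u, 0x98badcfeu, 0x10325476u
};

/*
 * Hypothetical helper: pick the starting state for the hash.
 * Assumption: a NULL or all-zero seed means "no seed", so the
 * MD4 defaults are kept; otherwise the seed replaces them.
 */
static void init_state(uint32_t state[4], const uint32_t *seed)
{
    memcpy(state, MD4_INIT, sizeof(MD4_INIT));
    if (seed == NULL)
        return;
    for (int i = 0; i < 4; i++) {
        if (seed[i] != 0) {
            memcpy(state, seed, 4 * sizeof(uint32_t));
            return;
        }
    }
}

/*
 * Hypothetical helper: derive the major and minor hash from the
 * final state. Assumption: major = word 1 with the low bit cleared,
 * minor = word 2.
 */
static void extract_hash(const uint32_t state[4],
                         uint32_t *hash, uint32_t *minor)
{
    assert(state != NULL && hash != NULL && minor != NULL);
    *hash = state[1] & ~1u;
    *minor = state[2];
}
```

A standalone program that starts from the MD4 defaults will therefore not
reproduce the on-disk hash of a filesystem whose seed is set: the same name
is run through the same transform, but from a different starting state. The
filesystem's seed should be visible in the "Directory Hash Seed" line of
"dumpe2fs -h", and as far as I recall the debugfs dx_hash command accepts
-h and -s options for the hash algorithm and the seed, so the comparison can
also be done there.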
Cheers, Andreas
>
>> On Fri, Oct 1, 2021 at 2:49 PM Avi Deitcher <avi@...tcher.net> wrote:
>>
>> Hi,
>>
>> I have been trying to understand the algorithm used for the "half-md4"
>> in htree-structured directories. Going through the code (and trying
>> not to get into reverse engineering), it looks like it is part of md4
>> but not entirely? Yet any subset I take doesn't quite line up with
>> what I see in an actual sample.
>>
>> What is the algorithm it is using to turn an entry of, e.g., "file125"
>> into the appropriate hash? I did run a live sample, and tried to get
>> some form of correlation between the actual md4 hash (16 bytes) of the
>> above and the actual entry (4 bytes) shown by debugfs, without much
>> luck.
>>
>> What basic thing am I missing?
>>
>> Separately, how does the seed play into it?
>>
>> Thanks
>> Avi
>
>
>
> --
> Avi Deitcher
> avi@...tcher.net
> Follow me http://twitter.com/avideitcher
> Read me http://blog.atomicinc.com