lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5df1b66b-622e-657e-f024-ce8d1a7e2476@linux.alibaba.com>
Date:   Mon, 3 Jul 2023 17:07:22 +0800
From:   Jingbo Xu <jefflexu@...ux.alibaba.com>
To:     Alexander Larsson <alexl@...hat.com>
Cc:     hsiangkao@...ux.alibaba.com, chao@...nel.org, huyue2@...lpad.com,
        linux-erofs@...ts.ozlabs.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC 0/2] erofs: introduce bloom filter for xattr



On 7/3/23 3:25 PM, Alexander Larsson wrote:
> On Wed, Jun 28, 2023 at 5:38 AM Jingbo Xu <jefflexu@...ux.alibaba.com> wrote:
>>
>> Hi all,
>>
>> Sorry for the late reply as I was on vacation these days.
>>
>> I test the hash bit for all xattrs given by Alex[1], to see if each
>> xattr could be mapped into one unique bit in the 32-bit bloom filter.
>>
>> [1]
>> https://lore.kernel.org/all/CAL7ro1HhYUDrOX7A-13p7rLBZSWHTQWGOdOzVcYkddkU_LArUw@mail.gmail.com/
>>
>>
>> On 6/21/23 4:32 PM, Jingbo Xu wrote:
>>>
>>> 3.2. input of hash function
>>> -------------------------
>>> As previously described, each hash function will map the given data into
>>> one bit of the bloom filter map.  In our use case, xattr name serves as
>>> the key of hash function.
>>>
>>> When .getxattr() gets called, only index (e.g. EROFS_XATTR_INDEX_USER)
>>> and the remaining name apart from the prefix are handy.  To avoid
>>> constructing the full xattr name, the above index and name are fed into
>>> the hash function directly in the following way:
>>>
>>> ```
>>> bit = xxh32(name, strlen(name), index + i);
>>> ```
>>>
>>> where index serves as part of seed, so that it gets involved in the
>>> calculation for the hash.
>>
>>
>> All xattrs are hashed with one single hash function.
>>
>> I first tested with the following hash function:
>>
>> ```
>> xxh32(name, strlen(name), index)
>> ```
>>
>> where `index` represents the index of corresponding predefined name
>> prefix (e.g. EROFS_XATTR_INDEX_USER), while `name` represents the name
>> after stripping the above predefined name prefix (e.g.
>> "overlay.metacopy" for "user.overlay.metacopy")
>>
>>
>> The mapping results are:
>>
>> bit  0: security.SMACK64EXEC
>> bit  1:
>> bit  2: user.overlay.protattr
>> bit  3: trusted.overlay.impure, user.overlay.opaque, user.mime_type
>> bit  4:
>> bit  5: user.overlay.origin
>> bit  6: user.overlay.metacopy, security.evm
>> bit  8: trusted.overlay.opaque
>> bit  9: trusted.overlay.origin
>> bit 10: trusted.overlay.upper, trusted.overlay.protattr
>> bit 11: security.apparmor, security.capability
>> bit 12: security.SMACK64
>> bit 13: user.overlay.redirect, security.ima
>> bit 14: user.overlay.upper
>> bit 15: trusted.overlay.redirect
>> bit 16: security.SMACK64IPOUT
>> bit 17:
>> bit 18: system.posix_acl_access
>> bit 19: security.selinux
>> bit 20:
>> bit 21:
>> bit 22: system.posix_acl_default
>> bit 23: security.SMACK64MMAP
>> bit 24: user.overlay.impure, user.overlay.nlink, security.SMACK64TRANSMUTE
>> bit 25: trusted.overlay.metacopy
>> bit 26:
>> bit 27: security.SMACK64IPIN
>> bit 28:
>> bit 29:
>> bit 30: trusted.overlay.nlink
>> bit 31:
>>
>> Here 30 xattrs are mapped into 22 bits.  There are two potential
>> conflicts, i.e. bit 10 (trusted.overlay.upper, trusted.overlay.protattr)
>> and bit 24 (user.overlay.impure, user.overlay.nlink).
> 
> Bit 11 (apparmor and capabilities) seems like the most likely thing to
> run into. I.e. on an apparmor-using system, many files would have
> apparmor xattr set, so looking up security.capabilities on it would
> cause a false negative and we'd unnecessarily read the xattrs.
> 
>>> An alternative way is to calculate the hash from the full xattr name by
>>> feeding the prefix string and the remaining name string separately in
>>> the following way:
>>>
>>> ```
>>> xxh32_reset()
>>> xxh32_update(prefix string, ...)
>>> xxh32_update(remaining name, ...)
>>> xxh32_digest()
>>> ```
>>>
>>> But I doubt if it really deserves to call multiple APIs instead of one
>>> single xxh32().
>>
>>
>> I also tested with the following hash function, where the full name of
>> the xattr, e.g. "user.overlay.metacopy", is fed into the hash function.
>>
>> ```
>> xxh32(name, strlen(name), 0)
>> ```
>>
>>
>> Following are the mapping results:
>>
>> bit  0: trusted.overlay.impure, user.overlay.protattr
>> bit  1: security.SMACK64IPOUT
>> bit  2:
>> bit  3: security.capability
>> bit  4: security.selinux
>> bit  5: security.ima
>> bit  6: user.overlay.metacopy
>> bit  8:
>> bit  9: trusted.overlay.redirect, security.SMACK64EXEC
>> bit 10: system.posix_acl_access
>> bit 11: trusted.overlay.nlink
>> bit 12: trusted.overlay.opaque
>> bit 13:
>> bit 14:
>> bit 15:
>> bit 16:
>> bit 17: user.overlay.impure
>> bit 18: security.apparmor
>> bit 19:
>> bit 20: user.overlay.origin, user.overlay.nlink, security.SMACK64TRANSMUTE
>> bit 21:
>> bit 22: trusted.overlay.metacopy, trusted.overlay.protattr
>> bit 23: user.overlay.upper, security.evm
>> bit 24: user.overlay.redirect, security.SMACK64IPIN,
>> system.posix_acl_default
>> bit 25: security.SMACK64
>> bit 26:
>> bit 27: trusted.overlay.upper, security.SMACK64MMAP
>> bit 28: trusted.overlay.origin, user.mime_type
>> bit 29:
>> bit 30:
>> bit 31: user.overlay.opaque
>>
>> 30 xattrs are mapped into 20 bits.  Similarly there are two potential
>> conflicts, i.e. bit 20 (user.overlay.origin, user.overlay.nlink) and bit
>> 22 (trusted.overlay.metacopy, trusted.overlay.protattr).
>>
>>
>> Summary
>> =======
>>
>> Personally I would prefer the former, as it maps xattrs into the bloom
>> filter more evenly (22 bits vs 20 bits) and can better cooperate with
>> the kernel routine (index and the remaining name string, rather than the
>> full name string, are handy).
> 
> I agree that we want the approach with better cooperation with the
> kernel function. However, I would much prefer if all the xattrs that
> are commonly set on many files are unconflicted. This would be at
> least: selinux, ima, evm, apparmor.
> 
> Can't you just add a magic constant to the seed? Then we can come up
> with one that gives a good spread and hardcode that.

Brilliant idea!  I would try to see if it works.

-- 
Thanks,
Jingbo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ