lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL7ro1FjnO52tawLeu-wNtk+upN1ShxeFYE1AiFUh1JN2oo0hA@mail.gmail.com>
Date:   Mon, 3 Jul 2023 09:25:21 +0200
From:   Alexander Larsson <alexl@...hat.com>
To:     Jingbo Xu <jefflexu@...ux.alibaba.com>
Cc:     hsiangkao@...ux.alibaba.com, chao@...nel.org, huyue2@...lpad.com,
        linux-erofs@...ts.ozlabs.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC 0/2] erofs: introduce bloom filter for xattr

On Wed, Jun 28, 2023 at 5:38 AM Jingbo Xu <jefflexu@...ux.alibaba.com> wrote:
>
> Hi all,
>
> Sorry for the late reply as I was on vacation these days.
>
> I test the hash bit for all xattrs given by Alex[1], to see if each
> xattr could be mapped into one unique bit in the 32-bit bloom filter.
>
> [1]
> https://lore.kernel.org/all/CAL7ro1HhYUDrOX7A-13p7rLBZSWHTQWGOdOzVcYkddkU_LArUw@mail.gmail.com/
>
>
> On 6/21/23 4:32 PM, Jingbo Xu wrote:
> >
> > 3.2. input of hash function
> > -------------------------
> > As previously described, each hash function will map the given data into
> > one bit of the bloom filter map.  In our use case, xattr name serves as
> > the key of hash function.
> >
> > When .getxattr() gets called, only index (e.g. EROFS_XATTR_INDEX_USER)
> > and the remaining name apart from the prefix are handy.  To avoid
> > constructing the full xattr name, the above index and name are fed into
> > the hash function directly in the following way:
> >
> > ```
> > bit = xxh32(name, strlen(name), index + i);
> > ```
> >
> > where index serves as part of seed, so that it gets involved in the
> > calculation for the hash.
>
>
> All xattrs are hashed with one single hash function.
>
> I first tested with the following hash function:
>
> ```
> xxh32(name, strlen(name), index)
> ```
>
> where `index` represents the index of corresponding predefined name
> prefix (e.g. EROFS_XATTR_INDEX_USER), while `name` represents the name
> after stripping the above predefined name prefix (e.g.
> "overlay.metacopy" for "user.overlay.metacopy")
>
>
> The mapping results are:
>
> bit  0: security.SMACK64EXEC
> bit  1:
> bit  2: user.overlay.protattr
> bit  3: trusted.overlay.impure, user.overlay.opaque, user.mime_type
> bit  4:
> bit  5: user.overlay.origin
> bit  6: user.overlay.metacopy, security.evm
> bit  8: trusted.overlay.opaque
> bit  9: trusted.overlay.origin
> bit 10: trusted.overlay.upper, trusted.overlay.protattr
> bit 11: security.apparmor, security.capability
> bit 12: security.SMACK64
> bit 13: user.overlay.redirect, security.ima
> bit 14: user.overlay.upper
> bit 15: trusted.overlay.redirect
> bit 16: security.SMACK64IPOUT
> bit 17:
> bit 18: system.posix_acl_access
> bit 19: security.selinux
> bit 20:
> bit 21:
> bit 22: system.posix_acl_default
> bit 23: security.SMACK64MMAP
> bit 24: user.overlay.impure, user.overlay.nlink, security.SMACK64TRANSMUTE
> bit 25: trusted.overlay.metacopy
> bit 26:
> bit 27: security.SMACK64IPIN
> bit 28:
> bit 29:
> bit 30: trusted.overlay.nlink
> bit 31:
>
> Here 30 xattrs are mapped into 22 bits.  There are two potential
> conflicts, i.e. bit 10 (trusted.overlay.upper, trusted.overlay.protattr)
> and bit 24 (user.overlay.impure, user.overlay.nlink).

Bit 11 (apparmor and capabilities) seems like the most likely thing to
run into. I.e. on an apparmor-using system, many files would have
apparmor xattr set, so looking up security.capabilities on it would
cause a false negative and we'd unnecessarily read the xattrs.

> > An alternative way is to calculate the hash from the full xattr name by
> > feeding the prefix string and the remaining name string separately in
> > the following way:
> >
> > ```
> > xxh32_reset()
> > xxh32_update(prefix string, ...)
> > xxh32_update(remaining name, ...)
> > xxh32_digest()
> > ```
> >
> > But I doubt if it really deserves to call multiple APIs instead of one
> > single xxh32().
>
>
> I also tested with the following hash function, where the full name of
> the xattr, e.g. "user.overlay.metacopy", is fed into the hash function.
>
> ```
> xxh32(name, strlen(name), 0)
> ```
>
>
> Following are the mapping results:
>
> bit  0: trusted.overlay.impure, user.overlay.protattr
> bit  1: security.SMACK64IPOUT
> bit  2:
> bit  3: security.capability
> bit  4: security.selinux
> bit  5: security.ima
> bit  6: user.overlay.metacopy
> bit  8:
> bit  9: trusted.overlay.redirect, security.SMACK64EXEC
> bit 10: system.posix_acl_access
> bit 11: trusted.overlay.nlink
> bit 12: trusted.overlay.opaque
> bit 13:
> bit 14:
> bit 15:
> bit 16:
> bit 17: user.overlay.impure
> bit 18: security.apparmor
> bit 19:
> bit 20: user.overlay.origin, user.overlay.nlink, security.SMACK64TRANSMUTE
> bit 21:
> bit 22: trusted.overlay.metacopy, trusted.overlay.protattr
> bit 23: user.overlay.upper, security.evm
> bit 24: user.overlay.redirect, security.SMACK64IPIN,
> system.posix_acl_default
> bit 25: security.SMACK64
> bit 26:
> bit 27: trusted.overlay.upper, security.SMACK64MMAP
> bit 28: trusted.overlay.origin, user.mime_type
> bit 29:
> bit 30:
> bit 31: user.overlay.opaque
>
> 30 xattrs are mapped into 20 bits.  Similarly there are two potential
> conflicts, i.e. bit 20 (user.overlay.origin, user.overlay.nlink) and bit
> 22 (trusted.overlay.metacopy, trusted.overlay.protattr).
>
>
> Summary
> =======
>
> Personally I would prefer the former, as it maps xattrs into the bloom
> filter more evenly (22 bits vs 20 bits) and can better cooperate with
> the kernel routine (index and the remaining name string, rather than the
> full name string, are handy).

I agree that we want the approach with better cooperation with the
kernel function. However, I would much prefer if all the xattrs that
are commonly set on many files are unconflicted. This would be at
least: selinux, ima, evm, apparmor.

Can't you just add a magic constant to the seed? Then we can come up
with one that gives a good spread and hardcode that.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                Red Hat, Inc
       alexl@...hat.com         alexander.larsson@...il.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ