linux-kernel - Re: [RFC] fs-verity and encryption for EROFS


Message-ID: <Y6PN8vpE0xbppmpB@B-P7TQMD6M-0146.local>
Date:   Thu, 22 Dec 2022 11:24:34 +0800
From:   Gao Xiang <hsiangkao@...ux.alibaba.com>
To:     linux-fscrypt@...r.kernel.org, linux-erofs@...ts.ozlabs.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        Eric Biggers <ebiggers@...nel.org>
Cc:     Jingbo Xu <jefflexu@...ux.alibaba.com>,
        Joseph Qi <joseph.qi@...ux.alibaba.com>,
        Liu Jiang <gerry@...ux.alibaba.com>,
        Zefan Li <lizefan.x@...edance.com>,
        Xin Yin <yinxin.x@...edance.com>,
        Liu Bo <bo.liu@...ux.alibaba.com>, Gao Xiang <xiang@...nel.org>
Subject: Re: [RFC] fs-verity and encryption for EROFS

( + more lists )

On Wed, Dec 21, 2022 at 02:41:40PM +0800, Gao Xiang wrote:
> Hi folks,
> 
> (As Eric suggested, I'm posting this on the list now.)
> 
> In order to outline what we could do next to benefit various image-based
> distribution use cases (especially for signed+verified images and
> confidential computing), I'd like to discuss two potential new
> features for EROFS: verification and encryption.
> 
> - Verification
> 
> As we know, dm-verity is currently the main way to keep image
> integrity on read-only devices.  However, for an image-based system
> with lots of shared blobs (no matter whether they are device-based or
> file-based), IMHO it'd be better to have an in-band (rather than a
> device-mapper out-of-band) approach to verify such blobs.
> 
> In particular, currently in container image use cases, an EROFS image
> can consist of
> 
>   - one meta blob for metadata and filesystem tree;
> 
>   - several data-shared blobs with chunk-based de-duplicated data (in
>     layers, to allow incremental updates; or in some other way, such
>     as one blob per file).
> 
> Currently the number of data blobs can vary from (typically) a dozen
> to (in principle) 2^16 - 1.  A dm-verity setup can hardly cover such
> usage, yet that distribution form is more and more common with the
> rise of containerization.
> 
> Also, since we have the EROFS over fscache infrastructure, file-based
> distribution makes dm-verity almost impossible as well.  Generally we
> could enable fs-verity on the underlay fs, I think, but with on-demand
> lazy pulling from remote, such data may be incomplete until it is
> fully downloaded.  (I think that is also roughly why Google did
> fs-verity for incfs.)  In addition, IMO it's not good if we rely on
> features of a random underlay fs with a tree generated from a random
> hashing algorithm and no original signing (by the image creator).

To be more precise: random hashing algorithm, underlay block sizes,
(maybe) a new underlay layout and no original signing, all of which
impact reproducibility.

> 
> My preliminary thought for EROFS verification is to have blob-based
> (or device-based) Merkle trees while keeping such image integrity
> self-contained, so that Android, embedded, system rootfs, and
> container use cases can all benefit from it.
> 
> Also, as a self-contained verification approach like those of other
> Linux filesystems, it lets bootloaders and standalone EROFS image
> unpackers support/check image integrity and signing easily.
> 
> It seems the current fs-verity codebase fits this almost perfectly
> with some minor modifications.  If possible, we could go further in
> this direction.
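
As an illustration of the blob-based Merkle tree idea quoted above, here
is a minimal Python sketch.  It is not the fs-verity on-disk format and
not a proposed EROFS layout: SHA-256, 4096-byte blocks, zero padding and
the names merkle_root() and verify_blob() are all assumptions made for
this example only.

```python
import hashlib
import hmac

BLOCK_SIZE = 4096  # assumed hash block size, for illustration only


def _hash_blocks(data):
    """Return the SHA-256 digest of each zero-padded BLOCK_SIZE block."""
    digests = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
        digests.append(hashlib.sha256(block).digest())
    return digests


def merkle_root(blob):
    """Compute a single root digest covering the whole blob."""
    # Level 0: one digest per data block (an empty blob hashes as b"").
    level = _hash_blocks(blob) or [hashlib.sha256(b"").digest()]
    # Pack digests into blocks and hash again until one digest remains.
    while len(level) > 1:
        level = _hash_blocks(b"".join(level))
    return level[0]


def verify_blob(blob, trusted_root):
    """Check a blob against a root digest obtained from a trusted source."""
    return hmac.compare_digest(merkle_root(blob), trusted_root)
```

With something like this, each blob would carry (or be accompanied by)
its own tree, and only the root digest would need to be signed by the
image creator; a real implementation would verify individual blocks on
read rather than rehash the whole blob.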
> 
> - Encryption
> 
> I also have some rough preliminary thoughts on EROFS encryption.
> (Although they are not as detailed as those on verification.)
> Currently we have full-disk encryption and file-based encryption.
> However, in order to do finer data sharing between encrypted data (it
> seems hard to do finer data de-duplication with file-based
> encryption), we could also consider modified convergent encryption,
> especially for image-based offline data.
> 
> In order to prevent dictionary attacks, the key itself should not be
> derived directly from the hash of its data; instead, we could assign
> some random key to specific data (as an encrypted chunk) and find a
> way to share these keys and data in a trusted domain.
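
One possible reading of the scheme quoted above, sketched in Python.
Everything here is an assumption for illustration: the ChunkKeyStore
class stands in for whatever trusted-domain key sharing would be used,
and the SHAKE-256 XOR keystream is only a toy stand-in for a real
cipher (it must not be used to protect real data).

```python
import hashlib
import secrets


class ChunkKeyStore:
    """Hypothetical trusted-domain table: plaintext digest -> random key."""

    def __init__(self):
        self._keys = {}

    def key_for(self, chunk):
        # The digest is only a lookup index kept inside the trusted
        # domain; the key itself is random, not derived from the data.
        digest = hashlib.sha256(chunk).digest()
        if digest not in self._keys:
            self._keys[digest] = secrets.token_bytes(32)
        return self._keys[digest]


def _xor_keystream(key, data):
    # Toy cipher: XOR with a SHAKE-256 keystream (illustration only).
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_chunk(store, chunk):
    """Encrypt a chunk with the random key assigned to its content."""
    return _xor_keystream(store.key_for(chunk), chunk)


def decrypt_chunk(key, ciphertext):
    """Decrypt a chunk given the key shared via the trusted domain."""
    return _xor_keystream(key, ciphertext)
```

Within one trusted domain, identical chunks map to the same key and so
to the same ciphertext, which keeps de-duplication possible; outside
it, guessing a chunk's content does not reveal its key.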
> 
> A similar idea was also shown in the presentation on the AWS Lambda
> sparse filesystem, although it doesn't show many internal details:
> https://youtu.be/FTwsMYXWGB0
> 
> Anyway, for encryption, it's just a preliminary thought, but we'd be
> happy to have a better encryption solution for data sharing for
> confidential container images.
> 
> Thanks,
> Gao Xiang