[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8f854339-1cc0-e575-f320-50a6d9d5a775@linux.alibaba.com>
Date: Tue, 17 Jan 2023 08:12:13 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Alexander Larsson <alexl@...hat.com>, linux-fsdevel@...r.kernel.org
Cc: linux-kernel@...r.kernel.org, gscrivan@...hat.com
Subject: Re: [PATCH v2 0/6] Composefs: an opportunistically sharing verified
image filesystem
On 2023/1/16 23:27, Alexander Larsson wrote:
> On Mon, 2023-01-16 at 21:26 +0800, Gao Xiang wrote:
I will stop saying this overlay permission model anymore since
there are more experienced folks working on this, although SUID
stuff is still dangerous to me as an end-user: IMHO, its hard
for me to identify proper sub-sub-subdir UID/GID in "objects"
at runtime, even they could happen much deep which is different
from localfs with loopback devices or overlayfs. I don't know
what then inproper sub-sub-subdir UID/GID in "objects" could
cause.
It seems currently ostree uses "root" all the time for such
"objects" subdirs, I don't know.
>>>
>>>>
>>>>>
>>>>> Instead what we have done with composefs is to make filesystem
>>>>> image
>>>>> generation from the ostree repository 100% reproducible. Then
>>>>> we
>>>>> can
>>>>
>>>> EROFS is all 100% reproduciable as well.
>>>>
>>>
>>>
>>> Really, so if I today, on fedora 36 run:
>>> # tar xvf oci-image.tar
>>> # mkfs.erofs oci-dir/ oci.erofs
>>>
>>> And then in 5 years, if someone on debian 13 runs the same, with
>>> the
>>> same tar file, then both oci.erofs files will have the same sha256
>>> checksum?
>>
>> Why it doesn't? Reproducable builds is a MUST for Android use cases
>> as well.
>
> That is not quite the same requirements. A reproducible build in the
> traditional sense is limited to a particular build configuration. You
> define a set of tools for the build, and use the same ones for each
> build, and get a fixed output. You don't expect to be able to change
> e.g. the compiler and get the same result. Similarly, it is often the
> case that different builds or versions of compression libraries gives
> different results, so you can't expect to use e.g. a different libz and
> get identical images.
>
>> Yes, it may break between versions by mistake, but I think
>> reproducable builds is a basic functionalaity for all image
>> use cases.
>>
>>>
>>> How do you handle things like different versions or builds of
>>> compression libraries creating different results? Do you guarantee
>>> to
>>> not add any new backwards compat changes by default, or change any
>>> default options? Do you guarantee that the files are read from
>>> "oci-
>>> dir" in the same order each time? It doesn't look like it.
>>
>> If you'd like to say like that, why mkcomposefs doesn't have the
>> same issue that it may be broken by some bug.
>>
>
> libcomposefs defines a normalized form for everything like file order,
> xattr orders, etc, and carefully normalizes everything such that we can
> guarantee these properties. It is possible that some detail was missed,
> because we're humans. But it was a very conscious and deliberate design
> choice that is deeply encoded in the code and format. For example, this
> is why we don't use compression but try to minimize size in other ways.
EROFS is reproducable since its dirents are all sorted because
of its on-disk definition. And its xattrs are also sorted if
images needs to be reproducable.
I don't know what's the difference between these two
reproducable builds. EROFS is designed for golden images, if
you pass in a set of configuration options for mkfs.erofs, it
should output the same output, otherwise they are really
buges and need to be fixed.
Compression algorithms could generate different outputs between
versions, and generally compressed data is stable for most
compression algorithms in a specific version but that is another
story.
EROFS can live without compression.
>
>>>>
>>>> But really, personally I think the issue above is different from
>>>> loopback devices and may need to be resolved first. And if
>>>> possible,
>>>> I hope it could be an new overlayfs feature for everyone.
>>>
>>> Yeah. Independent of composefs, I think EROFS would be better if
>>> you
>>> could just point it to a chunk directory at mount time rather than
>>> having to route everything through a system-wide global cachefs
>>> singleton. I understand that cachefs does help with the on-demand
>>> download aspect, but when you don't need that it is just in the
>>> way.
>>
>> Just check your reply to Dave's review, it seems that how
>> composefs dir on-disk format works is also much similar to
>> EROFS as well, see:
>>
>> https://docs.kernel.org/filesystems/erofs.html -- Directories
>>
>> a block vs a chunk = dirent + names
>>
>> cfs_dir_lookup -> erofs_namei + find_target_block_classic;
>> cfs_dir_lookup_in_chunk -> find_target_dirent.
>
> Yeah, the dirent layout looks very similar. I guess great minds think
> alike! My approach was simpler initially, but it kinda converged on
> this when I started optimizing the kernel lookup code with binary
> search.
>
>> Yes, great projects could be much similar to each other
>> occasionally, not to mention opensource projects ;)
>>
>> Anyway, I'm not opposed to Composefs if folks really like a
>> new read-only filesystem for this. That is almost all I'd like
>> to say about Composefs formally, have fun!
Because, anyway, I have no idea considering opensource projects
could also do folk, so (maybe) such is life.
It seems rather another an incomplete EROFS from several points
of view. Also see:
https://lore.kernel.org/all/1b192a85-e1da-0925-ef26-178b93d0aa45@plexistor.com/T/#u
I will go on making a better EROFS as a promise to the
community initially.
Thanks,
Gao Xiang
>>
>> Thanks,
>> Gao Xiang
>
> Cool, thanks for the feedback.
>
>
Powered by blists - more mailing lists