Message-ID: <5f5e006b-d13b-45a5-835d-57a64d450a1a@linux.alibaba.com>
Date: Thu, 26 Sep 2024 18:46:35 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Ariel Miculas <amiculas@...co.com>
Cc: Benno Lossin <benno.lossin@...ton.me>, rust-for-linux@...r.kernel.org,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 LKML <linux-kernel@...r.kernel.org>,
 Linus Torvalds <torvalds@...ux-foundation.org>,
 Al Viro <viro@...iv.linux.org.uk>, Gary Guo <gary@...yguo.net>,
 linux-fsdevel@...r.kernel.org, linux-erofs@...ts.ozlabs.org
Subject: Re: [RFC PATCH 03/24] erofs: add Errno in Rust



On 2024/9/26 17:51, Ariel Miculas wrote:
> On 24/09/26 04:25, Gao Xiang wrote:
>>
>>
>> On 2024/9/26 16:10, Ariel Miculas wrote:
>>> On 24/09/26 09:04, Gao Xiang wrote:
>>>>
>>
>>
>> ...
>>
>>>
>>> And here [4] you can see the space savings achieved by PuzzleFS. In
>>> short, if you take 10 versions of Ubuntu Jammy from dockerhub, they take
>>> up 282 MB. Convert them to PuzzleFS and they only take up 130 MB (this
>>> is before applying any compression, the space savings are only due to
>>> the chunking algorithm). If we enable compression (PuzzleFS uses Zstd
>>> seekable compression), which is a fairer comparison (considering that
>>> the OCI image uses gzip compression), then we get down to 53 MB for
>>> storing all 10 Ubuntu Jammy versions using PuzzleFS.
>>>
>>> Here's a summary:
>>> # Steps
>>>
>>> * I’ve downloaded 10 versions of Jammy from hub.docker.com
>>> * These images only have one layer which is in tar.gz format
>>> * I’ve built 10 equivalent puzzlefs images
>>> * Compute the tarball_total_size by summing the sizes of every Jammy
>>>     tarball (uncompressed) => 766 MB (use this as baseline)
>>> * Sum the sizes of every oci/puzzlefs image => total_size
>>> * Compute the total size as if all the versions were stored in a single
>>>     oci/puzzlefs repository => total_unified_size
>>> * Saved space = tarball_total_size - total_unified_size
>>>
>>> # Results
>>> (See [5] if you prefer the video format)
>>>
>>> | Type | Total size (MB) | Average layer size (MB) | Unified size (MB) | Saved (MB) / 766 MB |
>>> | --- | --- | --- | --- | --- |
>>> | Oci (uncompressed) | 766 | 77 | 766 | 0 (0%) |
>>> | PuzzleFS (uncompressed) | 748 | 74 | 130 | 635 (83%) |
>>> | Oci (compressed) | 282 | 28 | 282 | 484 (63%) |
>>> | PuzzleFS (compressed) | 298 | 30 | 53 | 713 (93%) |
>>>
>>> Here's the script I used to download the Ubuntu Jammy versions and
>>> generate the PuzzleFS images [6] to get an idea about how I got to these
>>> results.
>>>
>>> Can we achieve these results with the current erofs features?  I'm
>>> referring specifically to this comment: "EROFS already supports
>>> variable-sized chunks + CDC" [7].
>>
>> Please see
>> https://erofs.docs.kernel.org/en/latest/comparsion/dedupe.html
> 
> Great, I see you've used the same example as I did. Though I must admit
> I'm a little surprised there's no mention of PuzzleFS in your document.

Why would I need to mention, or even try, PuzzleFS here? (There are
too many such attempts; why would I need to try them all?)  The
document only compares against EROFS's prior work.

> 
>>
>> | Type | Total size (MiB) | Average layer size (MiB) | Saved / 766.1 MiB |
>> | --- | --- | --- | --- |
>> | Compressed OCI (tar.gz) | 282.5 | 28.3 | 63% |
>> | Uncompressed OCI (tar) | 766.1 | 76.6 | 0% |
>> | Uncompressed EROFS | 109.5 | 11.0 | 86% |
>> | EROFS (DEFLATE,9,32k) | 46.4 | 4.6 | 94% |
>> | EROFS (LZ4HC,12,64k) | 54.2 | 5.4 | 93% |
>>
>> I don't know which compression algorithm you are using (maybe Zstd?),
>> but the results are:
>>    EROFS (LZ4HC,12,64k)  54.2
>>    PuzzleFS compressed   53?
>>    EROFS (DEFLATE,9,32k) 46.4
>>
>> I could rerun with EROFS + Zstd, and the result should be smaller. This
>> feature has been supported since Linux 6.1, thanks.
> 
> The average layer size is very impressive for EROFS, great work.
> However, if we multiply the average layer size by 10, we get the total
> size (5.4 MiB * 10 ~ 54.2 MiB), whereas for PuzzleFS, we see that while
> the average layer size is 30 MiB (for the compressed case), the unified
> size is only 53 MiB. So this tells me there's blob sharing between the
> different versions of Ubuntu Jammy with PuzzleFS, but there's no sharing
> with EROFS (what I'm talking about is deduplication across the multiple
> versions of Ubuntu Jammy and not within one single version).

Don't get me wrong, but I don't think you got the point.

First, what you asked was `I'm referring specifically to this
comment: "EROFS already supports variable-sized chunks + CDC"`,
so I clearly answered with the result of compressed data global
deduplication with CDC.
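
For readers unfamiliar with the term, here is a toy sketch of
content-defined chunking (CDC) with deduplication across images.
It is not the algorithm used by mkfs.erofs or PuzzleFS: the
gear-style rolling hash, the mask, the size limits and the names
`cdc_chunks` and `dedup_store` are all assumptions made for
illustration.

```python
import hashlib
import random

# Fixed pseudo-random table for a toy "gear" rolling hash (illustrative only).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]

def cdc_chunks(data, mask=0x3F, min_size=16, max_size=256):
    """Cut `data` where the rolling hash matches `mask`, so cuts follow content."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks

def dedup_store(images):
    """Keep each distinct chunk once, keyed by its SHA-256 digest."""
    store = {}
    for image in images:
        for chunk in cdc_chunks(image):
            store.setdefault(hashlib.sha256(chunk).digest(), chunk)
    return store

# Two hypothetical "image versions": the second has bytes inserted at the
# front, which shifts every later offset.
base = bytes(_rng.getrandbits(8) for _ in range(20000))
v1 = base
v2 = b"a few bytes inserted at the front" + base

store = dedup_store([v1, v2])
raw_total = len(v1) + len(v2)
unique_total = sum(len(c) for c in store.values())
print(f"raw: {raw_total} bytes, after dedup: {unique_total} bytes")
```

Because cut points depend on the bytes rather than on fixed
offsets, the chunking of `v2` is expected to resynchronize with
that of `v1` shortly after the inserted prefix, so most chunks
are stored only once.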

Here both EROFS and SquashFS compress the 10 Ubuntu images into
one image, for a fair comparison that shows the benefit of CDC,
so I believe these numbers are basically equivalent to your
`Unified size` column.  The result is

			Your unified size
	EROFS (LZ4HC,12,64k)  54.2
	PuzzleFS compressed   53?
	EROFS (DEFLATE,9,32k) 46.4

That is why I used your unified size of 53 to show that EROFS is
much smaller than PuzzleFS.
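
As a quick check of the arithmetic, the sketch below recomputes
the "saved" percentage for each unified size against the 766.1
MiB uncompressed baseline.  The sizes are the ones quoted in this
thread; the PuzzleFS figure was reported in MB and is treated
here as comparable, which is an assumption.  The helper name
`saved_percent` is made up for this example.

```python
BASELINE_MIB = 766.1  # sum of the 10 uncompressed OCI tarballs

# Unified sizes quoted in this thread (PuzzleFS value assumed comparable).
unified_mib = {
    "EROFS (LZ4HC,12,64k)": 54.2,
    "PuzzleFS compressed": 53.0,
    "EROFS (DEFLATE,9,32k)": 46.4,
}

def saved_percent(size, baseline=BASELINE_MIB):
    """Saved space = baseline - unified size, as a percentage of the baseline."""
    return 100.0 * (baseline - size) / baseline

for name, size in sorted(unified_mib.items(), key=lambda kv: kv[1]):
    print(f"{name:24s} {size:6.1f}  saved {saved_percent(size):.0f}%")
```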

The reason why EROFS and SquashFS don't have the `Total Size`s
is just because we cannot store every individual chunk into some
separate file.

Currently, I have seen no reason to open arbitrary kernel files
(maybe hundreds at once due to the large folio feature) in the
page fault context.  If I modified the `mkfs.erofs` tool, I could
give some similar numbers, but I don't want to spend time on that
now because of the `open arbitrary kernel files in the page fault
context` issue.

As I said, if PuzzleFS eventually upstreams some work to open
kernel files in the page fault context, I will definitely work
out the same feature for EROFS soon, but currently I don't do
that just because it's very controversial and no in-tree kernel
filesystem does that.

Thanks,
Gao Xiang
