Message-ID: <20250127180250.GQ5777@twin.jikos.cz>
Date: Mon, 27 Jan 2025 19:02:50 +0100
From: David Sterba <dsterba@...e.cz>
To: Daniel Vacek <neelx@...e.com>
Cc: Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>,
David Sterba <dsterba@...e.com>, Nick Terrell <terrelln@...com>,
linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] btrfs/zstd: enable negative compression levels mount
option
On Fri, Jan 24, 2025 at 08:55:56AM +0100, Daniel Vacek wrote:
> This patch allows using the fast modes (negative compression levels) of zstd.
>
> The performance benchmarks do not show any significant influence (positive
> or negative) other than the lower compression ratio. But %system CPU usage
> should also be lower, which is not clearly visible from the results below.
> That's because with the fast modes the processing is IO-bound, not CPU-bound.
>
> for level in {-15..-1} {1..15}; \
> do printf "level %3d\n" $level; \
> mount -o compress=zstd:$level /dev/sdb /mnt/test/; \
> grep sdb /proc/mounts; \
> sync; time { time cp /dev/shm/linux-6.13.tar.xz /mnt/test/; sync; }; \
> compsize /mnt/test/linux-6.13.tar.xz; \
> sync; time { time cp /dev/shm/linux-6.13.tar /mnt/test/; sync; }; \
> compsize /mnt/test/linux-6.13.tar; \
> rm /mnt/test/linux-6.13.tar*; \
> umount /mnt/test/; \
> done |& tee results | \
> awk '/^level/{print}/^real/{print$2}/^TOTAL/{print$3"\t"$2" |"}' | \
> paste - - - - - - -
>
> linux-6.13.tar.xz 141M | linux-6.13.tar 1.4G
It does not make much sense to benchmark against a .xz file: the heuristic
will detect it as incompressible and skip it right away.
The linux sources are highly compressible, text-like data, so that covers
one category. It would be good to see benchmarks on other file types
commonly found on systems, grouped by similar compressibility
characteristics:
- document-like (structured binary), ie. pdf, "office type of documents"
- executable-like (/bin/*, libraries)
- (maybe more)
Anything else can be considered incompressible: formats with internal
compression, or binary formats so compact that they are beyond the
capabilities of the in-kernel implementation and its limitations.
> copy wall time sync wall time usage ratio | copy wall time sync wall time usage ratio
> ==============================================================+===============================================
> level -15 0m0,261s 0m0,329s 141M 100% | 0m2,511s 0m2,794s 598M 40% |
> level -14 0m0,145s 0m0,291s 141M 100% | 0m1,829s 0m2,443s 581M 39% |
> level -13 0m0,141s 0m0,289s 141M 100% | 0m1,832s 0m2,347s 566M 38% |
> level -12 0m0,140s 0m0,291s 141M 100% | 0m1,879s 0m2,246s 548M 37% |
> level -11 0m0,133s 0m0,271s 141M 100% | 0m1,789s 0m2,257s 530M 35% |
I found an old mail asking the ZSTD people which realtime levels are
meaningful; -10 was mentioned as a good cut-off. The numbers above confirm
that, although this is a small sample.
> level -10 0m0,146s 0m0,318s 141M 100% | 0m1,769s 0m2,228s 512M 34% |
> level -9 0m0,138s 0m0,288s 141M 100% | 0m1,869s 0m2,304s 493M 33% |
> level -8 0m0,146s 0m0,294s 141M 100% | 0m1,846s 0m2,446s 475M 32% |
> level -7 0m0,151s 0m0,298s 141M 100% | 0m1,877s 0m2,319s 457M 30% |
> level -6 0m0,134s 0m0,271s 141M 100% | 0m1,918s 0m2,314s 437M 29% |
> level -5 0m0,139s 0m0,307s 141M 100% | 0m1,860s 0m2,254s 417M 28% |
> level -4 0m0,153s 0m0,295s 141M 100% | 0m1,916s 0m2,272s 391M 26% |
> level -3 0m0,145s 0m0,308s 141M 100% | 0m1,830s 0m2,369s 369M 24% |
> level -2 0m0,150s 0m0,294s 141M 100% | 0m1,841s 0m2,344s 349M 23% |
> level -1 0m0,150s 0m0,312s 141M 100% | 0m1,872s 0m2,487s 332M 22% |
> level 1 0m0,142s 0m0,310s 141M 100% | 0m1,880s 0m2,331s 290M 19% |
> level 2 0m0,144s 0m0,286s 141M 100% | 0m1,933s 0m2,266s 288M 19% |
> level 3 0m0,146s 0m0,304s 141M 100% | 0m1,966s 0m2,300s 276M 18% *|
> level 4 0m0,146s 0m0,287s 141M 100% | 0m2,173s 0m2,496s 275M 18% |
> level 5 0m0,146s 0m0,304s 141M 100% | 0m2,307s 0m2,728s 261M 17% |
> level 6 0m0,138s 0m0,267s 141M 100% | 0m2,435s 0m3,151s 253M 17% |
> level 7 0m0,142s 0m0,301s 141M 100% | 0m2,274s 0m3,617s 251M 16% |
> level 8 0m0,136s 0m0,291s 141M 100% | 0m2,066s 0m3,913s 249M 16% |
> level 9 0m0,134s 0m0,283s 141M 100% | 0m2,676s 0m4,496s 247M 16% |
> level 10 0m0,151s 0m0,297s 141M 100% | 0m2,424s 0m5,102s 247M 16% |
> level 11 0m0,149s 0m0,296s 141M 100% | 0m3,485s 0m7,803s 245M 16% |
> level 12 0m0,144s 0m0,304s 141M 100% | 0m3,954s 0m9,067s 244M 16% |
> level 13 0m0,148s 0m0,319s 141M 100% | 0m5,350s 0m13,307s 247M 16% |
> level 14 0m0,145s 0m0,296s 141M 100% | 0m6,916s 0m18,218s 238M 16% |
> level 15 0m0,145s 0m0,293s 141M 100% | 0m8,304s 0m24,675s 233M 15% |
>
> Signed-off-by: Daniel Vacek <neelx@...e.com>
> ---
> Checking the ZSTD workspace memory sizes, it looks like sharing
> the level 1 workspace with all the fast modes should be safe.
> From the debug printf output:
>
> level_size max_size
> [ 11.032659] btrfs zstd ws: -15 926969 926969
Yeah, the level 1 workspace should have enough memory. I think there are
some tricks inside ZSTD to reduce the dictionary requirements, so almost
1MiB is quite excessive (and not only for the realtime levels), as we
compress only 128K at a time anyway.
> [ 11.032662] btrfs zstd ws: -14 926969 926969
> [ 11.032663] btrfs zstd ws: -13 926969 926969
> [ 11.032664] btrfs zstd ws: -12 926969 926969
> [ 11.032665] btrfs zstd ws: -11 926969 926969
> [ 11.032665] btrfs zstd ws: -10 926969 926969
> [ 11.032666] btrfs zstd ws: -9 926969 926969
> [ 11.032666] btrfs zstd ws: -8 926969 926969
> [ 11.032667] btrfs zstd ws: -7 926969 926969
> [ 11.032668] btrfs zstd ws: -6 926969 926969
> [ 11.032668] btrfs zstd ws: -5 926969 926969
> [ 11.032669] btrfs zstd ws: -4 926969 926969
> [ 11.032670] btrfs zstd ws: -3 926969 926969
> [ 11.032670] btrfs zstd ws: -2 926969 926969
> [ 11.032671] btrfs zstd ws: -1 926969 926969
> [ 11.032672] btrfs zstd ws: 1 943353 943353
> [ 11.032673] btrfs zstd ws: 2 1041657 1041657
> [ 11.032674] btrfs zstd ws: 3 1303801 1303801
> [ 11.032674] btrfs zstd ws: 4 1959161 1959161
> [ 11.032675] btrfs zstd ws: 5 1697017 1959161
> [ 11.032676] btrfs zstd ws: 6 1697017 1959161
> [ 11.032676] btrfs zstd ws: 7 1697017 1959161
> [ 11.032677] btrfs zstd ws: 8 1697017 1959161
> [ 11.032678] btrfs zstd ws: 9 1697017 1959161
> [ 11.032679] btrfs zstd ws: 10 1697017 1959161
> [ 11.032679] btrfs zstd ws: 11 1959161 1959161
> [ 11.032680] btrfs zstd ws: 12 2483449 2483449
> [ 11.032681] btrfs zstd ws: 13 2632633 2632633
> [ 11.032681] btrfs zstd ws: 14 3277111 3277111
> [ 11.032682] btrfs zstd ws: 15 3277111 3277111
>
> Hence the implementation uses `zstd_ws_mem_sizes[0]` for all negative levels.
>
> I also plan to update the `btrfs fi defrag` interface to be able to use
> these levels (or any levels at all).
>
> @@ -332,8 +335,9 @@ void zstd_put_workspace(struct list_head *ws)
> }
> }
>
> - set_bit(workspace->level - 1, &wsm.active_map);
> - list_add(&workspace->list, &wsm.idle_ws[workspace->level - 1]);
> + level = max(0, workspace->level - 1);
This seems to be quite a frequent pattern for adjusting the level; please
create a helper for it so it's not a plain max() everywhere.
> + set_bit(level, &wsm.active_map);
> + list_add(&workspace->list, &wsm.idle_ws[level]);
> workspace->req_level = 0;