Message-ID: <20250205164012.GJ5777@suse.cz>
Date: Wed, 5 Feb 2025 17:40:12 +0100
From: David Sterba <dsterba@...e.cz>
To: Daniel Vacek <neelx@...e.com>
Cc: Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>,
David Sterba <dsterba@...e.com>, Nick Terrell <terrelln@...com>,
linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] btrfs/zstd: enable negative compression levels mount
option
On Thu, Jan 30, 2025 at 10:13:36AM +0100, Daniel Vacek wrote:
> On Wed, 29 Jan 2025 at 23:42, David Sterba <dsterba@...e.cz> wrote:
> > Up to -15 it's a 3x improvement, which translates to about 33% of the
> > original size. And this is only a rough estimate; kernel compression
> > could be slightly worse due to slightly different parameters.
> >
> > We can leave it at -15, so it's the same number as the upper limit.
>
> I was getting less favorable results with my testing which leads me to
> the ultimate rhetorical question:
>
> What do we know about the dataset users are possibly going to apply?
This does not need to be a rhetorical question; it is what needs to be
asked when adding a new feature or use case. We do not know exactly, but
in this case we can evaluate the expected types of data with regard to
compressibility, run benchmarks and make some predictions.
> And how do you want to assess the right cut-off having incomplete
> information about the nature of the data?
Analyze typical use cases, suggest a solution, evaluate and either take
it or repeat.
> Why doesn't zstd enforce any limit itself?
That would have to be answered by the ZSTD people. The fact that a
realtime level number translates directly to the internal parameters may
be an outlier, because the normal levels are defined in a big table that
specifically states what each level should do, so there's no predictable
pattern.
https://elixir.bootlin.com/linux/v6.13.1/source/lib/zstd/compress/clevels.h#L23
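To illustrate the distinction (a simplified sketch, not the actual zstd
code; the struct, table values and function names here are made up):
positive levels index into a tuned parameter table like clevels.h, while
negative "realtime" levels reuse a base row and feed the magnitude in as
an acceleration-style parameter, so nothing in the table bounds them:

```c
/* Hypothetical, simplified view of zstd level handling. */
struct cparams { unsigned window_log; unsigned target_length; };

static const struct cparams level_table[] = {
	{ 19, 0 },	/* level 1 */
	{ 20, 0 },	/* level 2 */
	{ 21, 16 },	/* level 3 */
};

static struct cparams get_cparams(int level)
{
	if (level > 0) {
		/* positive levels: pick a tuned row, clamped to the table */
		int row = level;

		if (row > 3)
			row = 3;
		return level_table[row - 1];
	}
	/*
	 * Negative levels: base row with the magnitude used as an
	 * acceleration factor. Any large negative value "works", which
	 * is why a caller-side limit is needed.
	 */
	struct cparams cp = level_table[0];

	cp.target_length = (unsigned)(-level);
	return cp;
}
```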
> Is this even a matter (or responsibility) of the filesystem to force
> some arbitrary limit here? Maybe yes?
Yes, and this is for practical reasons.
> As mentioned before, personally I'd leave it to the users so that they
> can freely choose whatever suits them the best. I don't see any
> technical or maintenance issues opening this limit.
As a user I see an unbounded number for the realtime level and have no
idea which one to use. So I go to the documentation and see that
somebody evaluated the levels on various data sets, with a description
of their compressibility, and says that levels -1 and -15 can give some
reasonable results. I can also reevaluate it for my own data set or take
some recommendation.
What would IMHO look really strange is to see another 1000 levels
allowed by the parameter but documented as "no obvious benefit, only
extra overhead".
If somebody comes later with concrete numbers showing that it would be
good to have a few more levels allowed, we can talk about it and adjust
it again.
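In practice the limit would just be a clamp at option-parsing time,
something like this (a hypothetical sketch, not the actual btrfs code;
the bounds mirror the -15..15 range discussed above):

```c
/* Hypothetical bounds: realtime levels down to -15, regular up to 15. */
#define BTRFS_ZSTD_MIN_LEVEL	(-15)
#define BTRFS_ZSTD_MAX_LEVEL	15

/* Clamp a user-supplied compress=zstd:N level into the allowed range. */
static int zstd_clamp_level(int level)
{
	if (level < BTRFS_ZSTD_MIN_LEVEL)
		return BTRFS_ZSTD_MIN_LEVEL;
	if (level > BTRFS_ZSTD_MAX_LEVEL)
		return BTRFS_ZSTD_MAX_LEVEL;
	return level;
}
```

Adjusting the limit later would then be a one-line change to the bound.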
The level is currently not stored anywhere, but we will want that
eventually for the properties, so limiting the number is necessary
anyway. So this is a technical and compatibility reason.