Message-ID: <CAPjX3FfCgtUcrTiui8VW=bPC9fdrUqa65dCDeymk+=jnFOYWFA@mail.gmail.com>
Date: Mon, 10 Feb 2025 09:18:05 +0100
From: Daniel Vacek <neelx@...e.com>
To: dsterba@...e.cz
Cc: Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>, David Sterba <dsterba@...e.com>,
Nick Terrell <terrelln@...com>, linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] btrfs/zstd: enable negative compression levels mount option
On Wed, 5 Feb 2025 at 17:40, David Sterba <dsterba@...e.cz> wrote:
>
> On Thu, Jan 30, 2025 at 10:13:36AM +0100, Daniel Vacek wrote:
> > On Wed, 29 Jan 2025 at 23:42, David Sterba <dsterba@...e.cz> wrote:
> > > Up to -15 it's a 3x improvement, which translates to about 33% of the
> > > original size. And this is only a rough estimate; kernel compression
> > > could be slightly worse due to slightly different parameters.
> > >
> > > We can set it to -15, so it's the same number as the upper limit.
> >
> > I was getting less favorable results with my testing, which leads me to
> > the ultimate rhetorical question:
> >
> > What do we know about the dataset users are possibly going to apply?
>
> This does not need to be a rhetorical question; this is what needs to be
> asked when adding a new feature or use case. We do not know exactly, but
> in this case we can evaluate expected types of data regarding
> compressibility, run benchmarks and do some predictions.
>
> > And how do you want to assess the right cut-off having incomplete
> > information about the nature of the data?
>
> Analyze typical use cases, suggest a solution, evaluate and either take
> it or repeat.
OK.
> > Why doesn't zstd enforce any limit itself?
>
> That would have to be answered by the ZSTD people. The fact that the
> realtime level number translates directly to the internal parameters may
> be an outlier, because the normal levels are defined in a big table that
> specifically defines what each level should do, so there's no predictable
> pattern.
>
> https://elixir.bootlin.com/linux/v6.13.1/source/lib/zstd/compress/clevels.h#L23
Yeah, I'm aware of this.
> > Is this even a matter (or responsibility) of the filesystem to force
> > some arbitrary limit here? Maybe yes?
>
> Yes and this is for practical reasons.
>
> > As mentioned before, personally I'd leave it to the users so that they
> > can freely choose whatever suits them the best. I don't see any
> > technical or maintenance issues opening this limit.
>
> As a user I see an unbounded range for the realtime levels and have no
> idea which one to use. So I go to the documentation and see that
> somebody evaluated the levels on various data sets with description of
> compressibility and says that levels -1 and -15 can give some reasonable
> results. I can also reevaluate it for my own data set or take some
> recommendation.
>
> What would IMHO look really strange is to see another 1000 levels
> allowed by the parameter but documented as "no obvious benefit, only
> extra overhead".
>
> If somebody comes later with concrete numbers that it would be good to
> have a few more levels allowed we can talk about it and adjust it again.
Right, agreed.
> The level is currently not stored anywhere, but we will want that
> eventually for the properties, so limiting the number is necessary anyway.
> So this is a technical and compatibility reason.
I was looking into properties as well and I have a draft using 8 bits
for the level value. I think we can stick to that for now.