linux-ext4 - Re: ext4 v6.15-rc2 baseline

Message-ID: <aAKjIUbRYH8h4FnE@bombadil.infradead.org>
Date: Fri, 18 Apr 2025 12:08:17 -0700
From: Luis Chamberlain <mcgrof@...nel.org>
To: Theodore Ts'o <tytso@....edu>
Cc: adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org,
	kdevops@...ts.linux.dev, dave@...olabs.net, jack@...e.cz
Subject: Re: ext4 v6.15-rc2 baseline

On Thu, Apr 17, 2025 at 10:56:23PM -0500, Theodore Ts'o wrote:
> On Thu, Apr 17, 2025 at 06:42:25PM -0700, Luis Chamberlain wrote:
> > 
> > ext4_defaults: 793 tests, 2 failures, 259 skipped, 10521 seconds
> >   Failures: generic/223 generic/741
> 
> generic/223 is excluded in my tests.  From [1]:
> 
> // generic/223 tests file alignment, which works on ext4 only by
> // accident because we're not RAID stripe aware yet, and works at all
> // because we have bias towards aligning on power-of-two block numbers.
> // It is a flaky test for some configurations, so skip it.
> generic/223
> 
> [1] https://github.com/tytso/xfstests-bld/blob/master/test-appliance/files/root/fs/ext4/exclude

Why not just add a hook to the test to skip it upstream?
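
For reference, the exclude-file format quoted above (comment lines starting
with "//", one test name per line) can be read with a few lines of Python.
This is only a sketch: parse_exclude is a hypothetical helper, not part of
xfstests-bld or kdevops, and the real tooling may accept more syntax than is
handled here.

```python
def parse_exclude(text):
    """Return the test names in an exclude file.

    Assumes the format shown in the quoted excerpt: lines starting
    with "//" are comments, blank lines are ignored, and every other
    line names one test.
    """
    tests = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue
        tests.append(line)
    return tests


sample = """\
// It is a flaky test for some configurations, so skip it.
generic/223
"""
print(parse_exclude(sample))  # prints ['generic/223']
```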

> > ext4_bigalloc16k_4k: 793 tests, 26 failures, 341 skipped, 8856 seconds
> >   Failures: ext4/033 generic/075 generic/082 generic/091 generic/112
> >     generic/127 generic/219 generic/223 generic/230 generic/231
> >     generic/232 generic/233 generic/234 generic/235 generic/263
> >     generic/280 generic/381 generic/382 generic/566 generic/587
> >     generic/600 generic/601 generic/681 generic/682 generic/691
> >     generic/741
> 
> Hmm, some of these are because there are a bunch of tests that don't
> work well when the allocation cluster size != the file system block size.

We experienced a lot of test bugs for LBS but we addressed them.

> See [2] for the tests that I exclude.  These are fundamentally test
> bugs that just don't work for bigalloc's clustered allocation.

Absolutely all of these are test bugs? And they can't be fixed to
test bigalloc?

> [2] https://github.com/tytso/xfstests-bld/blob/master/test-appliance/files/root/fs/ext4/cfg/bigalloc_4k.exclude
> 
> As far as the rest of the bigalloc failures, some of them are hard to
> tell because you're not saving all of the test artifacts.  In
> particular, the tests which run fsx create ${seq}.*.fsx{good,bad,log}
> files.  My test appliance saves them, because they are super helpful
> when debugging a test failure.  kdevops apparently doesn't.

Patch posted.

> What I do is save the entire results directory, 

In our experience, test bugs sometimes create TFB (too f big) files, and
early on it was not clear whether we had to be conservative about space.
We now have a solution in place so that we don't have to care about space
for results, but in practice TFB files also stall CIs and networks, etc.
So TFB files are ignored.

If *.fsx{good,bad,log} won't ever be TFB, then we'll be good, especially
since we can scale for archiving now.

> although by default I
> truncate any test artifacts from passing tests to 31k (this amount is
> configurable via a command line option to gce-xfstests).  This is
> important because some of the artifact files are super verbose, and if you
> save them all, the time to run xz on the tar file takes forever.  But
> if the tests fail, they are *super* useful.

Right, same experience here. We call these TFB files. And we have a size
threshold too.
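
The policy described above (keep failing-test artifacts whole, cap the size
of passing-test artifacts) can be sketched roughly as follows. The helper
name is hypothetical, and reading "31k" as 31 KiB is an assumption; this is
not the gce-xfstests or kdevops implementation.

```python
import os

# Assumption: "31k" is read as 31 KiB; the real default may differ.
DEFAULT_LIMIT = 31 * 1024


def trim_artifact(path, passed, limit=DEFAULT_LIMIT):
    """Cap an artifact from a passing test at `limit` bytes.

    Artifacts from failing tests are left untouched, since they are
    the ones needed for debugging. Returns the file's final size.
    """
    size = os.path.getsize(path)
    if passed and size > limit:
        os.truncate(path, limit)  # drop everything past `limit` bytes
        size = limit
    return size
```

A caller would walk the results directory and pass each file along with the
pass/fail status of the test that produced it.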

> For the other bigalloc failures, I have a suspicion --- how big are the
> TEST and SCRATCH devices that you are using?  By default, most of my
> test scenarios use a "small" config which is 5G.  But for the bigalloc
> tests, for the 4k block / 64k cluster size, the device needs to be at
> least 20G or some of the tests will fail with ENOSPC.

They are 20GiB. This is configurable via CONFIG_FSTESTS_SPARSE_FILE_SIZE.

  Luis