lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4214717.mogB4TqSGs@suse>
Date:   Mon, 20 Mar 2023 12:18:38 +0100
From:   "Fabio M. De Francesco" <fmdefrancesco@...il.com>
To:     Jan Kara <jack@...e.cz>, Al Viro <viro@...iv.linux.org.uk>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [git pull] vfs.git sysv pile

On giovedì 16 marzo 2023 11:30:21 CET Fabio M. De Francesco wrote:
> On giovedì 16 marzo 2023 10:00:35 CET Jan Kara wrote:
> > On Wed 15-03-23 19:08:57, Fabio M. De Francesco wrote:
> > > On mercoledì 1 marzo 2023 15:14:16 CET Al Viro wrote:
> > > > On Wed, Mar 01, 2023 at 02:00:18PM +0100, Jan Kara wrote:
> > > > > On Wed 01-03-23 12:20:56, Fabio M. De Francesco wrote:
> > > > > > On venerdì 24 febbraio 2023 04:26:57 CET Al Viro wrote:
> > > > > > > 	Fabio's "switch to kmap_local_page()" patchset (originally
> 
> after
> 
> > > > > > > 	the
> > > > > > > 
> > > > > > > ext2 counterpart, with a lot of cleaning up done to it; as the
> > > > > > > matter
> > > 
> > > of
> > > 
> > > > > > > fact, ext2 side is in need of similar cleanups - calling
> 
> conventions
> 
> > > > > > > there
> > > > > > > are bloody awful).
> > > 
> > > [snip]
> > > 
> > > > I think I've pushed a demo patchset to vfs.git at some point back in
> > > > January... Yep - see #work.ext2 in there; completely untested, though.
> > > 
> > > The following commits from the VFS tree, #work.ext2 look good to me.
> > > 
> > > f5b399373756 ("ext2: use offset_in_page() instead of open-coding it as
> > > subtraction")
> > > c7248e221fb5 ("ext2_get_page(): saner type")
> > > 470e54a09898 ("ext2_put_page(): accept any pointer within the page")
> > > 15abcc147cf7 ("ext2_{set_link,delete_entry}(): don't bother with
> 
> page_addr")
> 
> > > 16a5ee2027b7 ("ext2_find_entry()/ext2_dotdot(): callers don't need
> 
> page_addr
> 
> > > anymore")
> > > 
> > > Reviewed-by: Fabio M. De Francesco <fmdefrancesco@...il.com>
> > 
> > Thanks!
> > 
> > > I could only read the code but I could not test it in the same QEMU/KVM
> > > x86_32 VM where I test all my HIGHMEM related work.
> > > 
> > > Btrfs as well as all the other filesystems I converted to
> 
> kmap_local_page()
> 
> > > don't make the processes in the VM to crash, whereas the xfstests on 
ext2
> > > trigger the OOM killer at random tests (only sometimes they exit
> > > gracefully).
> > > 
> > > FYI, I tried to run the tests with 6GB of RAM, booting a kernel with
> > > HIGHMEM64GB enabled. I cannot add my "Tested-by" tag.
> > 
> > Hum, interesting. Reading your previous emails this didn't seem to happen
> > before applying this series, did it?
> 
> I wrote too many messages but was probably not able to explain the facts
> properly. Please let me summarize...
> 
> 1) When testing ext2 with "./check -g quick" in a QEMU/KVM x86_32 VM, 6GB 
RAM,
> booting a Vanilla kernel 6.3.0-rc1 with HIGHMEM64GB enabled, the OOM Killer
> kicks in at random tests _with_ and _without_ Al's patches.
> 
> 2) The only case which does never trigger the OOM Killer is running the 
tests
> on ext2 formatted filesystems in loop disks with the stock openSUSE kernel
> which is the 6.2.1-1-pae.
> 
> 3) The same "./check -g quick" on 6.3.0-rc1 runs always to completion with
> other filesystems. I ran xfstests several times on Btrfs and I had no
> problems.
> 
> 4) I cannot git-bisect this issue with ext2 because I cannot trust the 
results
> on any particular Kernel version. I mean that I cannot mark any specific
> version neither "good" or "bad" because it happens that the same "good"
> version instead make xfstests crash at the next run.
> 
> My conclusion is that we probably have some kind of race that makes the 
random
> tests crash at random runs of random Kernel versions between (at least) SUSE
> 6.2.1 and Vanilla current.
> 
> But it may be very well the case that I'm doing something stupid (e.g., with
> QEMU configuration or setup_disks or I can't imagine whatever else) and that
> I'm unable to see where I make mistakes. After all, I'm still a newcomer 
with
> little experience :-)
> 
> Therefore, I'd suggest that someone else try to test ext2 in an x86_32 VM.
> However, I'm 99.5% sure that Al's patches are good by the mere inspection of
> his code.
> 
> I hope that this summary contains everything that may help.
> 
> However, I remain available to provide any further information and to give 
my
> contribution if you ask me for specific tasks.
> 
> For my part I have no idea how to investigate what is happening. In these
> months I have run the VM hundreds of times on the most disparate filesystems
> to test my conversions to kmap_local_page() and I have never seen anything
> like this happen.
> 
> Thanks,
> 
> Fabio
> 
> 
> Honza
> 
> > --
> > Jan Kara <jack@...e.com>
> > SUSE Labs, CR

I can't yet figure out which conditions lead to trigger the OOM Killer to kill 
the XFCE Desktop Environment, and the xfstests (which I usually run into the 
latter). After all, reserving 6GB of main memory to a QEMU/KVM x86_32 VM had 
always been more than adequate.

So, I thought I'd better ignore that 6GB for a 32 bit architecture are a 
notable amount of RAM and squeezed some more from the host until I went to 
reserve 8GB. I know that this is not what who is able to find out what 
consumes so much main memory would do, but wanted to get the output from the 
tests, one way or the other... :-(

OK, I could finally run my tests to completion and had no crashes at all. I 
ran "./check -g quick" on one "test" + three "scratch" loop devices formatted 
with "mkfs.ext2 -c". I ran three times _with_ and then three times _without_ 
Al's following patches cloned from his vfs tree, #work.ext2 branch:

f5b399373756 ("ext2: use offset_in_page() instead of open-coding it as 
subtraction")
c7248e221fb5 ("ext2_get_page(): saner type")
470e54a09898 ("ext2_put_page(): accept any pointer within the page")
15abcc147cf7 ("ext2_{set_link,delete_entry}(): don't bother with page_addr")
16a5ee2027b7 ("ext2_find_entry()/ext2_dotdot(): callers don't need

All the six tests were no longer killed by the Kernel :-)

I got 144 failures on 597 tests, regardless of the above listed patches.

My final conclusion is that these patches don't introduce regressions. I see 
several tests that produce memory leaks but, I want to stress it again, the 
failing tests are always the same with and without the patches.

therefore, I think that now I can safely add my tag to all five patches listed 
above...

Tested-by: Fabio M. De Francesco <fmdefrancesco@...il.com>

Regards,

Fabio







Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ