lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <508603.1650385022@warthog.procyon.org.uk>
Date:   Tue, 19 Apr 2022 17:17:02 +0100
From:   David Howells <dhowells@...hat.com>
To:     Max Kellermann <mk@...all.com>
Cc:     dhowells@...hat.com, linux-cachefs@...hat.com,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: fscache corruption in Linux 5.17?

Max Kellermann <mk@...all.com> wrote:

> At least one web server is still in this broken state right now.  So
> if you need anything from that server, tell me, and I'll get it.

Can you turn on:

echo 65536 >/sys/kernel/debug/tracing/buffer_size_kb
echo 1 >/sys/kernel/debug/tracing/events/cachefiles/cachefiles_read/enable
echo 1 >/sys/kernel/debug/tracing/events/cachefiles/cachefiles_write/enable
echo 1 >/sys/kernel/debug/tracing/events/cachefiles/cachefiles_trunc/enable
echo 1 >/sys/kernel/debug/tracing/events/cachefiles/cachefiles_io_error/enable
echo 1 >/sys/kernel/debug/tracing/events/cachefiles/cachefiles_vfs_error/enable

Then try and trigger the bug if you can.  The trace can be viewed with:

cat /sys/kernel/debug/tracing/trace | less

The problem very likely happens on write rather than read.  If you know of a
file that's corrupt, turn on the tracing above and read that file.  Then look
in the trace buffer and you should see the corresponding lines and they should
have the backing inode in them, marked "B=iiii" where "iiii" is the inode
number of the file in hex.  You should be able to examine the backing file by
finding it with something like:

	find /var/cache/fscache -inum $((0xiiii))

and see if you can see the corruption in there.  Note that there may be blocks
of zeroes corresponding to unfetched file blocks.

Also, what filesystem is backing your cachefiles cache?  It could be useful to
dump the extent list of the file.  You should be able to do this with
"filefrag -e".

As to why this happens, a write that's misaligned by 31 bytes should cause DIO
to a disk to fail - so it shouldn't be possible to write that.  However, I'm
doing fallocate and truncate on the file to shape it so that DIO will work on
it, so it's possible that there's a bug there.  The cachefiles_trunc trace
lines may help catch that.

David

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ